Why is sample Not Randomly Populating my Vector? - c++

I was experimenting with a toy sample program:
map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo));
sample(begin(foo), end(foo), begin(bar), size(foo), mt19937{ random_device{}() });
But bar always contains the contents of foo in order. Is this a gcc implementation problem, or am I just repeatedly getting unlucky?

std::sample selects elements from the range you pass. From cppreference (emphasis mine):
Selects n elements from the sequence [first; last) such that each
possible sample has equal probability of appearance, and writes those
selected elements into the output iterator out. Random numbers are
generated using the random number generator g.
If n is greater than the number of elements in the sequence, selects last-first elements.
I think the docs could be clearer, but returning only last-first elements when more are requested only makes sense if each element is selected at most once.
Try:
map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo)-1);
sample(begin(foo), end(foo), begin(bar), bar.size(), mt19937{ random_device{}() });
to get two random samples out of foo.
Also note that
The algorithm is stable only if PopulationIterator meets the
requirements of ForwardIterator
i.e. it was not just luck that you always got the same result.

Sampling is about returning some subset of a larger population.
It is not intended to return elements in a random order, or any other order. It could, but that's not really what it's there for.
cppreference hints at ordering in this statement:
The algorithm is stable only if PopulationIterator meets the requirements of ForwardIterator
"Stable" here means it would return results in the same order as the input, thus the order is guaranteed to not be random with a ForwardIterator. Related: What is stability in sorting algorithms and why is it important?
This also makes sense: as has been noted in the comments, to be efficient you first need to figure out which elements to pick, and then go through the iterator and pick them, since you can only iterate in one direction. It is then trivial to keep the elements in the same order.
As for when you're not using a ForwardIterator, it makes no guarantee about the order one way or the other. So, even if it might appear to be randomly ordered, it would not be wise to rely on this, as the randomness of the ordering may be implementation-dependent and it may or may not have high entropy.
If you want a random order, you should shuffle it.
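For instance, a minimal sketch building on the question's snippet (same foo and bar as above; this is my illustration, not part of the original answer):
mt19937 gen{ random_device{}() };
// sample as before, then shuffle the output range to randomize its order
sample(begin(foo), end(foo), begin(bar), size(bar), gen);
shuffle(begin(bar), end(bar), gen);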


CodeSignal: Execution time limit exceeded with c++

I am trying to solve the programming problem firstDuplicate on codesignal. The problem is "Given an array a that contains only numbers in the range 1 to a.length, find the first duplicate number for which the second occurrence has minimal index".
Example: For a = [2, 1, 3, 5, 3, 2] the output should be firstDuplicate(a) = 3
There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.
With this code I pass 21/23 tests, but then it tells me that the program exceeded the execution time limit on test 22. How would I go about making it faster so that it passes the remaining two tests?
#include <algorithm>
int firstDuplicate(vector<int> a) {
    vector<int> seen;
    for (size_t i = 0; i < a.size(); ++i) {
        if (std::find(seen.begin(), seen.end(), a[i]) != seen.end()) {
            return a[i];
        } else {
            seen.push_back(a[i]);
        }
    }
    if (seen == a) {
        return -1;
    }
}
Anytime you get asked a question about "find the duplicate", "find the missing element", or "find the thing that should be there", your first instinct should be use a hash table. In C++, there are the unordered_map and unordered_set classes that are for such types of coding exercises. The unordered_set is effectively a map of keys to bools.
Also, pass your vector by reference, not by value. Passing by value incurs the overhead of copying the entire vector.
Also, that comparison seems costly and unnecessary at the end.
This is probably closer to what you want:
#include <unordered_set>
int firstDuplicate(const vector<int>& a) {
    std::unordered_set<int> seen;
    for (int i : a) {
        auto result_pair = seen.insert(i);
        bool duplicate = (result_pair.second == false);
        if (duplicate) {
            return i;
        }
    }
    return -1;
}
std::find has linear time complexity in the distance between the first and last elements of the container (or until the number is found), i.e. a worst case of O(N), so your algorithm is O(N^2) overall.
Instead of storing your numbers in a vector and searching it every time, you should do something like hashing with std::map: store the numbers encountered, and while iterating, return a number as soon as it is already present in the map.
std::map<int, int> hash;
for (const auto &i : a) {
    if (hash[i])
        return i;
    else
        hash[i] = 1;
}
Edit: std::unordered_map is even more efficient if the order of keys doesn't matter, since insertion time complexity is constant in average case as compared to logarithmic insertion complexity for std::map.
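The same loop with std::unordered_map might look like this (a sketch of the idea, not benchmarked):
std::unordered_map<int, int> seen;
for (const auto &i : a) {
    if (seen[i]++)   // operator[] value-initializes the count to 0
        return i;    // the count was already nonzero, so i is a duplicate
}
return -1;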
It's probably an unnecessary optimization, but I think I'd try to take slightly better advantage of the specification. A hash table is intended primarily for cases where you have a fairly sparse conversion from possible keys to actual keys--that is, only a small percentage of possible keys are ever used. For example, if your keys are strings of length up to 20 characters, the theoretical maximum number of keys is 256^20. With that many possible keys, it's clear no practical program is going to store any more than a minuscule percentage, so a hash table makes sense.
In this case, however, we're told that the input is: "an array a that contains only numbers in the range 1 to a.length". So, even if half the numbers are duplicates, we're using 50% of the possible keys.
Under the circumstances, instead of a hash table, even though it's often maligned, I'd use an std::vector<bool>, and expect to get considerably better performance in the vast majority of cases.
int firstDuplicate(std::vector<int> const &input) {
    std::vector<bool> seen(input.size() + 1);
    for (auto i : input) {
        if (seen[i])
            return i;
        seen[i] = true;
    }
    return -1;
}
The advantage here is fairly simple: at least in a typical case, std::vector<bool> uses a specialization that stores bools in only one bit apiece. This way we're storing only one bit for each number in the input, which increases storage density, so we can expect excellent use of the cache. In particular, as long as the number of bytes in the cache is at least a little more than 1/8th the number of elements in the input array, we can expect all of seen to be in the cache most of the time.
Now make no mistake: if you look around, you'll find quite a few articles pointing out that vector<bool> has problems--and for some cases, that's entirely true. There are places and times that vector<bool> should be avoided. But none of its limitations applies to the way we're using it here--and it really does give an advantage in storage density that can be quite useful, especially for cases like this one.
We could also write some custom code to implement a bitmap that would give still faster code than vector<bool>. But using vector<bool> is easy, and writing our own replacement that's more efficient is quite a bit of extra work...
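For the curious, here is a rough sketch of what that hand-rolled bitmap could look like (my illustration, untested against the CodeSignal harness):
#include <cstdint>
#include <vector>

int firstDuplicate(std::vector<int> const &input) {
    // one bit per possible value 1..n, packed into 64-bit words
    std::vector<std::uint64_t> seen((input.size() + 63) / 64, 0);
    for (auto i : input) {
        std::uint64_t &word = seen[(i - 1) / 64];
        const std::uint64_t mask = std::uint64_t{1} << ((i - 1) % 64);
        if (word & mask)
            return i;
        word |= mask;
    }
    return -1;
}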

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with huge(millions or even billions) elements, while the values are integers within a small range(1 to 100 or 1 to 1000), in such a case, is std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two elements have the same integer member used for comparison, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if (counts.size() <= static_cast<size_t>(i)) {  // <=, or counts[i] below would be out of range
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if (count_sorted.size() <= static_cast<size_t>(i)) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially, because you need to store the vectors of pointers: the space complexity goes from O(m) to O(n + m). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope for the whole life cycle of count_sorted. If your Ts implement move semantics, you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort, you will need to do that or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
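For the plain-integer case, that simulated iteration is just a nested loop (a sketch; process is a hypothetical stand-in for whatever you do with each value):
for (std::size_t v = 0; v < counts.size(); ++v)
    for (int c = 0; c < counts[v]; ++c)
        process(v);  // the value v occurs counts[v] times in sorted order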
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some Java implementations (I believe the Guava implementation would be efficient) but not in C++, where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
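In code, the cumulative-sum step might look like this (a sketch, assuming a counts vector indexed by value as in the earlier answers):
std::vector<int> starts(counts.size() + 1, 0);
for (std::size_t v = 1; v < starts.size(); ++v)
    starts[v] = starts[v - 1] + counts[v - 1];
// starts[v] is the index where the first object with value v belongs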
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions, but since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the previous element.
Second, either:
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their final position; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array; or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm>
#include <iterator>

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i - 1];
        destination[i] = final_destination[i - 1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while the item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // swap it into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for the next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort; here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

How to randomly retrieve an element from a C++ hash table in O(1)

Is there a way to randomly retrieve an element from a C++ unordered_set in O(1) average time, instead of doing:
std::unordered_set<int> s;
// initialize s
auto start = s.begin();
for (int i = 0; i < rand()%s.size()-1; ++i, ++start) {}
int randomNumber = *start;
Updated:
I need to defend the post, so I'll add my reasons for needing the functionality above.
I am playing with implementing a maze generator, and I need a data structure which supports:
insertion / deletion in O(1)
random retrieval of an element from the data structure in O(1)
std::vector has random access, but insertion / deletion is expensive
std::list has no random access
std::set supports O(logN) random access and O(logN) insertion/deletion, which would be great, but my initialization is a sorted sequence, which would easily break its balance.
So I thought a hash table would be the best choice; however, randomly retrieving an element is nontrivial.
Thank you for your time.
You can't pick a random element from an unordered_set in O(1) time. The iterators are ForwardIterators, not RandomAccessIterators. You would have to use a different container. Either boost::container::flat_set<int> or write your own that also has something like a vector internally:
template <typename T>
class set_with_random_access
{
    std::vector<const T*> vec;   // elements of an unordered_set are const
    std::unordered_set<T> set;
};
For which we provide functions that keep those in line, like insertion:
void insert(const T& value) {
    auto pr = set.insert(value);
    if (pr.second) {
        vec.push_back(&*pr.first);
    }
}
And random-ness:
template <typename GEN>
const T& random(GEN& gen) {
    std::uniform_int_distribution<size_t> dist(0, vec.size() - 1);
    return *vec[dist(gen)];
}
Which is, frankly, a lot of work, so probably use the boost one.
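For illustration, usage could look like this (hypothetical, assuming the members above are made public):
set_with_random_access<int> s;
for (int i = 0; i < 100; ++i)
    s.insert(i);                           // O(1) average insertion
std::mt19937 gen{ std::random_device{}() };
int r = s.random(gen);                     // O(1) random retrieval
Note that O(1) erasure would need extra bookkeeping (e.g. swap-with-last in the vector plus a value-to-index map), which is one more reason to reach for the boost container.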
a way to randomly retrieve an element from C++ unordered_set in O(1) average time?
Depends what counts as "random" for your purposes, and whether being a tiny smidge above O(1) is good enough. You can pick a random bucket b between 0 and s.bucket_count() - 1 (repeating if the bucket's empty), then a list index li between 0 and s.bucket_size(b) - 1, then std::next(s.begin(b), li) to get an iterator to a "random" element. But consider this situation:
You roll three dice - then randomly pick one of those: you get a random 1-6 value with even probability, but if you keep picking without rolling again you can only ever get whatever value(s) ended up on the three dice: the probabilities of each value from 1 through 6 are severely uneven.
The above approach to picking a random element in an unordered_set is a little like that: if there are x buckets with elements, then each non-empty bucket has an even chance of being selected, but the elements in that bucket have a 1 / x / bucket_size(b) chance of selection, which, for any given bucket, may be less or more than 1 / size(). In other words, if you consider the hashing effectively random, then the various elements have equal chance of being favoured or penalised in their placement, but that "skew" is then set in stone until the table's data is significantly mutated or the table is rehashed (and if it's rehashed by, say, doubling the table size rather than moving to a larger prime number (vague memory that unordered_set doubles), then once-penalised values will tend to remain penalised half the time).
The big-O efficiency of the above is a tiny smidge above O(1) because:
there's some repetition in the initial probe to find a bucket with elements, but with a load factor of 1.0 it's unlikely to need more than a couple attempts (given a good hash function); other options are available - like iterating from an empty bucket, or jumping by various displacements (modded into the table size) - which may perform a little better than trying another completely random bucket but may also exacerbate discrepancies in the odds of element selection
there's linear iteration in the elements colliding in any given bucket, but as the default load factor is 1.0 it'll be rare to have more than a couple collisions, and increasingly extremely rare to have many more than that.
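Putting the bucket-probing approach into code, a sketch might look like this (all the skew caveats above still apply):
#include <iterator>
#include <random>
#include <unordered_set>

// precondition: s is non-empty, otherwise the probe loop never terminates
int random_element(const std::unordered_set<int>& s, std::mt19937& gen) {
    std::uniform_int_distribution<std::size_t> pick_bucket(0, s.bucket_count() - 1);
    std::size_t b = pick_bucket(gen);
    while (s.bucket_size(b) == 0)          // re-probe until a non-empty bucket
        b = pick_bucket(gen);
    std::uniform_int_distribution<std::size_t> pick_slot(0, s.bucket_size(b) - 1);
    return *std::next(s.begin(b), pick_slot(gen));  // walk bucket b's chain
}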
Picking a random element from a std::unordered_set is a bad idea. This is due to the fact that std::unordered_set doesn't support random access and thus doesn't have a subscript operator (i.e., operator[]).
I strongly believe that what you need is a std::vector in combination with std::unique in order to satisfy element uniqueness.
In the example below I use a std::vector and then I ensure that it has only unique elements by applying std::unique algorithm on it. Then I use the random utilities in order to generate a random index in [0, vector's size - 1]:
std::vector<int> v{1, 2, 8, 3, 5, 4, 5, 6, 7, 7, 9, 9, 19, 19};
std::sort(v.begin(), v.end()); // std::unique only removes *adjacent* duplicates
v.erase(std::unique(v.begin(), v.end()), v.end());
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(0, v.size() - 1);
std::cout << "Random number from vector: " << v[distribution(generator)] << std::endl;

A fast algorithm for sorting and shuffling equal-valued entries (preferably with the STL)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine also appears in other places). It could be called a totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(Output order is irrelevant to me, i.e. equal values entirely within the best k need not be shuffled. Having all equal values shuffled is, however, a related and interesting question, and would also suffice!)
A very (!) inefficient way would be to shuffle randomly (std::shuffle) and then partial_sort, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (or all blocks of equally valued entries; both are much faster). Maybe that observation is where to start...
I would very much prefer, if someone could provide a solution with STL algorithms (or at least to a large portion), both because they're usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!
You want to partial_sort first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.
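A sketch of that recipe (my own illustration, untested; assumes 1 <= k <= v.size()):
#include <algorithm>
#include <random>
#include <vector>

template <typename T, typename Compare>
void best_k(std::vector<T>& v, std::size_t k, Compare cmp, std::mt19937& gen) {
    std::partial_sort(v.begin(), v.begin() + k, v.end(), cmp);
    const T& kth = v[k - 1];
    // first element in the sorted prefix equivalent to the k-th best
    auto lo = std::lower_bound(v.begin(), v.begin() + k, kth, cmp);
    // move the tail's equivalents of the k-th best directly after the cut
    auto hi = std::partition(v.begin() + k, v.end(),
                             [&](const T& x) { return !cmp(kth, x) && !cmp(x, kth); });
    // shuffle only the run of equal-valued entries at the selection border
    std::shuffle(lo, hi, gen);
}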
Not fully understanding your issue, but if it were me solving this (if I am reading it correctly)...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into an STL list, but overload the comparison operator to create a B-tree; if two items are equal on insert, randomly choose to insert one before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)
If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
            RandomIterator nth,
            RandomIterator limit,
            Compare cmp) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    std::nth_element(first, nth, limit, cmp);
    auto p = std::partition(first, nth, [&](ref a){ return cmp(a, *nth); });
    auto q = std::partition(nth + 1, limit, [&](ref a){ return !cmp(*nth, a); });
    std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
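For reference, the std::shuffle version might look like this (the boilerplate alluded to above; equally untested):
#include <algorithm>
#include <iterator>
#include <utility>

template<typename RandomIterator, typename Compare, typename URBG>
void best_n(RandomIterator first,
            RandomIterator nth,
            RandomIterator limit,
            Compare cmp,
            URBG&& gen) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    std::nth_element(first, nth, limit, cmp);
    auto p = std::partition(first, nth, [&](ref a){ return cmp(a, *nth); });
    auto q = std::partition(nth + 1, limit, [&](ref a){ return !cmp(*nth, a); });
    std::shuffle(p, q, std::forward<URBG>(gen));  // uniform generator instead of random_shuffle
}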
If you don't mind sorting the whole list, there is a simple answer. Randomize the result in your comparator for equivalent elements.
std::sort(validLocations.begin(), validLocations.end(),
          [&](const Point& i_point1, const Point& i_point2)
          {
              if (i_point1.mX == i_point2.mX)
              {
                  return Rand(1.0f) < 0.5;
              }
              else
              {
                  return i_point1.mX < i_point2.mX;
              }
          });

Map from integer ranges to arbitrary single integers

Working in C++ in a Linux environment, I have a situation where a number of integer ranges are defined, and integer inputs map to different arbitrary integers based on which range they fall into. None of the ranges overlap, and they aren't always contiguous.
The "simplest" way to solve this problem is with a bunch of if-statements for each range, but the number of ranges, their bounds, and the target values can all vary, so if-statements aren't maintainable.
For example, the ranges might be [0, 70], call it r_a; [101, 150], call it r_b; and [201, 400], call it r_c. Inputs in r_a map to 1, in r_b to 2, and in r_c to 3. Anything not in r_a, r_b, or r_c maps to 0.
I can come up with a data structure & algorithm that stores tuples of (bounds, map target) and iterates through them, so finding the target value takes linear time in the number of bounds pairs. I can also imagine a scheme that keeps the pairs ordered and uses a binary-search-style algorithm against all the lower bounds (or upper bounds), finds the closest to the input, then compares against the opposing bound.
Is there a better way to accomplish the mapping than a binary-search based algorithm? Even better, is there some C++ library out that does this already?
The best approach here is indeed a binary search, but any efficient order-based search will do perfectly well. You don't really have to implement the search and the data structure explicitly. You can use it indirectly by employing a standard associative container instead.
Since your ranges don't overlap, the solution is very simple. You can immediately use a std::map for this problem to solve it in just a few lines of code.
For example, this is one possible approach. Let's assume that we are mapping an [ int, int ] range to an int value. Let's represent our ranges as closed-open ranges, i.e. if the original range is [0, 70], let's consider a [0, 71) range instead. Also, let's use the value of 0 as a "reserved" value that means "no mapping" (as you requested in your question)
const int EMPTY = 0;
All you need to do is to declare a map from int to int:
typedef std::map<int, int> Map;
Map map;
and fill it with each end of your closed-open ranges. The left (closed) end should be mapped to the desired value the entire range is mapped to, while the right (open) end should be mapped to our EMPTY value. For your example, it will look as follows
map[0] = r_a;
map[71] = EMPTY;
map[101] = r_b;
map[151] = EMPTY;
map[201] = r_c;
map[401] = EMPTY;
(Note how each closed range [lo, hi] contributes two entries: lo mapped to the range's target value, and hi + 1 mapped to EMPTY.)
That's it for initialization.
Now, in order to determine where a given value of i maps to all you need to do is
Map::iterator it = map.upper_bound(i);
If it == map.begin(), then i is not in any range. Otherwise, do
--it;
If the it->second (for the decremented it) is EMPTY, then i is not in any range.
The combined "miss" check might look as follows
Map::iterator it = map.upper_bound(i);
if (it == map.begin() || (--it)->second == EMPTY)
/* Missed all ranges */;
Otherwise, it->second (for the decremented it) is your mapped value
int mapped_to = it->second;
Note that if the original ranges were "touching", as in [40, 60] and [61, 100], then the closed-open ranges will look as [40, 61) and [61, 101) meaning that the value of 61 will be mapped twice during map initialization. In this case it is important to make sure that the value of 61 is mapped to the proper destination value and not to the value of EMPTY. If you map the ranges as shown above in the left-to-right (i.e. increasing) order it will work correctly by itself.
Note, that only the endpoints of the ranges are inserted into the map, meaning that the memory consumption and the performance of the search depends only on the total number of ranges and completely independent of their total length.
If you wish, you can add a "guard" element to the map during the initialization
map[INT_MIN] = EMPTY;
(it corresponds to "negative infinity") and the "miss" check will become simpler
Map::iterator it = map.upper_bound(i);
assert(it != map.begin());
if ((--it)->second == EMPTY)
/* Missed all ranges */;
but that's just a matter of personal preference.
Of course, if you just want to return 0 for non-mapped values, you don't need to carry out any checking at all. Just take the it->second from the decremented iterator and you are done.
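Putting the pieces together, the entire lookup might read (a sketch using the declarations above):
int lookup(const Map& map, int i) {
    Map::const_iterator it = map.upper_bound(i);
    if (it == map.begin())
        return EMPTY;   // below the first range
    --it;
    return it->second;  // EMPTY between ranges, or the mapped value
}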
I would use a very simple thing: a std::map.
class Range
{
public:
    explicit Range(int item) : mLow(item), mHigh(item) {}  // [item, item]
    Range(int low, int high) : mLow(low), mHigh(high) {}   // [low, high]

    bool operator<(const Range& rhs) const
    {
        // Non-overlapping ranges are totally ordered by this comparison, and a
        // range containing an item compares equivalent to Range(item), which is
        // what makes the lookup below work. Inserting a Range that overlaps an
        // existing one finds an equivalent key, so the insert fails.
        return mHigh < rhs.mLow;
    }

    int low() const { return mLow; }
    int high() const { return mHigh; }

private:
    int mLow;
    int mHigh;
}; // class Range
Then, let's have a map:
typedef std::map<Range, int> ranges_type;
And write a function that search in this map:
int find(int item, const ranges_type& ranges)
{
    ranges_type::const_iterator it = ranges.lower_bound(Range(item));
    if (it != ranges.end() && it->first.low() <= item)
        return it->second;
    else
        return 0; // No mapping
}
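Usage with the question's example ranges might look like this (hypothetical):
ranges_type ranges;
ranges[Range(0, 70)] = 1;    // r_a
ranges[Range(101, 150)] = 2; // r_b
ranges[Range(201, 400)] = 3; // r_c
int target = find(120, ranges); // returns 2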
Main benefits:
Will detect overlapping ranges during insertion into the map: inserting a Range that overlaps an existing one finds an equivalent key, so the insert fails and you can report the error
Supports editing the Ranges on the fly
Finding is fast (binary search)
If the ranges are frozen (even if their values are not), you may wish to use Loki::AssocVector to reduce the memory overhead and improve performance a bit (basically, it's a sorted vector with the interface of a map).
Wouldn't a simple array be enough? You're not saying how many items you have, but by far the fastest data structure is a simple array.
If the ranges are:
0..9 --> 25
10..19 --> 42
Then the array would simply be like this:
[25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
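A sketch of building that table from (low, high, target) triples (names hypothetical):
#include <algorithm>
#include <tuple>
#include <vector>

std::vector<int> build_table(const std::vector<std::tuple<int, int, int>>& ranges,
                             int max_input) {
    std::vector<int> table(max_input + 1, 0);  // 0 = "no mapping"
    for (const auto& [lo, hi, target] : ranges)
        std::fill(table.begin() + lo, table.begin() + hi + 1, target);
    return table;  // lookup is then just table[input]
}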
You can have two sorted arrays: one for the lower bounds, one for the upper bounds. Use std::upper_bound on the lower-bound array and std::lower_bound on the upper-bound array.
If the index from the first search, moved back by one, matches the index from the second search, then the value is >= that range's lower bound and <= its upper bound, so return that index + 1. If they don't match, the value lies in between ranges, so return 0.
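In code, that could look like this sketch (my illustration of the idea):
#include <algorithm>
#include <vector>

int lookup(const std::vector<int>& lows, const std::vector<int>& highs, int value) {
    auto lo = std::upper_bound(lows.begin(), lows.end(), value);   // one past the last low <= value
    auto hi = std::lower_bound(highs.begin(), highs.end(), value); // first high >= value
    if (lo == lows.begin() || hi == highs.end())
        return 0;  // below the first range or above the last
    std::size_t i = (lo - lows.begin()) - 1;
    if (i == static_cast<std::size_t>(hi - highs.begin()))
        return static_cast<int>(i) + 1;  // value lies inside range i
    return 0;                            // value falls between ranges
}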
The ideal is an interval tree (a specialized binary tree). Wikipedia describes the method completely, better than I could. You won't get much more optimal than this without sacrificing space for performance.
Your example ranges overlap, but the question says they won't, so I'll assume the overlap is a typo. You could store the destinations in an array and use the indices as the ranges. It's pretty easy, but ugly and not very maintainable. You'd need to initialize the array to 0, then for each range iterate over its indices and set each one to the destination value. Very ugly, but constant lookup time, so maybe useful if the numbers don't get too high and the ranges don't change very often.
Record the limits into a set (or map). When you call insert, you get back a pair: an iterator and a boolean. If the boolean is true, a new element was created, which you will have to remove later. After that, step along with the iterator and look at what you have found.
http://www.cplusplus.com/reference/stl/set/insert/ See Return value
It's a 1-dimensional spatial index. A quadtree-style binary tree will do, for example, and there are several other widely used methods.
A simple Linked List containing the range entries should be quick enough, even for say 50-100 ranges. Moreover, you could implement a Skip List, on say the upper bounds, to speed up these range queries. Yet another possibility is an Interval Tree.
Ultimately I'd choose the simplest: binary search.
You may find Minimal Perfect Hashing Function useful, http://cmph.sourceforge.net/.