STL container to use for deleting a range of values - C++

I have n integers. I am given a starting value (SV) and an ending value (EV), and I need to delete the values that lie within the range of these two values. The starting value as well as the ending value may not exist in the set of n integers. I need to do this in O(number of elements deleted). A container such as a vector of integers fails, since I need to binary-search for the iterator that is greater than or equal to SV (and likewise for EV), which takes an additional O(log n). Any approach appreciated.
Edit: I was even thinking of using maps, storing the values as keys and erasing on the basis of those keys. But the problem again is that the lower_bound and upper_bound operations take logarithmic time.

If you need to keep the container ordered, just use:
std::set http://www.cplusplus.com/reference/set/set/
or std::multiset http://www.cplusplus.com/reference/set/multiset/ if values can repeat.
Both have lower_bound and upper_bound functionality.
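A minimal sketch of that suggestion, assuming the values live in a std::set (use std::multiset if duplicates are possible): the bound lookups are O(log n), and the range erase itself is proportional to the number of elements removed.
#include <iostream>
#include <set>
int main()
{
    std::set<int> values = {1, 4, 6, 9, 12, 15};
    int SV = 5, EV = 12;
    auto first = values.lower_bound(SV); // first element >= SV
    auto last  = values.upper_bound(EV); // first element >  EV
    values.erase(first, last);           // removes 6, 9, 12
    for (int v : values)
        std::cout << v << ' ';           // prints: 1 4 15
    std::cout << '\n';
}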

You can use the erase-remove idiom with std::remove_if to check if each value is between the two bounds.
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> values = {1,2,3,4,5,6,7,8,9};
    int start = 3;
    int stop = 7;
    values.erase(std::remove_if(values.begin(),
                                values.end(),
                                [start, stop](int i){ return i > start && i < stop; }),
                 values.end());
    for (auto i : values)
    {
        std::cout << i << " ";
    }
}
Output
1 2 3 7 8 9

As Marek R suggested, there is std::multiset.
Complexity of the whole exercise is O(log(values.size()) + std::distance(first, last)), where that distance is the number of elements erased.
It's difficult to see how you can beat that. In the end there is always going to be something that grows with the size of the container, and a logarithm of it is a good deal!
#include <iostream>
#include <set>
void dump(const std::multiset<int>& values);
int main() {
    std::multiset<int> values;
    values.insert(5);
    values.insert(7);
    values.insert(9);
    values.insert(11);
    values.insert(8);
    values.insert(8);
    values.insert(76);
    dump(values);
    auto first = values.lower_bound(7);
    auto last = values.upper_bound(10);
    values.erase(first, last);
    dump(values);
    return 0;
}
void dump(const std::multiset<int>& values) {
    auto flag = false;
    std::cout << '{';
    for (auto curr : values) {
        if (flag) {
            std::cout << ',';
        } else {
            flag = true;
        }
        std::cout << curr;
    }
    std::cout << '}' << std::endl;
}

Related

An efficient algorithm to sample non-duplicate random elements from an array

I'm looking for an algorithm to pick M random elements from a given array. The prerequisites are:
the sampled elements must be unique,
the array to sample from may contain duplicates,
the array to sample from is not necessarily sorted.
This is what I've managed to come up with. Here I'm also making the assumption that the number of unique elements in the array is greater than (or equal to) M.
#include <random>
#include <vector>
#include <algorithm>
#include <iostream>
const std::vector<int> sample(const std::vector<int>& input, size_t n) {
    std::random_device rd;
    std::mt19937 engine(rd());
    std::uniform_int_distribution<int> dist(0, input.size() - 1);
    std::vector<int> result;
    result.reserve(n);
    size_t id;
    do {
        id = dist(engine);
        if (std::find(result.begin(), result.end(), input[id]) == result.end())
            result.push_back(input[id]);
    } while (result.size() < n);
    return result;
}
int main() {
    std::vector<int> input{0, 0, 1, 1, 2, 2, 3, 3, 4, 4};
    std::vector<int> result = sample(input, 3);
    for (const auto& item : result)
        std::cout << item << ' ';
    std::cout << std::endl;
}
This algorithm does not seem to be the best. Is there a more efficient algorithm (with lower time complexity) to solve this task? It would be good if this algorithm could also check that the number of unique elements in the input array is not less than M (or pick as many unique elements as possible if this is not the case).
Possible solution
As MSalters suggested, I use std::unordered_set to remove duplicates and std::shuffle to shuffle elements in a vector constructed from the set. Then I resize the vector and return it.
const std::vector<int> sample(const std::vector<int>& input, size_t M) {
    std::unordered_set<int> rem_dups(input.begin(), input.end());
    if (rem_dups.size() < M) M = rem_dups.size();
    std::vector<int> result(rem_dups.begin(), rem_dups.end());
    std::mt19937 g(std::random_device{}());
    std::shuffle(result.begin(), result.end(), g);
    result.resize(M);
    return result;
}
The comments already note the use of std::set. The additional request to check for M unique elements in the input makes that a bit more complicated. Here's an alternative implementation:
Put all inputs in a std::set or std::unordered_set. This removes duplicates.
Copy all elements to the return vector
If that has more than M elements, std::shuffle it and resize it to M elements.
Return it.
Use a set S to store the output, initially empty.
i = 0
while |S| < M && i <= n-1
swap the i'th element of the input with a random greater element
add the newly swapped i'th element to your set if it isn't already there
i++
This will end with S having M distinct elements from your input array (if there are M distinct elements). However, elements which are more common in the input array are more likely to be in S (unless you go through the additional work of eliminating duplicates from the input first).
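A rough sketch of this partial-shuffle idea (the helper name sample_distinct is made up here, and the random element is chosen from index i onward for simplicity):
#include <random>
#include <unordered_set>
#include <vector>
std::vector<int> sample_distinct(std::vector<int> input, size_t M)
{
    std::mt19937 gen(std::random_device{}());
    std::unordered_set<int> S;
    for (size_t i = 0; S.size() < M && i < input.size(); ++i) {
        // pick a random index in [i, input.size() - 1] and swap that element into position i
        std::uniform_int_distribution<size_t> dist(i, input.size() - 1);
        std::swap(input[i], input[dist(gen)]);
        S.insert(input[i]); // inserting an already-seen value is a no-op
    }
    return std::vector<int>(S.begin(), S.end());
}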

Find equal values in an array in C++

Is there a faster way to find equal values in an array than comparing each element one by one with all the other elements?
for (int i = 0; i < arrayLenght; i++)
{
    for (int k = i + 1; k < arrayLenght; k++)
    {
        if (array[i] == array[k])
        {
            sprintf(message, "There is a duplicate of %s", array[i]);
            ShowMessage(message);
            break;
        }
    }
}
Since sorting your container is a possible solution, std::unique is the simplest solution to your problem:
std::vector<int> v {0,1,0,1,2,0,1,2,3};
std::sort(begin(v), end(v));
v.erase(std::unique(begin(v), end(v)), end(v));
First, the vector is sorted. You can use anything, std::sort is just the simplest. After that, std::unique shifts the duplicates to the end of the container and returns an iterator to the first duplicate. This is then eaten by erase and effectively removes those from the vector.
You could use std::multiset and then count duplicates afterwards like this:
#include <iostream>
#include <set>
int main()
{
    const int arrayLenght = 14;
    int array[arrayLenght] = { 0,2,1,3,1,4,5,5,5,2,2,3,5,5 };
    std::multiset<int> ms(array, array + arrayLenght);
    for (auto it = ms.begin(), end = ms.end(); it != end; it = ms.equal_range(*it).second)
    {
        int cnt = 0;
        if ((cnt = ms.count(*it)) > 1)
            std::cout << "There are " << cnt << " of " << *it << std::endl;
    }
}
https://ideone.com/6ktW89
There are 2 of 1
There are 3 of 2
There are 2 of 3
There are 5 of 5
If the value_type of this array can be ordered by operator< (a strict weak order), it's a good choice to do as YSC answered.
If not, you can try to define a hash function that maps the objects to different values. Then you can do this in O(n) time complexity, like:
struct ValueHash
{
    size_t operator()(const Value& rhs) const {
        // do_something
    }
};
struct ValueCmp
{
    bool operator()(const Value& lhs, const Value& rhs) const {
        // do_something
    }
};
unordered_set<Value, ValueHash, ValueCmp> myset;
for (int i = 0; i < arrayLenght; i++)
{
    if (myset.find(array[i]) == myset.end())
        myset.insert(array[i]);
    else
        dosomething();
}
In case you have a large amount of data, you can first sort the array (quick sort gives you a first pass in O(n*log(n))) and then do a second pass comparing each value with the next (since equal values end up adjacent) to find duplicates; this is a sequential pass in O(n). So sorting in a first pass and scanning the sorted array for duplicates gives you O(n*log(n) + n), or finally O(n*log(n)).
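A minimal sketch of that sort-then-scan approach, assuming an array of ints (it prints one message per extra occurrence):
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> v = {0, 2, 1, 3, 1, 4, 5, 5};
    std::sort(v.begin(), v.end());               // O(n log n)
    for (std::size_t i = 1; i < v.size(); ++i)   // O(n) scan of adjacent pairs
        if (v[i] == v[i - 1])
            std::cout << "There is a duplicate of " << v[i] << '\n';
}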
EDIT
An alternative has been suggested in the comments: using a std::set to check for already processed data. The algorithm just goes element by element, checking whether the element has been seen before. This can lead to an O(n) algorithm, but only if you use a hash set. If you use a sorted set, you incur an O(log(n)) cost for each set lookup and finish in the same O(n*log(n)). But because the proposal can be solved with a hash set (take care to select std::unordered_set, so you don't get the extra access time per search) you get a final O(n). Of course, you have to account for possible automatic hash table growth or a large amount of memory wasted in the hash table.
Thanks to #freakish, who pointed out the set solution in the comments to the question.
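For plain ints, where no custom hash is needed, the single hash-set pass from that edit could look roughly like this:
#include <iostream>
#include <unordered_set>
int main()
{
    int array[] = {0, 2, 1, 3, 1, 4, 5, 5};
    std::unordered_set<int> seen;
    for (int x : array) {
        if (!seen.insert(x).second)          // insert() reports whether x was new
            std::cout << "There is a duplicate of " << x << '\n';
    }
}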

Which STL to use to find index by value in O(1) in C++

Say I have an array arr[] = {1 , 3 , 5, 12124, 24354, 12324, 5}
I want to know the index of the value 5 (i.e., 2) in O(1).
How should I go about this?
P.S :
1. Throughout my program, I shall only be finding indices, not vice versa (getting the value by index).
2. The array can have duplicates.
If you can guarantee there are no duplicates in the array, your best bet is probably to create an unordered_map where the map key is the array value and the map value is its index.
I wrote a method below that converts an array to an unordered_map.
#include <unordered_map>
#include <iostream>
template <typename T>
void arrayToMap(const T arr[], size_t arrSize, std::unordered_map<T, int>& map)
{
    for (int i = 0; i < arrSize; ++i) {
        map[arr[i]] = i;
    }
}
int main()
{
    int arr[] = { 1, 3, 5, 12124, 24354, 12324, 5 };
    std::unordered_map<int, int> map;
    arrayToMap(arr, sizeof(arr) / sizeof(*arr), map);
    std::cout << "Value" << '\t' << "Index" << std::endl;
    for (auto it = map.begin(), e = map.end(); it != e; ++it) {
        std::cout << it->first << "\t" << it->second << std::endl;
    }
}
However, in your example you use the value 5 twice. This causes strange output in the above code: the resulting map has no entry with index 2, because the later insertion of 5 overwrites the earlier one. Even if you used an array, you would face a similar problem (i.e. should you use the value at index 2 or 6?).
If you really need both values, you could use unordered_multimap, but the syntax for accessing elements isn't as easy as using operator[] (you have to use unordered_multimap::find(), which returns an iterator).
template <typename T>
void arrayToMap(const T arr[], size_t arrSize, std::unordered_multimap<T, int>& map)
{
    for (int i = 0; i < arrSize; ++i) {
        map.emplace(arr[i], i);
    }
}
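A possible lookup sketch (not part of the original answer): equal_range() retrieves every stored index for a value, and the filling loop mirrors arrayToMap above.
#include <iostream>
#include <unordered_map>
int main()
{
    int arr[] = {1, 3, 5, 12124, 24354, 12324, 5};
    std::unordered_multimap<int, int> map;
    for (int i = 0; i < int(sizeof(arr) / sizeof(*arr)); ++i)
        map.emplace(arr[i], i);              // same filling loop as arrayToMap above
    auto range = map.equal_range(5);         // all (value, index) pairs stored for key 5
    for (auto it = range.first; it != range.second; ++it)
        std::cout << "value 5 is at index " << it->second << '\n';
}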
Finally, you should consider that unordered_map's fast look-up time O(1) comes with some overhead, so it uses more memory than a simple array. But if you end up using an array (which is comparatively much more memory efficient), searching for a specific value is guaranteed to be O(n) where n is the index of the value.
Edit - If you need the duplicate with the lowest index to be kept instead of the highest, you can just reverse the order of insertion:
template <typename T>
void arrayToMap(const T arr[], size_t arrSize, std::unordered_map<T, int>& map)
{
    for (int i = arrSize - 1; i >= 0; --i) {
        map[arr[i]] = i;
    }
}
Use std::unordered_map from C++11 to map the elements as keys and the indices as values. Then you can answer your query in amortized O(1) complexity. std::unordered_map will work because, as you said, there are no duplicates, but it costs you linear extra space.
If your value range is not too large, you can use an array as well. This will yield even better Θ(1) complexity.
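A minimal sketch of that lookup-table idea, assuming the values are known to fall in a range [0, MAX_VALUE] (MAX_VALUE is an assumed bound here) and using -1 to mark absent values:
#include <iostream>
#include <vector>
int main()
{
    constexpr int MAX_VALUE = 100000;                // assumed upper bound on values
    int arr[] = {1, 3, 5, 12124, 24354, 12324};
    std::vector<int> index_of(MAX_VALUE + 1, -1);    // -1 means "value not present"
    for (int i = 0; i < int(sizeof(arr) / sizeof(*arr)); ++i)
        index_of[arr[i]] = i;                        // with duplicates, the last index wins
    std::cout << index_of[5] << '\n';                // prints 2
}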
Use unordered_multimap (C++11 only) with the value as the key and the position index as the value.

Time-efficient way to count number of distinct numbers

get_number() returns an integer. I'm going to call it 30 times and count the number of distinct integers returned. My plan is to put these numbers into an std::array<int,30>, sort it and then use std::unique.
Is that a good solution? Is there a better one? This piece of code will be the bottleneck of my program.
I'm thinking there should be a hash-based solution, but maybe its overhead would be too much when I've only got 30 elements?
Edit: I changed "unique" to "distinct". Example:
{1,1,1,1} => 1
{1,2,3,4} => 4
{1,3,3,1} => 2
I would use std::set<int> as it's simpler:
std::set<int> s;
for (int i = 0; i < 30; ++i)
{
    s.insert(get_number());
}
std::cout << s.size() << std::endl; // You get the count of distinct numbers
If you want to count how many times each distinct number was returned, I'd suggest a map:
std::map<int, int> s;
for (int i = 0; i < 30; i++)
{
    s[get_number()]++;
}
std::cout << s.size() << std::endl; // total count of distinct numbers returned
for (auto it : s)
{
    std::cout << it.first << " " << it.second << std::endl; // each number and how many times it was returned
}
The simplest solution would be to use a std::map:
std::map<int, size_t> counters;
for (size_t i = 0; i != 30; ++i) {
    counters[getNumber()] += 1;
}
std::vector<int> uniques;
for (auto const& pair : counters) {
    if (pair.second == 1) { uniques.push_back(pair.first); }
}
// uniques now contains the items that only appeared once.
Using a std::map, std::set or the std::sort algorithm gives you O(n*log(n)) complexity. For a small to moderately large number of elements it is perfectly fine. But you have a known integer range, and this opens the door to a lot of optimizations.
As you say (in a comment), the range of your integers is known and short: [0..99]. I would recommend implementing a modified counting sort. See: http://en.wikipedia.org/wiki/Counting_sort
You can count the number of distinct items while doing the sort itself, removing the need for the std::unique call. The whole complexity would be O(n). Another advantage is that the memory needed is independent of the number of input items. If you had 30,000,000,000 integers to sort, it would not need a single supplementary byte to count the distinct items.
Even if the range of allowed integer values is large, say [0..10,000,000], the memory consumed would be quite low. Indeed, an optimized version could consume as little as 1 bit per allowed integer value. That is less than 2 MB of memory, or 1/1000th of a laptop's RAM.
Here is a short example program:
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <vector>
// A function returning an integer between [0..99]
int get_number()
{
    return rand() % 100;
}
int main(int argc, char* argv[])
{
    // reserve one bucket for each possible integer and initialize to 0
    std::vector<int> cnt_buckets(100, 0);
    int nb_distincts = 0;
    // Get 30 numbers and count distincts
    for (int i = 0; i < 30; ++i)
    {
        int number = get_number();
        std::cout << number << std::endl;
        if (0 == cnt_buckets[number])
            ++nb_distincts;
        // We could optimize by doing this only the first time
        ++cnt_buckets[number];
    }
    std::cerr << "Total distinct numbers: " << nb_distincts << std::endl;
}
You can see it working:
$ ./main | sort | uniq | wc -l
Total distinct numbers: 26
26
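A sketch of the 1-bit-per-value variant mentioned above, using std::vector<bool> as a compact "seen" bitmap for the assumed range [0..99]:
#include <cstdlib>
#include <iostream>
#include <vector>
int get_number() { return std::rand() % 100; }    // same stand-in as above
int main()
{
    std::vector<bool> seen(100, false);           // one bit per possible value
    int distinct = 0;
    for (int i = 0; i < 30; ++i) {
        int number = get_number();
        if (!seen[number]) {                      // first time we see this value
            seen[number] = true;
            ++distinct;
        }
    }
    std::cout << "Total distinct numbers: " << distinct << '\n';
}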
The simplest way is just to use std::set.
std::set<int> s;
int uniqueCount = 0;
for (int i = 0; i < 30; ++i)
{
    int n = get_number();
    if (s.find(n) != s.end()) {
        continue;             // already seen, not a new distinct value
    }
    s.insert(n);
    ++uniqueCount;            // equals s.size() at the end
}
// now s contains the distinct numbers
// and uniqueCount contains how many distinct integers were returned
Using an array and sort seems good, but unique may be a bit overkill if you just need to count distinct values. The following function should return number of distinct values in a sorted range.
template<typename ForwardIterator>
size_t distinct(ForwardIterator begin, ForwardIterator end) {
    if (begin == end) return 0;
    size_t count = 1;
    ForwardIterator prior = begin;
    while (++begin != end)
    {
        if (*prior != *begin)
            ++count;
        prior = begin;
    }
    return count;
}
In contrast to the set- or map-based approaches, this one does not need any heap allocation and the elements are stored contiguously in memory, so it should be much faster. Asymptotic time complexity is O(N log N), which is the same as when using an associative container. I bet that even your original solution of using std::sort followed by std::unique would be much faster than using std::set.
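For comparison, a minimal sketch of the question's original sort-plus-unique plan on a fixed std::array of 30 values (get_number() is just a stand-in here):
#include <algorithm>
#include <array>
#include <cstdlib>
#include <iostream>
int get_number() { return std::rand() % 100; }
int main()
{
    std::array<int, 30> values;
    for (int& v : values)
        v = get_number();
    std::sort(values.begin(), values.end());
    // std::unique shifts duplicates to the back; the distance to the returned
    // iterator is the number of distinct values
    auto last = std::unique(values.begin(), values.end());
    std::cout << std::distance(values.begin(), last) << " distinct values\n";
}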
Try a set, try an unordered set, try sort and unique, try something else that seems fun.
Then MEASURE each one. If you want the fastest implementation, there is no substitute for trying out real code and seeing what it really does.
Your particular platform and compiler and other particulars will surely matter, so test in an environment as close as possible to where it will be running in production.

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?
For example if I have a vector with 1 2 3 5, I need to get the vector that contains 0 4.
But I need to do it very efficiently.
Since the vector is already sorted, this becomes trivial:
vector<int> v = {1, 2, 3, 5};
vector<int> ret;
// n is the known upper bound of the range [0..n]
v.push_back(n + 1); // this enforces a limit using fewer branches in the loop
for (int i = 0, j = 0; i <= n; ++i) {
    int present = v[j++];
    while (i < present) {
        ret.push_back(i++);
    }
}
return ret;
Additionally, if it wasn't sorted, you could either sort it and apply the above algorithm, or, if you know the range of n and can afford the extra memory, you could instead create an array of booleans (or a bitset) and mark the index corresponding to every element you encounter (e.g. bitset[v[j++]] = true;), then iterate from 0 to n and insert into your vector every element whose bitset position has not been marked (a sketch of this variant follows).
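A minimal sketch of that bitset variant, assuming the range bound n is known and the input need not be sorted:
#include <iostream>
#include <vector>
int main()
{
    int n = 5;
    std::vector<int> v = {5, 1, 3, 2};
    std::vector<bool> present(n + 1, false);
    for (int x : v)
        present[x] = true;                 // mark every value we encounter
    std::vector<int> missing;
    for (int i = 0; i <= n; ++i)
        if (!present[i])
            missing.push_back(i);          // collect the unmarked values
    for (int i : missing)
        std::cout << i << ' ';             // prints: 0 4
    std::cout << '\n';
}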
Basically, the idea presented here is that we know the number of missing items beforehand if we can assume sorted input without duplicate values.
That makes it possible to pre-allocate enough space for the missing values (no later dynamic allocation required), and we can also take a shortcut once all missing values have been found.
If the input vector is not sorted or contains duplicate values, a wrapper function can be used to establish this precondition.
#include <iostream>
#include <set>
#include <vector>
inline std::vector<int> find_missing(std::vector<int> const& input) {
    // assuming non-empty, sorted input, no duplicates
    // number of items missing
    int n_missing = input.back() - input.size() + 1;
    // pre-allocate enough memory for the missing values
    std::vector<int> result(n_missing);
    // iterate the input vector with a shortcut once all missing values were found
    auto input_it = input.begin();
    auto result_it = result.begin();
    for (int i = 0; result_it != result.end() && input_it != input.end(); ++i) {
        if (i < *input_it) (*result_it++) = i;
        else ++input_it;
    }
    return result;
}
// use this if the input vector is not sorted/unique
inline std::vector<int> find_missing_unordered(std::vector<int> const& input) {
    std::set<int> values(input.begin(), input.end());
    return find_missing(std::vector<int>(values.begin(), values.end()));
}
int main() {
    std::vector<int> input = {1,2,3,5,5,5,7};
    std::vector<int> result = find_missing_unordered(input);
    for (int i : result)
        std::cout << i << " ";
    std::cout << "\n";
}
The output is:
$ g++ test.cc -std=c++11 && ./a.out
0 4 6