Time-efficient way to count number of distinct numbers

get_number() returns an integer. I'm going to call it 30 times and count the number of distinct integers returned. My plan is to put these numbers into an std::array<int,30>, sort it and then use std::unique.
Is that a good solution? Is there a better one? This piece of code will be the bottleneck of my program.
I'm thinking there should be a hash-based solution, but maybe its overhead would be too much when I've only got 30 elements?
Edit: I changed "unique" to "distinct". Example:
{1,1,1,1} => 1
{1,2,3,4} => 4
{1,3,3,1} => 2

I would use std::set<int> as it's simpler:
std::set<int> s;
for(/*loop 30 times*/)
{
s.insert(get_number());
}
std::cout << s.size() << std::endl; // You get count of unique numbers
If you also want to know how many times each distinct number was returned, I'd suggest a std::map:
std::map<int, int> s;
for(int i=0; i<30; i++)
{
s[get_number()]++;
}
std::cout << s.size() << std::endl; // total count of distinct numbers returned
for (auto it : s)
{
std::cout << it.first << " " << it.second << std::endl; // each number and how many times it was returned
}

The simplest solution would be to use a std::map:
std::map<int, size_t> counters;
for (size_t i = 0; i != 30; ++i) {
counters[get_number()] += 1;
}
std::vector<int> uniques;
for (auto const& pair: counters) {
if (pair.second == 1) { uniques.push_back(pair.first); }
}
// uniques now contains the items that only appeared once.

Using a std::map, a std::set or std::sort gives you O(n*log(n)) complexity. For a small to large number of elements that is perfectly fine. But you have a known integer range, and this opens the door to a lot of optimizations.
Since you say (in a comment) that the range of your integers is known and small, [0..99], I would recommend implementing a modified counting sort. See: http://en.wikipedia.org/wiki/Counting_sort
You can count the number of distinct items while doing the sort itself, removing the need for the std::unique call. The overall complexity would be O(n). Another advantage is that the memory needed is independent of the number of input items. If you had 30.000.000.000 integers to sort, it would not need a single additional byte to count the distinct items.
Even if the range of allowed integer values is large, say [0..10.000.000], the memory consumed would be quite low. Indeed, an optimized version could consume as little as 1 bit per allowed integer value. That is less than 2 MB of memory, or roughly 1/1000th of a laptop's RAM.
Here is a short example program:
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <vector>
// A function returning an integer between [0..99]
int get_number()
{
return rand() % 100;
}
int main(int argc, char* argv[])
{
// reserves one bucket for each possible integer
// and initialize to 0
std::vector<int> cnt_buckets(100, 0);
int nb_distincts = 0;
// Get 30 numbers and count distincts
for(int i=0; i<30; ++i)
{
int number = get_number();
std::cout << number << std::endl;
if(0 == cnt_buckets[number])
++ nb_distincts;
// We could optimize by doing this only the first time
++ cnt_buckets[number];
}
std::cerr << "Total distincts numbers: " << nb_distincts << std::endl;
}
You can see it working:
$ ./main | sort | uniq | wc -l
Total distincts numbers: 26
26
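The one-bit-per-value variant mentioned above could look roughly like this. It is only a sketch, assuming every value lies in [0, 99]; the names count_distinct_bitset and kRange are illustrative and do not come from the original answer.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Assumption: every value lies in [0, kRange). kRange is an illustrative name.
constexpr std::size_t kRange = 100;

// Counts the distinct values using one bit per possible value.
std::size_t count_distinct_bitset(const std::vector<int>& values)
{
    std::bitset<kRange> seen;  // all bits start cleared
    for (int v : values)
        seen.set(static_cast<std::size_t>(v));  // throws std::out_of_range if v is outside [0, kRange)
    return seen.count();  // number of set bits == number of distinct values
}
```

Calling count_distinct_bitset on the question's third example {1,3,3,1} should give 2. The memory used depends only on kRange, not on how many numbers are fed in.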

The simplest way is just to use std::set.
std::set<int> s;
for( int i = 0; i < 30; ++i )
{
s.insert( get_number() ); // insert() leaves the set unchanged if the value is already present
}
int uniqueCount = static_cast<int>( s.size() );
// now s contains unique numbers
// and uniqueCount contains the number of unique integers returned

Using an array and sort seems good, but unique may be a bit overkill if you just need to count distinct values. The following function should return number of distinct values in a sorted range.
template<typename ForwardIterator>
size_t distinct(ForwardIterator begin, ForwardIterator end) {
if (begin == end) return 0;
size_t count = 1;
ForwardIterator prior = begin;
while (++begin != end)
{
if (*prior != *begin)
++count;
prior = begin;
}
return count;
}
In contrast to the set- or map-based approaches, this one does not need any heap allocation, and the elements are stored contiguously in memory, so it should be much faster. Asymptotic time complexity is O(N log N), which is the same as when using an associative container. I bet that even your original solution of using std::sort followed by std::unique would be much faster than using std::set.
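For comparison, the asker's original plan (a std::array<int,30>, std::sort, then std::unique) might be sketched as follows. The function name count_distinct_sorted is an assumption made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Sorts a copy of the 30 samples and counts the distinct values.
// The array is taken by value, so the caller's data is left untouched.
std::size_t count_distinct_sorted(std::array<int, 30> samples)
{
    std::sort(samples.begin(), samples.end());
    // std::unique compacts the distinct values to the front and
    // returns an iterator to the new logical end.
    auto new_end = std::unique(samples.begin(), samples.end());
    return static_cast<std::size_t>(new_end - samples.begin());
}
```

Like the distinct() function above, this performs no heap allocation; the only extra cost is the element moves done by std::unique.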

Try a set, try an unordered set, try sort and unique, try something else that seems fun.
Then MEASURE each one. If you want the fastest implementation, there is no substitute for trying out real code and seeing what it really does.
Your particular platform and compiler and other particulars will surely matter, so test in an environment as close as possible to where it will be running in production.
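One way to set up such a measurement is sketched below with std::chrono. The helper names (Samples, distinct_sort, distinct_set, distinct_unordered, time_ns) are assumptions for illustration, and a serious benchmark would need more care (warm-up, many repetitions, realistic inputs).

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>
#include <set>
#include <unordered_set>

using Samples = std::array<int, 30>;

// Candidate 1: sort a copy, then std::unique.
std::size_t distinct_sort(Samples s)
{
    std::sort(s.begin(), s.end());
    return static_cast<std::size_t>(std::unique(s.begin(), s.end()) - s.begin());
}

// Candidate 2: ordered set.
std::size_t distinct_set(const Samples& s)
{
    return std::set<int>(s.begin(), s.end()).size();
}

// Candidate 3: hash set.
std::size_t distinct_unordered(const Samples& s)
{
    return std::unordered_set<int>(s.begin(), s.end()).size();
}

// Runs f(samples) `reps` times and returns the elapsed nanoseconds.
// The results are accumulated into `sink` so the calls cannot be optimized away.
template <typename F>
long long time_ns(F f, const Samples& samples, int reps, std::size_t& sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        sink += f(samples);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Timing each candidate with the same input, for example time_ns(distinct_set, samples, 1000000, sink), and comparing the numbers on the target machine is the kind of measurement this answer is asking for.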

Related

Algorithm for creating an array of 5 unique integers between 1 and 20 [duplicate]

This question already has answers here:
Unique (non-repeating) random numbers in O(1)?
My goal is creating an array of 5 unique integers between 1 and 20. Is there a better algorithm than what I use below?
It works and I think it has a constant time complexity due to the loops not being dependent on variable inputs, but I want to find out if there is a more efficient, cleaner, or simpler way to write this.
int * getRandom( ) {
static int choices[5] = {};
srand((unsigned)time(NULL));
for (int i = 0; i < 5; i++) {
int generated = 1 + rand() % 20;
for (int j = 0; j < 5; j++){
if(choices[j] == generated){
i--;
}
}
choices[i] = generated;
cout << choices[i] << endl;
}
return choices;
}
Thank you so much for any feedback. I am new to algorithms.

The simplest approach I can think of is to create an array of all 20 numbers, with choices[i] = i+1, shuffle them with std::random_shuffle (or std::shuffle, since std::random_shuffle was deprecated in C++14 and removed in C++17) and take the first 5 elements. It might be slower, but it is hard to introduce bugs this way, and given the small fixed size it should be fine.
BTW, your version has a bug. You execute the line choices[i] = generated; even when generated was found among the existing choices, which can store a duplicate. Say i = 3 and generated equals the element at j = 0: you decrement i and then assign choices[2], which becomes equal to choices[0].
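The shuffle approach described above could be sketched like this, using std::shuffle from <algorithm>. The function name pick_five is illustrative only.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

// Returns 5 distinct integers in [1, 20].
std::array<int, 5> pick_five()
{
    std::array<int, 20> pool{};
    std::iota(pool.begin(), pool.end(), 1);           // pool = 1, 2, ..., 20
    static std::mt19937 gen(std::random_device{}());  // seeded once, on first call
    std::shuffle(pool.begin(), pool.end(), gen);
    std::array<int, 5> result{};
    std::copy_n(pool.begin(), 5, result.begin());     // the first 5 of a shuffled pool are distinct
    return result;
}
```

Because the pool never contains a value twice, no duplicate check or retry loop is needed.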

C++17 code with explanation of why and what.
If you have any questions left don't hesitate to ask, I'm happy to help
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>
// container for random numbers.
// by putting the random numbers + generator inside a class
// we get better control over the lifecycle.
// e.g. what gets called when.
// Now we know the generation gets called at constructor time.
class integer_random_numbers
{
public:
// use std::size_t for things used in loops and must be >= 0
integer_random_numbers(std::size_t number, int minimum, int maximum)
{
// seed the random generator from std::random_device so that each run differs
// look at documentation for <random>, it is the C++ way for random numbers
std::mt19937 generator(std::random_device{}());
// make sure all numbers have an equal chance. range is inclusive
std::uniform_int_distribution<int> distribution(minimum, maximum);
// m_values is a std::vector, which is an array whose
// length can be changed at runtime.
for (std::size_t n = 0; n < number; ++n)
{
int new_random_value{};
// generate unique number
do
{
new_random_value = distribution(generator);
} while (std::find(m_values.begin(), m_values.end(), new_random_value) != m_values.end());
m_values.push_back(new_random_value);
}
}
// give the class an array index operator
// so we can use it as an array later
int& operator[](const std::size_t index)
{
// use bounds checking from std::vector
return m_values.at(index);
}
// return the number of numbers we generated
std::size_t size() const noexcept
{
return m_values.size();
}
private:
// use a vector, since we specify the size at runtime.
std::vector<int> m_values;
};
// Create a static instance of the class, this will
// run the constructor only once (at start of program)
static integer_random_numbers my_random_numbers{ 5, 1, 20 };
int main()
{
// And now we can use my_random_numbers as an array
for (std::size_t n = 0; n < my_random_numbers.size(); ++n)
{
std::cout << my_random_numbers[n] << std::endl;
}
}

Generate 5 random numbers from 1 to 16, allowing duplicates
Sort them
Add 1 to the 2nd number, 2 to the 3rd, 3 to 4th, and 4 to the 5th.
The last step transforms the range from [1,16] to [1,20] by remapping the possible sequences with duplicates into sequences with unique integers. [1,2,10,10,16], for example, becomes [1,3,12,13,20]. The transformation is completely bijective, so you never need to discard and resample.
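The three steps above can be sketched as follows. The function names spread and five_distinct are assumptions for illustration. One caveat that the answer does not state: draws with repeated values are less likely than draws without, so the resulting 5-element sets are not all equally probable.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

// Maps 5 values in [1, 16] (duplicates allowed) to 5 distinct values in [1, 20]
// by sorting them and adding 0, 1, 2, 3, 4 to the successive elements.
std::array<int, 5> spread(std::array<int, 5> values)
{
    std::sort(values.begin(), values.end());
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] += static_cast<int>(i);
    return values;
}

// Draws 5 numbers from [1, 16] and applies the remapping.
std::array<int, 5> five_distinct(std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(1, 16);  // range is inclusive
    std::array<int, 5> values{};
    for (int& v : values)
        v = dist(gen);
    return spread(values);
}
```

With the answer's example, spread applied to [1,2,10,10,16] yields [1,3,12,13,20]. The result is always sorted in strictly increasing order.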

Efficient way of finding if a container contains duplicated values with STL? [duplicate]

I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:
// Check for duplicate numbers in user inputted data
int i; // Need to declare i here so that it can be accessed by the 'inner' loop that starts on line 21
for(i = 0;i < 6; i++) { // Check each other number in the array
for(int j = i; j < 6; j++) { // Check the rest of the numbers
if(j != i) { // Makes sure don't check number against itself
if(userNumbers[i] == userNumbers[j]) {
b = true;
}
}
if(b == true) { // If there is a duplicate, change that particular number
cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
cin >> userNumbers[i];
}
} // Comparison loop
b = false; // Reset the boolean after each number entered has been checked
} // Main check loop
It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.

You could sort the array in O(n log(n)) and then simply compare each number with the next one. That is substantially faster than your existing O(n^2) algorithm, and the code is also a lot cleaner. Your code also doesn't check that a re-entered number is not itself a duplicate. You need to prevent duplicates from existing in the first place.
// assumes userNumbers is a std::vector<int>
std::sort(userNumbers.begin(), userNumbers.end());
for(std::size_t i = 0; i + 1 < userNumbers.size(); ) {
if (userNumbers[i] == userNumbers[i + 1]) {
userNumbers.erase(userNumbers.begin() + i); // drop the duplicate and recheck the same index
} else {
++i;
}
}
I also second the recommendation to use a std::set - no duplicates there.

The following solution is based on sorting the numbers and then removing the duplicates:
#include <algorithm>
int main()
{
int userNumbers[6];
// ...
int* end = userNumbers + 6;
std::sort(userNumbers, end);
bool containsDuplicates = (std::unique(userNumbers, end) != end);
}

Indeed, the fastest and, as far as I can see, most elegant method is as advised above:
std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);
It is O(n log n). However, this does not work if the ordering of the numbers in the input array needs to be kept. In that case I did:
std::set<int> tTmp;
std::vector<int>::iterator tNewEnd =
std::remove_if(tUserNumbers.begin(), tUserNumbers.end(),
[&tTmp] (int pNumber) -> bool {
return (!tTmp.insert(pNumber).second);
});
tUserNumbers.erase(tNewEnd, tUserNumbers.end());
which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.
Cheers,
Paul

This is an extension to the answer by @Puppy, which is currently the best answer.
PS: I tried to post this as a comment on @Puppy's answer but couldn't, as I don't have 50 points yet. Some experimental data is also shared here for further help.
Both std::set and std::map are typically implemented as balanced binary search trees, so both lead to O(n log n) complexity in this case. Better performance can be achieved with a hash table: std::unordered_map offers a hash-table-based implementation with average O(1) lookup. I experimented with all three implementations and found the results using std::unordered_map to be better than those for std::set and std::map. Results and code are shared below. The images were snapshots of the performance measured by LeetCode on the solutions.
bool hasDuplicate(vector<int>& nums) {
size_t count = nums.size();
if (!count)
return false;
std::unordered_map<int, int> tbl;
//std::set<int> tbl;
for (size_t i = 0; i < count; i++) {
if (tbl.find(nums[i]) != tbl.end())
return true;
tbl[nums[i]] = 1;
//tbl.insert(nums[i]);
}
return false;
}
[Image: unordered_map performance; run time was 52 ms here]
[Image: set/map performance]

You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.

I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits in a base-10 number in O(n). The problem I see with the already suggested O(n) solution is that it requires the digits to be sorted first. This method is O(n) and does not require sorting. The cool thing is that checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)
/*
============================
Foo
============================
*
Takes a non-negative int. A table is created to store counters
for each digit. If any digit's counter ends up higher than 1, the function
returns false; otherwise it returns true. For example, with 48778584:
0 1 2 3 4 5 6 7 8 9
[0] [0] [0] [0] [2] [1] [0] [2] [2] [0]
When we iterate over this array, we find that 4 is duplicated and immediately
return false.
*/
bool Foo(int number)
{
int temp = number;
int digitTable[10]={0};
while(temp > 0)
{
digitTable[temp % 10]++; // Last digit's respective index.
temp /= 10; // Move to next digit
}
for (int i=0; i < 10; i++)
{
if (digitTable [i] > 1)
{
return false;
}
}
return true;
}

It's OK, especially for small array lengths. I'd use a more efficient approach (fewer than n^2/2 comparisons) if the array is much bigger - see DeadMG's answer.
Some small corrections for your code:
Instead of int j = i, write int j = i + 1, and you can omit your if(j != i) test.
You shouldn't need to declare the variable i outside the for statement.

I think @Michael Jaison G's solution is really brilliant. I modified his code a little to avoid sorting. (By using unordered_set, the algorithm may be a little faster.)
#include <cstddef>
#include <iterator>
#include <unordered_set>
template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::unordered_set<T> values(begin, end);
std::size_t size = std::distance(begin,end);
return size != values.size();
}

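// std::unique and std::unique_copy only remove adjacent duplicates, so sort first.
std::sort(cont.begin(), cont.end());
// Option 1: test whether cont has duplicates (note that std::unique rearranges cont)
bool hasDuplicates = std::unique(cont.begin(), cont.end()) != cont.end();
// Option 2 (instead of option 1): copy the distinct values into a new container
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));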

#include<iostream>
#include<algorithm>
int main(){
int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
int len = sizeof(arr) / sizeof(*arr); // Finding length of array
std::sort(arr, arr+len);
int unique_elements = std::unique(arr, arr+len) - arr;
if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
else std::cout << "Duplicate number present in this array\n";
return 0;
}

As mentioned by @underscore_d, an elegant and efficient solution would be:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::vector<T> values(begin, end);
std::sort(values.begin(), values.end());
return (std::adjacent_find(values.begin(), values.end()) != values.end());
}
int main() {
int user_ids[6];
// ...
std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}

A fast solution with O(N) time and space.
It returns as soon as it hits the first duplicate:
template <typename T>
bool containsDuplicate(vector<T>& items) {
return any_of(items.begin(), items.end(), [s = unordered_set<T>{}](const auto& item) mutable {
return !s.insert(item).second;
});
}

Not enough karma to post a comment. Hence a post.
vector <int> numArray = { 1,2,1,4,5 };
unordered_map<int, bool> hasDuplicate;
bool flag = false;
for (auto i : numArray)
{
if (hasDuplicate[i])
{
flag = true;
break;
}
else
hasDuplicate[i] = true;
}
cout << (flag ? "Duplicate" : "No duplicate");

Find equals value into an array in c++

Is there a faster way to find equal values in an array than comparing every element one by one with all the other elements of the array?
for(int i = 0; i < arrayLenght; i ++)
{
for(int k = i + 1; k < arrayLenght; k ++)
{
if(array[i] == array[k])
{
sprintf(message,"There is a duplicate of %s",array[i]);
ShowMessage(message);
break;
}
}
}

If sorting your container is acceptable, std::unique is the simplest solution to your problem:
std::vector<int> v {0,1,0,1,2,0,1,2,3};
std::sort(begin(v), end(v));
v.erase(std::unique(begin(v), end(v)), end(v));
First, the vector is sorted. You can use any sorting method; std::sort is just the simplest. After that, std::unique moves the distinct elements to the front of the container and returns an iterator to the new logical end; the elements past that iterator are left in an unspecified state. That iterator is then passed to erase, which removes the leftover tail from the vector.

You could use std::multiset and then count duplicates afterwards like this:
#include <iostream>
#include <set>
int main()
{
const int arrayLenght = 14;
int array[arrayLenght] = { 0,2,1,3,1,4,5,5,5,2,2,3,5,5 };
std::multiset<int> ms(array, array + arrayLenght);
for (auto it = ms.begin(), end = ms.end(); it != end; it = ms.equal_range(*it).second)
{
int cnt = 0;
if ((cnt = ms.count(*it)) > 1)
std::cout << "There are " << cnt << " of " << *it << std::endl;
}
}
https://ideone.com/6ktW89
There are 2 of 1
There are 3 of 2
There are 2 of 3
There are 5 of 5

If the value_type of this array can be ordered with operator< (a strict weak ordering), doing as YSC answered is a good choice.
If not, you can try defining a hash function for the objects. Then you can do this in O(n) average time, like this:
struct ValueHash
{
size_t operator()(const Value& rhs) const{
// return a hash value computed from rhs
}
};
struct ValueCmp
{
bool operator()(const Value& lhs, const Value& rhs) const{
// return true if lhs and rhs are equal
}
};
unordered_set<Value,ValueHash,ValueCmp> myset;
for(int i = 0; i < arrayLenght; i ++)
{
if(myset.find(array[i])==myset.end())
myset.insert(array[i]);
else
dosomething();
}
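To make the placeholders above concrete, here is a sketch with a hypothetical two-field type. Point, PointHash, PointEq and has_duplicate are illustrative names standing in for Value, ValueHash, ValueCmp and the loop; the hash combination is a simple assumption, not a recommended hash function.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical value type with no operator<.
struct Point { int x; int y; };

struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple combination of the member hashes; collisions only cost speed, not correctness.
        return std::hash<int>{}(p.x) * 31u + std::hash<int>{}(p.y);
    }
};

struct PointEq {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

// Returns true as soon as an element equal to an earlier one is found.
bool has_duplicate(const std::vector<Point>& pts)
{
    std::unordered_set<Point, PointHash, PointEq> seen;
    for (const Point& p : pts)
        if (!seen.insert(p).second)  // .second is false when an equal element was already present
            return true;
    return false;
}
```

Using the return value of insert avoids the separate find call in the loop above, so each element is hashed only once.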

If you have a large amount of data, you can first sort the array (quicksort does this in O(n*log(n)) on average) and then make a second pass comparing each value with the next one (equal values end up adjacent) to find duplicates; this is a sequential pass in O(n). Sorting first and then scanning the sorted array for duplicates therefore costs O(n*log(n) + n), which is O(n*log(n)).
EDIT
An alternative has been suggested in the comments: using a set to keep track of already processed data. The algorithm goes element by element, checking whether the element has been seen before. This can lead to an O(n) algorithm, but only if you take care to use a hash set. With a sorted set such as std::set, each lookup costs O(log(n)) and you end up with the same O(n*log(n)). If you select std::unordered_set instead, lookups take constant time on average and you get O(n) overall. Of course, you have to account for possible automatic growth of the hash table, or for memory wasted by an oversized table.
Thanks to @freakish, who pointed out the set solution in the comments to the question.

How to find a unique number using std::find

Hey, here is a trick question asked in class today. I was wondering if there is a way to find a unique number in an array. The usual method is to use two for loops and pick the number that does not match any of the others. I am using std::vector for my array in C++ and was wondering if std::find could spot the unique number, as I wouldn't know where the unique number is in the array.

Assuming that we know that the vector has at least three
elements (because otherwise, the question doesn't make sense),
just look for an element different from the first. If it
happens to be the second, of course, we have to check the third
to see whether it was the first or the second which is unique,
which means a little extra code, but roughly:
std::vector<int>::const_iterator
findUniqueEntry( std::vector<int>::const_iterator begin,
std::vector<int>::const_iterator end )
{
std::vector<int>::const_iterator result
= std::find_if(
next( begin ), end, [begin]( int value ) { return value != *begin; } );
if ( result == next( begin ) && *result == *next( result ) ) {
-- result;
}
return result;
}
(Not tested, but you get the idea.)

As others have said, sorting is one option. Then your unique value(s) will have a different value on either side.
Here's another option that solves it using std::find in O(n^2) time (one pass over the vector, but each step searches the rest of the vector, minus one element); sorting is not required.
vector<int> findUniques(vector<int> values)
{
vector<int> uniqueValues;
vector<int>::iterator begin = values.begin();
vector<int>::iterator end = values.end();
vector<int>::iterator current;
for(current = begin ; current != end ; current++)
{
int val = *current;
bool foundBefore = false;
bool foundAfter = false;
if (std::find(begin, current, val) != current)
{
foundBefore = true;
}
else if (std::find(current + 1, end, val) != end)
{
foundAfter = true;
}
if(!foundBefore && !foundAfter)
uniqueValues.push_back(val);
}
return uniqueValues;
}
Basically, I run std::find on the elements before the current element and, if nothing is found there, also on the elements after it. Since the current element's value is stored in 'val' (i.e., it's in the vector once already), finding it before or after the current position means it is not a unique value.
This should find all values in the vector that are unique, regardless of how many unique values there are.
Here's some test code to run it and see:
void printUniques(vector<int> uniques)
{
vector<int>::iterator it;
for(it = uniques.begin() ; it < uniques.end() ; it++)
{
cout << "Unique value: " << *it << endl;
}
}
void WaitForKey()
{
system("pause");
}
int main()
{
vector<int> values;
for(int i = 0 ; i < 10 ; i++)
{
values.push_back(i);
}
/*for(int i = 2 ; i < 10 ; i++)
{
values.push_back(i);
}*/
printUniques(findUniques(values));
WaitForKey();
return -13;
}
As an added bonus:
Here's a version that uses a map, does not use std::find, and gets the job done in O(nlogn) time - n for the for loop, and log(n) for map::find(), which uses a red-black tree.
map<int,bool> mapValues(vector<int> values)
{
map<int, bool> uniques;
for(unsigned int i = 0 ; i < values.size() ; i++)
{
const bool firstOccurrence = (uniques.find(values[i]) == uniques.end()); // look up before operator[] inserts the key
uniques[values[i]] = firstOccurrence;
}
return uniques;
}
void printUniques(map<int, bool> uniques)
{
cout << endl;
map<int, bool>::iterator it;
for(it = uniques.begin() ; it != uniques.end() ; it++)
{
if(it->second)
cout << "Unique value: " << it->first << endl;
}
}
And an explanation. Iterate over all elements in the vector<int>. If the current member is not in the map, set its value to true. If it is in the map, set the value to false. Afterwards, all values that have the value true are unique, and all values with false have one or more duplicates.

If you have more than two values (one of which has to be unique), you can do it in O(n) time and space (on average, with a hash map such as std::unordered_map; a std::map gives O(n log n)) by iterating once through the array and filling a map whose key is the value and whose mapped value is the number of occurrences of that key.
Then you just have to iterate through the map to find a count of 1. The corresponding key is a unique number.
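A minimal sketch of this counting idea, assuming std::unordered_map as the hash map. The function name values_seen_once is illustrative. As a small variation, the second pass walks the input rather than the map so that the result keeps the input order.

```cpp
#include <unordered_map>
#include <vector>

// Returns the values that occur exactly once, in their original order.
std::vector<int> values_seen_once(const std::vector<int>& values)
{
    std::unordered_map<int, int> counts;
    for (int v : values)
        ++counts[v];                 // first pass: count occurrences

    std::vector<int> result;
    for (int v : values)             // second pass: keep values with a count of 1
        if (counts[v] == 1)
            result.push_back(v);
    return result;
}
```

For the input {10, 10, 20, 30, 40, 30} this returns {20, 40}.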

This example uses a map to count the occurrences of each number. A unique number will be seen only once:
#include <iostream>
#include <map>
#include <vector>
int main ()
{
std::map<int,int> mymap;
std::map<int,int>::iterator mit;
std::vector<int> v;
std::vector<int> myunique;
v.push_back(10); v.push_back(10);
v.push_back(20); v.push_back(30);
v.push_back(40); v.push_back(30);
std::vector<int>::iterator vit;
// count occurrences of all numbers
for(vit=v.begin();vit!=v.end();++vit)
{
int number = *vit;
mit = mymap.find(number);
if( mit == mymap.end() )
{
// there's no record in map for your number yet
mymap[number]=1; // we have seen it for the first time
} else {
mit->second++; // this one will not be unique
}
}
// find the unique ones
for(mit=mymap.begin();mit!=mymap.end();++mit)
{
if( mit->second == 1 ) // this was seen only one time
{
myunique.push_back(mit->first);
}
}
// print out unique numbers
for(vit=myunique.begin();vit!=myunique.end();++vit)
std::cout << *vit << std::endl;
return 0;
}
Unique numbers in this example are 20 and 40. There's no need for the list to be ordered for this algorithm.

Do you mean to find a number in a vector that appears only once? The nested loop is the easy solution. I don't think std::find or std::find_if is very useful here. Another option is to sort the vector so that you only need to find a number that differs from both of its neighbours. It seems like overkill, but it is actually O(n log n) instead of the O(n^2) of the nested loop:
void findUnique(const std::vector<int>& v, std::vector<int> &unique)
{
if(v.size() <= 1)
{
unique = v;
return;
}
unique.clear();
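std::vector<int> w = v;
std::sort(w.begin(), w.end());
// after sorting, a value is unique if it differs from both of its neighbours
for(size_t i = 0; i < w.size(); ++i)
{
const bool differsFromPrev = (i == 0) || (w[i-1] != w[i]);
const bool differsFromNext = (i + 1 == w.size()) || (w[i] != w[i+1]);
if(differsFromPrev && differsFromNext) unique.push_back(w[i]);
}
// unique contains the numbers that are not repeated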
}

Assuming you are given an array size>=3 which contains one instance of value A, and all other values are B, then you can do this with a single for loop.
int find_odd(int* array, int length) {
// In the first three elements, we are guaranteed to have 2 common ones.
int common=array[0];
if (array[1]!=common && array[2]!=common)
// The second and third elements are the common one, and the one we thought was not.
return common;
// Now search for the oddball.
for (int i=0; i<length; i++)
if (array[i]!=common) return array[i];
return common; // only reached if all elements are equal, i.e. the precondition does not hold
}
EDIT:
K what if more than 2 in an array of 5 are different? – super
Ah... that is a different problem. So you have an array of size n, which contains the common element c more than once, and all other elements exactly once. The goal is to find the set of non-common (i.e. unique) elements right?
Then you need to look at Sylvain's answer above. I think he was answering a different question, but it would work for this. At the end, you will have a hash map full of the counts of each value. Loop through the hash map, and every time you see a value of 1, you will know the key is a unique value in the input array.

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?
For example if I have a vector with 1 2 3 5, I need to get the vector that contains 0 4.
But I need to do it very efficiently.

Since the vector is already sorted, this becomes trivial:
vector<int> v = {1,2,3,5};
vector<int> ret;
v.push_back(n+1); // this is to enforce a limit using less branches in the loop
for(int i = 0, j = 0; i <= n; ++i){
int present = v[j++];
while(i < present){
ret.push_back(i++);
}
}
return ret;
Additionally, if it wasn't sorted, you could either sort it and apply the above algorithm, or, if you know the range of n, and you can afford the extra memory, you could instead create an array of boolean (or a bitset) and mark the index corresponding to every element you encounter (e.g. bitset[v[j++]] = true;), subsequently iterating from 0 to n and inserting into your vector every element whose bitset position has not been marked.
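The boolean-array variant for unsorted input could be sketched like this. The function name missing_values is illustrative, and the sketch assumes every element of v lies in [0, n].

```cpp
#include <cstddef>
#include <vector>

// Returns the integers in [0, n] that do not occur in v.
// Assumption: every element of v lies in [0, n]; v need not be sorted or duplicate-free.
std::vector<int> missing_values(const std::vector<int>& v, int n)
{
    std::vector<bool> present(static_cast<std::size_t>(n) + 1, false);
    for (int x : v)
        present[static_cast<std::size_t>(x)] = true;   // mark every value that occurs

    std::vector<int> missing;
    for (int i = 0; i <= n; ++i)
        if (!present[static_cast<std::size_t>(i)])
            missing.push_back(i);                      // collect the unmarked ones
    return missing;
}
```

For the question's example, missing_values({1, 2, 3, 5}, 5) gives {0, 4}. The cost is O(n + v.size()) time and n + 1 flags of extra memory.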

Basically the idea presented here is that we know the number of missing items beforehand if we can assume sorted input without duplicate values.
Then it is possible to pre-allocate enough space to hold the missing values beforehand (no later dynamic allocation required). Then we can also exploit the possible shortcut when all missing values were found.
If the input vector is not sorted or contains duplicate values, a wrapper function can be used that establishes this precondition.
#include <iostream>
#include <set>
#include <vector>
inline std::vector<int> find_missing(std::vector<int> const & input) {
// assuming non-empty, sorted input, no duplicates
// number of items missing
int n_missing = input.back() - input.size() + 1;
// pre-allocate enough memory for missing values
std::vector<int> result(n_missing);
// iterate input vector with shortcut if all missing values were found
auto input_it = input.begin();
auto result_it = result.begin();
for (int i = 0; result_it != result.end() && input_it != input.end(); ++i) {
if (i < *input_it) (*result_it++) = i;
else ++input_it;
}
return result;
}
// use this if the input vector is not sorted/unique
inline std::vector<int> find_missing_unordered(std::vector<int> const & input) {
std::set<int> values(input.begin(), input.end());
return find_missing(std::vector<int>(values.begin(), values.end()));
}
int main() {
std::vector<int> input = {1,2,3,5,5,5,7};
std::vector<int> result = find_missing_unordered(input);
for (int i : result)
std::cout << i << " ";
std::cout << "\n";
}
The output is:
$ g++ test.cc -std=c++11 && ./a.out
0 4 6
