Find which number appears most in a vector

I have some numbers stored in a std::vector<int>. I want to find which number appears most in the vector.
e.g. in the vector
1 3 4 3 4 2 1 3 2 3
the element that occurs the most is 3.
Is there any algorithm (STL or otherwise) that does this?

Sort it, then iterate through it and keep a counter that you increment when the current number is the same as the previous number and reset otherwise. Also keep track of the highest value the counter has reached so far and which number was current when that value was reached. This solution is O(n log n) (because of the sort).
Alternatively you can use a hash map from int to int (or, if you know the numbers are within a limited range, just an array) and iterate over the vector, increasing the_hashmap[current_number] by 1 for each number. Afterwards iterate through the hash map to find its largest value (and the key belonging to it). This runs in O(n) on average. It requires a hash map data structure (unless you can use an array, which will also be faster); the standard library has provided one, std::unordered_map, since C++11.
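The sort-and-scan idea described above can be sketched as follows. This is only an illustration under stated assumptions: the function name most_frequent_sorted is made up for this example, the vector is assumed to be non-empty, and the run counter restarts at 1 rather than 0 so that it holds the length of the current run.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (name not from the thread): returns the most frequent value.
// The vector is taken by value so the caller's data is not reordered.
// Assumption: v is not empty.
int most_frequent_sorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end());   // equal values become adjacent
    int best = v[0];
    std::size_t best_count = 1;
    std::size_t run = 1;             // length of the current run of equal values
    for (std::size_t i = 1; i < v.size(); ++i) {
        run = (v[i] == v[i - 1]) ? run + 1 : 1;
        if (run > best_count) {
            best_count = run;
            best = v[i];
        }
    }
    return best;
}
```

On a tie, this sketch returns the smallest of the tied values, because a later run has to be strictly longer to replace the current best.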

If you want to avoid sorting your vector v, use a map:
int max = 0;
int most_common = -1;
std::map<int, int> m;
for (std::vector<int>::const_iterator vi = v.begin(); vi != v.end(); ++vi) {
    int count = ++m[*vi];   // one map lookup per element
    if (count > max) {
        max = count;
        most_common = *vi;
    }
}
This requires more memory and has a very similar expected runtime (O(n log n), since each std::map lookup is O(log n)). The memory required should be on the order of a full vector copy, less if there are many duplicate entries.

Try this:
int FindMode(const vector<int>& value)
{
    int highestCount = 0;   // best occurrence count seen so far
    int mode = 0;           // value that reached highestCount
    for (unsigned int a = 0; a < value.size(); a++)
    {
        int count = 1;
        int candidate = value.at(a);
        // Count the later occurrences of candidate (O(n^2) comparisons overall).
        for (unsigned int b = a + 1; b < value.size(); b++)
        {
            if (value.at(b) == candidate)
            {
                count++;
            }
        }
        if (count >= highestCount)
        {
            highestCount = count;
            mode = candidate;
        }
    }
    return mode;
}

This is how I did it:
int max = 0;
int mostvalue = a[0];   // assumes a is not empty
for (std::size_t i = 0; i < a.size(); i++)
{
    int co = (int)std::count(a.begin(), a.end(), a[i]);
    if (co > max)
    {
        max = co;
        mostvalue = a[i];
    }
}
I just don't know how fast it is, i.e. what its O() complexity is. If someone could calculate it and post it here, that would be great.
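On the complexity question: std::count walks the whole vector once, and the loop calls it once per element, so the approach above does n passes of n comparisons, i.e. O(n^2). A single pass with a hash map avoids the repeated scans. The following is a minimal sketch, assuming a non-empty vector; the function name most_frequent_hashed is invented for this example.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper (name not from the thread).
// Assumption: a is not empty.
int most_frequent_hashed(const std::vector<int>& a)
{
    std::unordered_map<int, std::size_t> counts;
    int most_value = a[0];
    std::size_t max_count = 0;
    for (int x : a) {
        std::size_t c = ++counts[x];   // a missing key starts at 0
        if (c > max_count) {
            max_count = c;
            most_value = x;
        }
    }
    return most_value;
}
```

Each element costs one average O(1) hash lookup, so the whole pass is O(n) on average, at the price of extra memory for the counts.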

Here is a generic solution for finding the most common element in an iterator range. It runs in O(n log n) because it counts with std::map; counting with std::unordered_map instead would make it O(n) on average. You use it simply by doing:
int commonest = most_common(my_vector.begin(), my_vector.end());
The value type is extracted from the iterator using iterator_traits<>.
template<class InputIt, class T = typename std::iterator_traits<InputIt>::value_type>
T most_common(InputIt begin, InputIt end)
{
    std::map<T, int> counts;
    for (InputIt it = begin; it != end; ++it) {
        ++counts[*it];   // operator[] value-initializes a missing count to 0
    }
    // The range must not be empty: max_element would return counts.end().
    using Entry = typename std::map<T, int>::value_type;
    return std::max_element(counts.begin(), counts.end(),
        [] (const Entry& pair1, const Entry& pair2) {
            return pair1.second < pair2.second;})->first;
}

Related

Efficient way of finding if a container contains duplicated values with STL? [duplicate]

I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:
// Check for duplicate numbers in user inputted data
int i; // Need to declare i here so that it can be accessed by the 'inner' loop that starts on line 21
for(i = 0;i < 6; i++) { // Check each other number in the array
for(int j = i; j < 6; j++) { // Check the rest of the numbers
if(j != i) { // Makes sure don't check number against itself
if(userNumbers[i] == userNumbers[j]) {
b = true;
}
}
if(b == true) { // If there is a duplicate, change that particular number
cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
cin >> userNumbers[i];
}
} // Comparison loop
b = false; // Reset the boolean after each number entered has been checked
} // Main check loop
It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.

You could sort the array in O(n log n), then simply compare each number with the next one. That is substantially faster than your existing O(n^2) algorithm, and the code is also a lot cleaner. Your code also doesn't ensure that no duplicates are inserted when numbers are re-entered; you need to prevent duplicates from existing in the first place.
// Assumes userNumbers is a std::vector<int>.
std::sort(userNumbers.begin(), userNumbers.end());
for (std::size_t i = 0; i + 1 < userNumbers.size(); ) {
    if (userNumbers[i] == userNumbers[i + 1]) {
        userNumbers.erase(userNumbers.begin() + i);   // drop one copy, re-check the same index
    } else {
        ++i;
    }
}
I also second the recommendation to use a std::set - no duplicates there.

The following solution is based on sorting the numbers and then removing the duplicates:
#include <algorithm>
int main()
{
int userNumbers[6];
// ...
int* end = userNumbers + 6;
std::sort(userNumbers, end);
bool containsDuplicates = (std::unique(userNumbers, end) != end);
}

Indeed, the fastest and, as far as I can see, most elegant method is as advised above:
std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);
It is O(n log n). However, this does not work if the ordering of the numbers in the input array needs to be kept. In that case I did:
std::set<int> tTmp;
std::vector<int>::iterator tNewEnd =
std::remove_if(tUserNumbers.begin(), tUserNumbers.end(),
[&tTmp] (int pNumber) -> bool {
return (!tTmp.insert(pNumber).second);
});
tUserNumbers.erase(tNewEnd, tUserNumbers.end());
which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.
Cheers,
Paul

This is an extension of the answer by @Puppy, which is currently the best answer.
PS: I tried to post this as a comment on @Puppy's answer but couldn't, as I don't have 50 points yet. A bit of experimental data is also shared here for further help.
Both std::set and std::map are typically implemented as balanced binary search trees, so both lead to O(n log n) complexity in this case. Better performance can be achieved with a hash table: std::unordered_map offers a hash-table-based implementation with faster (average O(1)) lookup. I experimented with all three implementations and found the results with std::unordered_map to be better than those with std::set and std::map. Results and code are shared below. The images are snapshots of the performance measured by LeetCode on the solutions.
bool hasDuplicate(vector<int>& nums) {
size_t count = nums.size();
if (!count)
return false;
std::unordered_map<int, int> tbl;
//std::set<int> tbl;
for (size_t i = 0; i < count; i++) {
if (tbl.find(nums[i]) != tbl.end())
return true;
tbl[nums[i]] = 1;
//tbl.insert(nums[i]);
}
return false;
}
[Image: unordered_map performance on LeetCode; run time was 52 ms]
[Image: set/map performance on LeetCode]

You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.
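A minimal sketch of that idea, assuming the data is in a std::vector<int>; the function name has_duplicate_set is made up for this example. It relies on std::set::insert returning a pair whose second member is false when the value was already present.

```cpp
#include <set>
#include <vector>

// Hypothetical helper (name not from the thread).
bool has_duplicate_set(const std::vector<int>& nums)
{
    std::set<int> seen;
    for (int x : nums) {
        // insert() reports through .second whether the value was newly added
        if (!seen.insert(x).second) {
            return true;   // stop at the first repeated value
        }
    }
    return false;
}
```

With std::set this is O(n log n); swapping in std::unordered_set keeps the same code shape and gives O(n) on average.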

I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits in a base-10 number in O(n). The problem I see with the already suggested O(n) solution is that it requires the digits to be sorted first. This method is O(n) and does not require the set to be sorted. The cool thing is that checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)
/*
============================
Foo
============================
*
Takes in a read only unsigned int. A table is created to store counters
for each digit. If any digit's counter is flipped higher than 1, function
returns. For example, with 48778584:
0 1 2 3 4 5 6 7 8 9
[0] [0] [0] [0] [2] [1] [0] [2] [2] [0]
When we iterate over this array, we find that 4 is duplicated and immediately
return false.
*/
bool Foo(int number)
{
int temp = number;
int digitTable[10]={0};
while(temp > 0)
{
digitTable[temp % 10]++; // Last digit's respective index.
temp /= 10; // Move to next digit
}
for (int i=0; i < 10; i++)
{
if (digitTable [i] > 1)
{
return false;
}
}
return true;
}

It's OK, especially for small array lengths. I'd use more efficient approaches (fewer than n^2/2 comparisons) if the array is much bigger - see DeadMG's answer.
Some small corrections for your code:
Instead of int j = i write int j = i + 1, and you can omit your if(j != i) test.
You shouldn't need to declare the i variable outside the for statement.

I think @Michael Jaison G's solution is really brilliant. I modified his code a little to avoid sorting. (By using unordered_set, the algorithm may be a little faster.)
template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::unordered_set<T> values(begin, end);
std::size_t size = std::distance(begin,end);
return size != values.size();
}

//std::unique(_copy) requires a sorted container.
std::sort(cont.begin(), cont.end());
//testing if cont has duplicates
std::unique(cont.begin(), cont.end()) != cont.end();
//getting a new container with no duplicates
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));

#include<iostream>
#include<algorithm>
int main(){
int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
int len = sizeof(arr) / sizeof(*arr); // Finding length of array
std::sort(arr, arr+len);
int unique_elements = std::unique(arr, arr+len) - arr;
if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
else std::cout << "Duplicate number present in this array\n";
return 0;
}

As mentioned by @underscore_d, an elegant and efficient solution would be:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::vector<T> values(begin, end);
std::sort(values.begin(), values.end());
return (std::adjacent_find(values.begin(), values.end()) != values.end());
}
int main() {
int user_ids[6];
// ...
std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}

A fast solution, O(N) in time and space on average, that returns as soon as it hits the first duplicate:
template <typename T>
bool containsDuplicate(vector<T>& items) {
return any_of(items.begin(), items.end(), [s = unordered_set<T>{}](const auto& item) mutable {
return !s.insert(item).second;
});
}

Not enough karma to post a comment. Hence a post.
vector <int> numArray = { 1,2,1,4,5 };
unordered_map<int, bool> hasDuplicate;
bool flag = false;
for (auto i : numArray)
{
if (hasDuplicate[i])
{
flag = true;
break;
}
else
hasDuplicate[i] = true;
}
cout << (flag ? "Duplicate" : "No duplicate");

Maintain an unordered_map but at the same time need the lowest of its mapped values at every step

I have an unordered_map<int, int> which is updated at every step of a for loop. But at the end of the loop, I also need the lowest of the mapped values. Traversing it to find the minimum in O(n) is too slow. I know there exists MultiIndex container in boost but I can't use boost. What is the simplest way it can be done using only STL?
Question:
Given an array A of positive integers, call a (contiguous, not
necessarily distinct) subarray of A good if the number of different
integers in that subarray is exactly K.
(For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.)
Return the number of good subarrays of A.
My code:
class Solution {
public:
int subarraysWithKDistinct(vector<int>& A, int K) {
int left, right;
unordered_map<int, int> M;
for (left = right = 0; right < A.size() && M.size() < K; ++right)
M[A[right]] = right;
if (right == A.size())
return 0;
int smallest, count;
smallest = numeric_limits<int>::max();
for (auto p : M)
smallest = min(smallest, p.second);
count = smallest - left + 1;
for (; right < A.size(); ++right)
{
M[A[right]] = right;
while (M.size() > K)
{
if (M[A[left]] == left)
M.erase(A[left]);
++left;
}
smallest = numeric_limits<int>::max();
for (auto p : M)
smallest = min(smallest, p.second);
count += smallest - left + 1;
}
return count;
}
};
Link to the question: https://leetcode.com/problems/subarrays-with-k-different-integers/

O(n) is not slow, in fact it is the theoretically fastest possible way to find the minimum, as it's obviously not possible to find the minimum of n items without actually considering each of them.
You could update the minimum during the loop, which is trivial if the loop only adds new items to the map but becomes much harder if the loop may change existing items (and may increase the value of the until-then minimum item!), but ultimately, this also adds O(n) amount of work, or more, so complexity-wise, it's not different from doing an extra loop at the end (obviously, the constant can be different - the extra loop may be slower than reusing the original loop, but the complexity is the same).
As you said, there are data structures that make it more efficient (O(log n) or even O(1)) to retrieve the minimum item, but at the cost of increased complexity to maintain this data structure during insertion. These data structures only make sense if you frequently need to check the minimum item while inserting or changing items - not if you need to know the minimum only at the end of the loop, as you described.
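For the case where the minimum really is needed after every update, one possible shape for such a structure using only the standard library is a hash map paired with a std::multiset of the mapped values. This is a sketch, not a definitive implementation: the class name MinTrackingMap and its member names are invented for this example, and min_value() assumes the container is not empty.

```cpp
#include <cstddef>
#include <set>
#include <unordered_map>

// Hypothetical class (name not from the thread): an int -> int map that can
// report the smallest mapped value in O(1), with O(log n) updates.
class MinTrackingMap {
public:
    // Insert the key or overwrite its value.
    void assign(int key, int value) {
        auto it = map_.find(key);
        if (it != map_.end()) {
            values_.erase(values_.find(it->second));   // remove one copy of the old value
            it->second = value;
        } else {
            map_.emplace(key, value);
        }
        values_.insert(value);
    }
    // Remove the key if present.
    void erase(int key) {
        auto it = map_.find(key);
        if (it == map_.end()) return;
        values_.erase(values_.find(it->second));
        map_.erase(it);
    }
    std::size_t size() const { return map_.size(); }
    // Assumption: the map is not empty.
    int min_value() const { return *values_.begin(); }
private:
    std::unordered_map<int, int> map_;
    std::multiset<int> values_;   // multiset, so equal mapped values are allowed
};
```

The multiset keeps the mapped values ordered, so the minimum is always its first element; erasing through find() removes a single copy even when several keys share the same value.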

I made a simple class to make it work although it's far from perfect, it's good enough for the above linked question.
class BiMap
{
public:
void insert(int key, int value)
{
auto itr = M.find(key);
if (itr == M.cend())
M.emplace(key, S.insert(value).first);
else
{
S.erase(itr->second);
M[key] = S.insert(value).first;
}
}
void erase(int key)
{
auto itr = M.find(key);
S.erase(itr->second);
M.erase(itr);
}
int operator[] (int key)
{
return *M.find(key)->second;
}
int size()
{
return M.size();
}
int minimum()
{
return *S.cbegin();
}
private:
unordered_map<int, set<int>::const_iterator> M;
set<int> S;
};
class Solution {
public:
int subarraysWithKDistinct(vector<int>& A, int K) {
int left, right;
BiMap M;
for (left = right = 0; right < A.size() && M.size() < K; ++right)
M.insert(A[right], right);
if (right == A.size())
return 0;
int count = M.minimum() - left + 1;
for (; right < A.size(); ++right)
{
M.insert(A[right], right);
while (M.size() > K)
{
if (M[A[left]] == left)
M.erase(A[left]);
++left;
}
count += M.minimum() - left + 1;
}
return count;
}
};

C++: Get K smallest elements+indices from vector with ties

The task is to extract k smallest elements and their indices from double array, possibly including more elements that are tied to the k-th smallest one. E.g.:
input: {3.3,1.1,6.5,4.2,1.1,3.3}
output (k=3): {1,1.1} {4,1.1} {0,3.3} {5,3.3}
[This seems like a pretty common task, but I couldn't find a similar thread on SO - which handles ties. Hopefully, I didn't miss any and didn't duplicate the question.]
I came up with the following solution, which works and seems to be fairly efficient complexity-wise. E.g. for 1 million random doubles and k=10 it takes ~40ms with MSVC 2013. I wonder if there's a better/cleaner/more efficient (for large data and/or large k) way to perform this task (validation of the k value and similar things are out of scope here). Avoid allocating the queue with all elements? Make use of std::partial_sum or std::nth_element?
typedef std::pair<double, int> idx_pair;
typedef std::priority_queue<idx_pair, std::vector<idx_pair>, std::greater<idx_pair>> idx_queue;
std::vector<idx_pair> getKSmallest(std::vector<double> const& data, int k)
{
idx_queue q;
{
std::vector<idx_pair> idxPairs(data.size());
for (auto i = 0; i < data.size(); i++)
idxPairs[i] = idx_pair(data[i], i);
q = idx_queue(std::begin(idxPairs), std::end(idxPairs));
};
std::vector<idx_pair> result;
auto topPop = [&q, &result]()
{
result.push_back(q.top());
q.pop();
};
for (auto i = 0; i < k; i++)
topPop();
auto const largest = result.back().first;
while (q.empty() == false)
{
if (q.top().first == largest)
topPop();
else
break;
}
return result;
}
Working example is here.

Here's an alternative solution, suggested by @piotrekg2, using nth_element with average O(N) complexity:
bool equal(double value1, double value2)
{
    return value1 == value2 || std::abs(value2 - value1) <= std::numeric_limits<double>::epsilon();
}
std::vector<idx_pair> getNSmallest(std::vector<double> const& data, int n)
{
    std::vector<idx_pair> idxPairs(data.size());
    for (std::size_t i = 0; i < data.size(); i++)
        idxPairs[i] = idx_pair(data[i], static_cast<int>(i));
    // Put the n-th smallest pair at index n - 1 (assumes 1 <= n <= data.size());
    // everything before it is no larger, everything after it is no smaller.
    std::nth_element(std::begin(idxPairs), std::begin(idxPairs) + (n - 1), std::end(idxPairs));
    std::vector<idx_pair> result(std::begin(idxPairs), std::begin(idxPairs) + n);
    auto const largest = result.back().first;
    for (auto it = std::begin(idxPairs) + n; it != std::end(idxPairs); ++it)
        if (equal(it->first, largest))
            result.push_back(*it);
    return result;
}
Indeed, the code looks a bit cleaner. However, I've run some tests and empirically this solution is slightly slower than the original one with std::priority_queue.
Note: The answer below by Petar offers a similar solution using std::nth_element, which in my experiments, performs slightly better than this one and also better than the solution using std::priority_queue - perhaps because of eliminating the operation on pairs and working with primitive doubles instead.

As pointed out by the asker, I suggest first copying the vector of doubles and using nth_element to find the k-th element.
Then do a linear scan and collect the elements that are smaller than or equal to the k-th element. The time complexity should be linear.
However, you should be careful when comparing doubles.
vector<idx_pair> getKSmallest(vector<double> const& data, int k){
    vector<double> data_copy = data;
    // The k-th smallest value ends up at index k - 1 (assumes 1 <= k <= data.size()).
    nth_element(data_copy.begin(), data_copy.begin() + (k - 1), data_copy.end());
    vector<idx_pair> result;
    double kth_element = data_copy[k - 1];
    for (int i = 0; i < (int)data.size(); i++)
        if (data[i] <= kth_element)
            result.push_back({data[i], i});   // idx_pair is (value, index), as in the question's typedef
    return result;
}
Update: it is also possible to find the k-th element by maintaining a max heap of size at most k.
It only needs O(k) memory for the heap instead of the O(n) memory of the nth_element method.
It needs O(n log k) time, but if k is small I think it should be comparable to the O(n) method.
I am not sure about it, but my reasoning is that the heap may stay in cache and you don't need to spend time copying the data.
vector<idx_pair> getKSmallest(vector<double> const& data, int k)
{
    priority_queue<double> pq;   // max heap holding the k smallest values seen so far
    for (auto d : data){
        if (pq.size() >= (size_t)k && pq.top() > d){
            pq.push(d);
            pq.pop();
        }
        else if (pq.size() < (size_t)k)
            pq.push(d);
    }
    double kth_element = pq.top();
    vector<idx_pair> result;
    for (int i = 0; i < (int)data.size(); i++)
        if (data[i] <= kth_element)
            result.push_back({data[i], i});   // idx_pair is (value, index), as in the question's typedef
    return result;
}

I need a std function which checks how many elements occur exactly once in vector

Is there any STL function which does this?
For vector:
4 4 5 5 6 7
The expected output should be 2, because 6 and 7 each occur exactly once.
Could you help me count them the classic way if there is no STL function?

I don't think there is an algorithm for that in STL. You can copy into a multimap or use a map of frequencies as suggested, but it does extra work that's not necessary because your array happens to be sorted. Here is a simple algorithm that counts the number of singular elements i.e. elements that appear only once in a sorted sequence.
int previous = v.front();
int current_count = 0;
int total_singular = 0;
for(auto n : v) {
if(previous == n) // check if same as last iteration
current_count++; // count the elements equal to current value
else {
if(current_count == 1) // count only those that have one copy for total
total_singular++;
previous = n;
current_count = 1; // reset counter, because current changed
}
}
if(current_count == 1) // check the last number
total_singular++;
You could also use count_if with a stateful lambda, but I don't think it'll make the algorithm any simpler.

If performance and memory don't matter to you, use std::map (or the unordered version) for this task:
size_t count(const std::vector<int>& vec){
std::map<int,unsigned int> occurenceMap;
for (auto i : vec){
occurenceMap[i]++;
}
size_t count = 0U;
for (const auto& pair : occurenceMap){
if (pair.second == 1U){
count++;
}
}
return count;
}
With templates, it can be generalized to any container type and any element type.
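One way such a generalization might look is to take an iterator range instead of a specific container. This is a sketch under assumptions: the name count_singletons is invented for this example, and the element type must be usable as a std::map key (i.e. comparable with operator<).

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical helper (name not from the thread): counts the values that
// occur exactly once in [first, last). Works for unsorted input too.
template <class InputIt>
std::size_t count_singletons(InputIt first, InputIt last)
{
    using T = typename std::iterator_traits<InputIt>::value_type;
    std::map<T, std::size_t> occurrences;
    for (; first != last; ++first) {
        ++occurrences[*first];
    }
    std::size_t singles = 0;
    for (const auto& entry : occurrences) {
        if (entry.second == 1) {
            ++singles;
        }
    }
    return singles;
}
```

Because it only needs iterators, the same function accepts a std::vector, a std::list, or a plain array.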

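Use std::unique to count the unique entries (ct_u) and then use the vector's element count for the original one (ct_o). The difference ct_o - ct_u would give the answer. Note, however, that this difference counts the surplus copies, so it matches the number of single elements in the example only by coincidence: for 4 4 5 6 7 it gives 1 instead of 3.
P.S.: this will only work if identical entries are adjacent in the original vector. If not, you may want to sort the vector first.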

Using algorithm:
std::size_t count_unique(const std::vector<int>& v)
{
std::size_t count = 0;
for (auto it = v.begin(); it != v.end(); )
{
auto it2 = std::find_if(it + 1, v.end(), [&](int e) { return e != *it; });
count += (it2 - it == 1);
it = it2;
}
return count;
}

C++ Sort based on other int array

Suppose I have two vectors:
std::vector<int>vec_int = {4,3,2,1,5};
std::vector<Obj*>vec_obj = {obj1,obj2,obj3,obj4,obj5};
How do we sort vec_obj according to the sorted order of vec_int?
So the goal may look like this:
std::vector<int>vec_int = {1,2,3,4,5};
std::vector<Obj*>vec_obj = {obj4,obj3,obj2,obj1,obj5};
I've been trying to create a new vector:
for (int i = 0; i < vec_int.size(); i++) {
new_vec.push_back(vec_obj[vec_int[i]]);
}
But I think it's not the correct solution. How do we do this? Thanks.
The std library may offer the best solution, but I can't find the correct way to apply std::sort.

You don't have to call std::sort; what you need can be done in linear time (provided the indices are from 1 to N and not repeating):
std::vector<Obj*> new_vec(vec_obj.size());
for (size_t i = 0; i < vec_int.size(); ++i) {
    // vec_int[i] is the 1-based position that vec_obj[i] takes in the sorted order
    new_vec[vec_int[i] - 1] = vec_obj[i];
}
But of course for this solution you need the additional new_vec vector.
If the indices are arbitrary and/or you don't want to allocate another vector, you have to use a different data structure:
typedef pair<int, Obj*> Item;
vector<Item> vec = {{4, obj1}, {3, obj2}, {2, obj3}, {1, obj4}, {5, obj5}};
std::sort(vec.begin(), vec.end(), [](const Item& l, const Item& r) -> bool {return l.first < r.first;});
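If changing the data structure is not an option, another common approach is to sort a vector of positions by their keys and then gather the values in that order. The sketch below assumes both vectors have the same size; the function name sorted_by_keys is made up for this example, and std::stable_sort is used so that equal keys keep their original relative order.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (name not from the thread): returns a copy of values
// arranged in the order that sorts keys ascending.
// Assumption: keys.size() == values.size().
template <class T>
std::vector<T> sorted_by_keys(const std::vector<int>& keys, const std::vector<T>& values)
{
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});   // 0, 1, 2, ...
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t l, std::size_t r) { return keys[l] < keys[r]; });
    std::vector<T> result;
    result.reserve(values.size());
    for (std::size_t i : order) {
        result.push_back(values[i]);
    }
    return result;
}
```

This works for arbitrary keys, including repeated ones, and leaves both input vectors untouched; the cost is the O(n log n) sort plus one extra vector of positions.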

Maybe there is a better solution, but personally I would use the fact that items in a std::map are automatically sorted by key. This gives the following possibility (untested!)
// The vectors have to be the same size for this to work!
if( vec_int.size() != vec_obj.size() ) { return 0; }
std::vector<int>::const_iterator intIt = vec_int.cbegin();
std::vector<Obj*>::const_iterator objIt = vec_obj.cbegin();
// Create a temporary map
std::map< int, Obj* > sorted_objects;
for(; intIt != vec_int.cend(); ++intIt, ++objIt )
{
sorted_objects[ *intIt ] = *objIt;
}
// Iterating through map will be in order of key
// so this adds the items to the vector in the desired order.
std::vector<Obj*> vec_obj_sorted;
for( std::map< int, Obj* >::const_iterator sortedIt = sorted_objects.cbegin();
sortedIt != sorted_objects.cend(); ++sortedIt )
{
vec_obj_sorted.push_back( sortedIt->second );
}

[Not sure this fits your use case, but putting the elements into a map will store them sorted by key by default.]
Coming to your precise problem: if creating the new vector is the issue, you can avoid it with a simple swap trick (like selection sort):
// Swap each element into its final place; assumes vec_int is a permutation of 1..N.
for (size_t i = 0; i < vec_int.size(); i++) {
    while (vec_int[i] != (int)(i + 1)) {
        size_t target = vec_int[i] - 1;   // final position of the element now at i
        std::swap(vec_obj[i], vec_obj[target]);
        std::swap(vec_int[i], vec_int[target]);
    }
}

The generic form of this is known as "reorder according to", which is a variation of cycle sort. Unlike your example, the index vector needs to have the values 0 through size-1, instead of {4,3,2,1,5} it would need to be {3,2,1,0,4} (or else you have to adjust the example code below). The reordering is done by rotating groups of elements according to the "cycles" in the index vector or array. (In my adjusted example there are 3 "cycles", 1st cycle: index[0] = 3, index[3] = 0. 2nd cycle: index[1] = 2, index[2] = 1. 3rd cycle index[4] = 4). The index vector or array is also sorted in the process. A copy of the original index vector or array can be saved if you want to keep the original index vector or array. Example code for reordering vA according to vI in template form:
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vI)
{
size_t i, j, k;
T t;
for(i = 0; i < vA.size(); i++){
if(i != vI[i]){
t = vA[i];
k = i;
while(i != (j = vI[k])){
// every move places a value in its final location
vA[k] = vA[j];
vI[k] = k;
k = j;
}
vA[k] = t;
vI[k] = k;
}
}
}
Simpler still would be to copy vA to another vector vB (of the same size) according to vI:
for(i = 0; i < vA.size(); i++){
    vB[i] = vA[vI[i]];
}
