Get the number of duplicated elements in a set - c++

So a set doesn't allow duplicates, but is there a way, or another data structure, that can allow me to get the number of repeated elements even though they have been removed? Let me explain myself better anyway.
Let's say I'm given this input:
[1, 2, 2, 3, 2, 5, 3]
If I put it in a set, it will end up like this:
[1, 2, 3, 5]
Which is what I want, but how can I know that there were three 2s before they were removed? Isn't this related to those data structures with "buckets" or something?
Basically I'd like the output to be something like this:
[1, 2, 3, 5]
| | | |
[1, 3, 2, 1]
With the bottom array being the number of duplicates of each element on the top array.

You can use a std::map to count the frequency of the items.
For example:
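#include <iostream>
#include <map>
int main() {
int arr[] = {1, 2, 2, 3, 2, 5, 3};
std::map<int, int> count;
// operator[] inserts a zero count the first time a value is seen
for (int x : arr) {
count[x]++;
}
for (auto& [element, frequency] : count) {
std::cout << element << " : " << frequency << std::endl;
}
}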
The output would be (std::map keeps its keys sorted):
1 : 1
2 : 3
3 : 2
5 : 1

You gave the answer yourself: it suffices to keep the counts alongside the unique elements. Hence a compact data structure is the list of unique elements paired with the list of counts in the same order.
Now how this is obtained depends on how you plan to remove the duplicates and the kind of access desired. One way is to sort the initial list, purge the duplicates at the same time that you count them and fill the list of counts. Another way is to use a map with the list elements as keys and associate them with a count. Whether you keep the map or fill new lists is your choice.

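A std::set never stores duplicates, so once the elements are in a set there is nothing left to count. A std::multiset, however, keeps every inserted copy in sorted order, so the number of duplicates is its size() minus the number of distinct elements. Because equal elements are adjacent, you can count them directly by walking the elements and counting each one that equals its predecessor:
#include <iostream>
#include <iterator>
#include <set>
int main()
{
// std::multiset keeps every copy; a std::set would silently drop the repeated 2 and 3s
std::multiset<int> mySet;
mySet.insert(1);
mySet.insert(2);
mySet.insert(2);
mySet.insert(3);
mySet.insert(3);
mySet.insert(3);
int numDuplicates = 0;
// Elements are sorted, so count each element that equals the one before it
for (auto it = mySet.begin(); it != mySet.end(); ++it) {
if (it != mySet.begin() && *it == *std::prev(it)) {
numDuplicates++;
}
}
std::cout << numDuplicates << std::endl; // prints 3 (one extra 2, two extra 3s)
return 0;
}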

Related

c++ Algorithm to Compare various length vectors and isolate "unique", sort of

I have a complex problem and have been trying to identify what needs to be a very, very efficient algorithm. I'm hoping I can get some ideas from you helpful folks. Here is the situation.
I have a vector of vectors. These nested vectors are of various length, all storing integers in a random order, such as (pseudocode):
vector_list = {
{ 1, 4, 2, 3 },
{ 5, 9, 2, 1, 3, 3 },
{ 2, 4, 2 },
...,
100 more,
{ 8, 2, 2, 4 }
}
and so on, up to over 100 different vectors at a time inside vector_list. Note that the same integer can appear in each vector more than once. I need to remove from this vector_list any vectors that are duplicates of another vector. A vector is a duplicate of another vector if:
It has the same integers as the other vector (regardless of order). So if we have
vec1 = { 1, 2, 3 }
vec2 = { 2, 3, 1 }
These are duplicates and I need to remove one of them; it doesn't matter which one.
It contains all of the integers of the other vector. So if we have
vec1 = { 3, 2, 2 }
vec2 = { 4, 2, 3, 2, 5 }
vec2 has all of the ints of vec1 and is bigger, so I need to delete vec1 in favor of vec2.
The problem is as I mentioned the list of vectors can be very big, over 100, and the algorithm may need to run as many as 1000 times on a button click, with a different group of 100+ vectors over 1000 times. Hence the need for efficiency. I have considered the following:
Sorting the vectors may make life easier, but as I said, this has to be efficient, and I'd rather not sort if I don't have to.
It's more complicated by the fact that the vectors aren't in any order with respect to their size. For example, if the vectors in the list were ordered by size:
vector_list = {
{ },
{ },
{ },
{ },
{ },
...
{ },
{ }
}
It might make life easier, but that seems like it would take a lot of effort and I'm not sure about the gain.
The best effort I've had so far to try and solve this problem is:
// list of vectors, just 4 for illustration, but in reality more like 100, with lengths from 5 to 15 integers long
std::vector<std::vector<int>> vector_list;
vector_list.push_back({9});
vector_list.push_back({3, 4, 2, 8, 1});
vector_list.push_back({4, 2});
vector_list.push_back({1, 3, 2, 4});
std::vector<int>::iterator it;
int i;
int j;
int k;
// to test if a smaller vector is a duplicate of a larger vector, i copy the smaller vector, then
// loop through ints in the larger vector, seeing if i can find them in the copy of the smaller. if i can,
// i remove the item from the smaller copy, and if the size of the smaller copy reaches 0, then the smaller vector
// was a duplicate of the larger vector and can be removed.
std::vector<int> copy;
// flag for breaking a for loop below
bool erased_i;
// loop through vector list
for ( i = 0; i < vector_list.size(); i++ )
{
// loop again, so we can compare every vector to every other vector
for ( j = 0; j < vector_list.size(); j++ )
{
// don't want to compare a vector to itself
if ( i != j )
{
// if the vector in i loop is at least as big as the vector in j loop
if ( vector_list[i].size() >= vector_list[j].size() )
{
// copy the smaller j vector
copy = vector_list[j];
// loop through each item in the larger i vector
for ( k = 0; k < vector_list[i].size(); k++ ) {
// if the item in the larger i vector is in the smaller vector,
// remove it from the smaller vector
it = std::find(copy.begin(), copy.end(), vector_list[i][k]);
if (it != copy.end())
{
// erase
copy.erase(it);
// if the smaller vector has reached size 0, then it must have been a smaller duplicate that
// we can delete
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + j);
j--;
}
}
}
}
else
{
// otherwise vector j must be bigger than vector i, so we do the same thing
// in reverse, trying to erase vector i
copy = vector_list[i];
erased_i = false;
for ( k = 0; k < vector_list[j].size(); k++ ) {
it = std::find(copy.begin(), copy.end(), vector_list[j][k]);
if (it != copy.end()) {
copy.erase(it);
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + i);
// put an extra flag so we break out of the j loop as well as the k loop
erased_i = true;
break;
}
}
}
if ( erased_i ) {
// break the j loop because we have to start over with whatever
// vector is now in position i
break;
}
}
}
}
}
std::cout << "ENDING VECTORS\n";
// TERMINAL OUTPUT:
// vector_list[0]
// [9]
// vector_list[1]
// [3, 4, 2, 8, 1]
So this function gives me the right results, as these are the 2 unique vectors. It also gives me the correct results if I push the initial 4 vectors in reverse order, so that the smallest one comes last, for example. But it feels so inefficient to compare every vector to every other vector. Plus, I have to create these "copies" and try to reduce them to size 0 with every comparison I make. Very inefficient.
Anyway, any ideas on how I could make this faster would be much appreciated. Maybe some kind of organization by vector length, I dunno... It seems wasteful to compare them all to each other.
Thanks!
Loop through the vectors and, for each vector, count how many times each distinct value occurs in it. An unordered_map<int, int> suffices for this; let's call it M.
Also maintain a set<unordered_map<int, int>>, say S, ordered by the size of unordered_map<int, int> in decreasing order.
Now we have to compare the contents of M with the contents of the unordered_maps in S. Let M' be the unordered_map in S currently being compared with M. M is a subset of M' only when the count of every element in M is less than or equal to the count of the same element in M'. If that's the case, then it's a duplicate and we don't insert it; in every other case, we insert it. Also notice that if the size of M is greater than the size of M', M can't be a subset of M'; since S is sorted by decreasing size, once we reach such an M' we can stop comparing and insert M. This pre-condition speeds things up. Keep the indices of the vectors that weren't inserted in S: these are the duplicates, and they have to be deleted from vector_list at the end.
Time Complexity: O(N*M) + O(N^2*D) + O(N*log(N)) = O(N^2*D), where N is the number of vectors in vector_list, M is the average size of the vectors in vector_list, and D is the average size of the unordered_maps in S. This is the worst case, when there aren't any duplicates. In the average case, when there are duplicates, the second term comes down.
Edit: The above procedure has a problem. To fix it, we need to make unordered_maps of all the vectors, store them in a vector V, and sort V in decreasing order of unordered_map size. Then we start from the biggest map in V and apply the above procedure to it. This is necessary because a subset M1 of a set M2 can be inserted into S before M2 if the vector for M1 comes before the vector for M2 in vector_list. So now we don't really need S; we can compare the maps within V itself. The complexity doesn't change.
Edit 2: The same problem occurs again if two unordered_maps in V have the same size when sorting V. To fix that, we need to keep the contents of the maps in some order too. So just replace unordered_map with map and, in the comparator function, if two maps have the same size, compare them element by element: at the first position where the keys differ, or where the keys are the same but the counts M[key] differ, put the map with the bigger element before the other in V.
Edit 3: New Time Complexity: O(N*M*log(D)) + O(N*D*log(N)) + O(N^2*D*log(D)) = O(N^2*D*log(D)). Also you might want to pair the maps with the index of the respective vectors in vector_list so as to know which vector you must delete from vector_list when you find a duplicate in V.
IMPORTANT: In the sorted V, we must start checking from the end, just to be safe (in case we choose to delete a duplicate from vector_list as well as from V whenever we encounter it). So compare the last map in V with all the maps before it to check whether it is a duplicate.
Example:
vector_list = {
{1, 2, 3},
{2, 3, 1},
{3, 2, 2},
{4, 2, 3, 2, 5},
{1, 2, 3, 4, 6, 2},
{2, 3, 4, 5, 6},
{1, 5}
}
Creating maps of respective vectors:
V = {
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{2->2, 3->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{1->1, 5->1}
}
After sorting:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{1->1, 5->1},
{2->2, 3->1}
}
After deleting duplicates:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 5->1}
}
Edit 4: I tried coding it up. Running it 1000 times on a list of 100 vectors, with the size of each vector in the range [1-250], the elements of each vector in the range [0-50], and assuming the input is available for all 1000 runs, it takes around 2 minutes on my machine. It goes without saying that there is room for improvement in my code (and my machine).
My approach is to copy the vectors that pass the test to an empty vector.
May be inefficient.
May have bugs.
HTH :)
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main(int, char **) {
using namespace std;
using vector_of_integers = vector<int>;
using vector_of_vectors = vector<vector_of_integers>;
vector_of_vectors in = {
{ 1, 4, 2, 3 }, // unique
{ 5, 9, 2, 1, 3, 3 }, // unique
{ 3, 2, 1 }, // exists
{ 2, 4, 2 }, // exists
{ 8, 2, 2, 4 }, // unique
{ 1, 1, 1 }, // exists
{ 1, 2, 2 }, // exists
{ 5, 8, 2 }, // unique
};
vector_of_vectors out;
// doesnt_contain_vector returns true when no entry in out is a superset of the passed vector (repeated integers are not counted separately)
auto doesnt_contain_vector = [&out](const vector_of_integers &in_vector) {
// is_subset returns true when the out vector contains all the integers of in_vector
auto is_subset = [&in_vector](const vector_of_integers &out_vector) {
// contained returns true when the vector contains the passed integer
auto contained = [&out_vector](int i) {
return find(out_vector.cbegin(), out_vector.cend(), i) != out_vector.cend();
};
return all_of(in_vector.cbegin(), in_vector.cend(), contained);
};
return find_if(out.cbegin(), out.cend(), is_subset) == out.cend();
};
copy_if(in.cbegin(), in.cend(), back_insert_iterator<vector_of_vectors>(out), doesnt_contain_vector);
// show results
for (auto &vi: out) {
copy(vi.cbegin(), vi.cend(), std::ostream_iterator<int>(std::cout, ", "));
cout << "\n";
}
}
You could try something like this. I use std::sort and std::includes. Perhaps this is not the most efficient solution.
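#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector<std::vector<int>> vlist = { {9}, {3, 4, 2, 8, 1}, {4, 2}, {1, 3, 2, 4} };
// sort all nested vectors (std::includes requires sorted ranges)
std::for_each(vlist.begin(), vlist.end(), [](auto& v)
{
std::sort(v.begin(), v.end());
});
// sort vector of vectors by length of items
std::sort(vlist.begin(), vlist.end(), [](const std::vector<int>& a, const std::vector<int>& b)
{
return a.size() < b.size();
});
// exclude all duplicates: erase *i if any later (at least as long) vector includes it
auto i = std::begin(vlist);
while (i != std::end(vlist)) {
if (std::any_of(i+1, std::end(vlist), [&](const std::vector<int>& a){
return std::includes(std::begin(a), std::end(a), std::begin(*i), std::end(*i));
}))
i = vlist.erase(i);
else
++i;
}
// print the remaining vectors
for (const auto& v : vlist) {
for (int n : v) std::cout << n << ' ';
std::cout << '\n';
}
}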

Update element stl list of lists

I am working with a list of a list of vector of ints (std::list<std::list<std::vector<int>>> z(nlevel)).
I might have something like:
{ {1} {2} {3} }
{ {1 2} {2 1} {1 3} }
{ {1 2 3} {2 1 3} {1 2 4} }
I need to remove the non-unique combination of integers, so e.g., the second element of the list above should become
{ { 1 2 } {1 3} }
This is a large object, so I'm trying to update each element of the outermost list by reference. I've tried something like:
lit = z.begin();
for (i = 0; i < nlevel; i++) {
distinct_z(*lit, primes);
lit++;
}
where distinct_z is a function to find the unique vector combinations by reference, but this doesn't seem to affect the list z. Note: distinct_z does work fine in another part of my code where I am already working with the ith element of the list. I've provided distinct_z below. It includes some unique data types from the Rcpp package in R, but is hopefully understandable. Essentially, I use the log sum of prime numbers to identify non-unique combinations of integers because the order of the integers does not matter. To reiterate, distinct_z does work in another part of my code where I pass it an actual list of vectors of ints. The problem seems to be that I'm trying to pass something using an iterator.
void distinct_lz(std::list<std::vector<int>> &lz,
const IntegerVector &primes) {
int i, j, npids = lz.size();
NumericVector pids(npids);
std::list<std::vector<int>>::iterator lit = lz.begin();
int z_size = lit -> size();
for(i = 0; i < npids; i++) {
for (j = 0; j < z_size; j++) {
// cprime = primes[lit -> at(j)];
// pids[i] += log(cprime);
// cprime = primes[lit -> at(j)];
pids[i] += log(primes[lit -> at(j)]);
}
lit++;
}
LogicalVector dup = duplicated(round(pids, 8));
lit = lz.begin();
for(i = 0; i < npids; i++) {
if(dup(i) == 1) {
lz.erase(lit);
}
lit++;
}
}
What is the best approach for doing what I want?
Background: The data structure probably seems unnecessarily complicated, but I'm enumerating all connected subgraphs starting at a vertex using a breadth-first approach. So given a current subgraph, I see what other vertices are connected to create a set of new subgraphs and repeat. I initially did this using a list of vectors of ints, but removing repeats was ridiculously slow due to the fact that I had to copy the current object if I removed part of the vector. This approach is much faster even though the structure is more complicated.
Edit: Here is a solution that mostly does what I want, though it results in some undesired copying. I updated distinct_z to return a copy of the object instead of modifying the reference, and then replaced the element at lit.
lit = z.begin();
for (i = 0; i < nlevel; i++) {
(*lit) = distinct_z(*lit, primes);
lit++;
}
In C++ there is a well known idiom known as the erase-remove idiom for removing elements from an STL container. It basically involves shuffling unwanted items to the end of the container and then erasing the unwanted tail.
We can use a predicate function (e.g. a lambda) to select the items we want to erase, together with functions from <algorithm>. In your case, we use a set of sets of ints (std::set<std::set<int>>) to store the unique combinations: convert each vector in the list to a set of ints and delete it if it has been seen before.
#include <set>
#include <list>
#include <vector>
#include <algorithm>
#include <iostream>
void distinct_lz(std::list<std::vector<int>>& lz)
{
std::set<std::set<int>> uniqueNums;
lz.erase(std::remove_if(lz.begin(), lz.end(),
[&uniqueNums](std::vector<int>& v) {
std::set<int> s{ v.cbegin(), v.cend() };
if (uniqueNums.find(s) != uniqueNums.end())
return true;
else
{
uniqueNums.insert(s);
return false;
}
}), lz.end());
}
int main()
{
std::list<std::vector<int>> lv = { {1, 2}, {2, 1}, {1, 3}, {3,4} };
distinct_lz(lv);
for (auto &v: lv)
{
for( auto n: v)
{
std::cout << n << " ";
}
std::cout << "\n";
}
}
Output:
1 2
1 3
3 4

How to find the vector element that indexes the lowest value in an array

To simplify my issue, let's say I have an array that stores some values:
int Costs[5] = {40, 50, 10, 10, 30};
and I have a vector which I use to store IDs
std::vector<int> id = { 4,0,1 };
so that, for example, Costs [ id [ 0 ] ] will return the value 30 and so on
I need the INDEX number whose value would point to the lowest value in the Costs array.
In my example the index that I need would be 0 since Costs[id[0]] is lower than Costs[id[1]] or Costs[id[2]]
So if I were to make a function, I would NOT want it to return what id[0] holds; I would want it to return the 0, which is the index / element number.
I would be grateful if anyone could help me code this.
This is taken straight from http://en.cppreference.com/w/cpp/algorithm/min_element
If I understand correctly, you want to get the index of the minimal element in an array.
This should work with both vectors and arrays:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main() {
std::vector<int> v{3, 1, 4, 1, 5, 9};
// We need to get the min value
std::vector<int>::iterator result = std::min_element(std::begin(v), std::end(v));
// Then we get the index of the value in the array
std::cout << "min element at: " << std::distance(std::begin(v), result);
}
What you can do is first get the values of the Costs array at the positions given by the id values and store them in a temporary std::vector.
Following your code, you can iterate over the values contained in id and, inside the loop, use std::vector's push_back() function to add Costs[id[i]].
Then apply the std::min_element and std::distance approach mentioned above to get the index. Note that this returns an index into the id vector; getting the id itself is just a matter of id[index].
Here's the full working test code. Thank you, Jointts, for explaining how to do it.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main()
{
int Costs[5] = { 40, 50, 10, 0, 90 };
std::vector<int> id = { 4,1,0 };
// Collect the cost referenced by each id, in id order
std::vector<int> temp;
for (std::size_t i = 0; i < id.size(); i++)
{
temp.push_back(Costs[id[i]]);
}
std::vector<int>::iterator lowest = std::min_element(std::begin(temp), std::end(temp));
std::cout << "min element at: " << std::distance(std::begin(temp), lowest) << std::endl;
}

How to count unique integers in unordered_set?

A question that might appear trivial, but I am wondering if there's a way of obtaining the count of integers made unique after I transform an array containing repeated integers into an unordered_set. To be clear, I start with some array, turn it into an unordered set, and suddenly the unordered_set only contains unique integers; I am simply after the number of times each integer in the unordered_set was repeated.
Is this possible at all? (something like unordered_set.count(index) ?)
A question that might appear trivial, but I am wondering if there's a way of obtaining the count of integers made unique after I transform an array containing repeated integers into an unordered_set.
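If the container is contiguous, like an array, then I believe you can count the distinct values with some iterator arithmetic, yielding a std::ptrdiff_t. I'm not sure about non-contiguous containers, though.
Since you start with an array:
Sort the array, because std::unique only removes consecutive duplicates
Call std::unique on the array; it returns an iterator to the new logical end of the range, not end()
Compute the count as a std::ptrdiff_t by subtracting the array's begin iterator from the iterator returned by std::unique
No adjustment for the size of an element is needed: subtracting two iterators (or pointers) already gives a number of elements, not bytes.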
But to paraphrase Beta, some containers lend themselves to this, and others do not. If you have an unordered set (or a map or a tree), then the information will not be readily available.
Based on your reply to user2357112's question, I will write a solution.
So, let's assume that instead of unordered_set we will use a vector and our vector has values like this:
{1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
So, we want to get (in a different vector, I think) how many times each particular value appears in the vector, right? In this specific case the result would be: 1 appears 5 times, 3 appears once, 4 appears 3 times and 5 appears 2 times.
To get this done, one possible solution can be like this:
Get unique entries from source vector and store them in different vector, so this vector will contain: 1, 3, 4, 5
Iterate through whole unique vector and count these elements from source vector.
Print result
The code from point 1, can be like this:
template <typename Type>
vector<Type> unique_entries (vector<Type> vec) {
for (auto iter = vec.begin (); iter != vec.end (); ++iter) {
auto f = find_if (iter+1, vec.end (), [&] (const Type& val) {
return *iter == val;
});
if (f != vec.end ()) {
vec.erase (remove (iter+1, vec.end (), *iter), vec.end ());
}
}
return vec;
}
The code from point 2, can be like this:
template <typename Type>
struct Properties {
Type key;
long int count;
};
template <typename Type>
vector<Properties<Type>> get_properties (const vector<Type>& vec) {
vector<Properties<Type>> ret {};
auto unique_vec = unique_entries (vec);
for (const auto& uv : unique_vec) {
auto c = count (vec.begin (), vec.end (), uv); // (X)
ret.push_back ({uv, c});
}
return ret;
}
Of course we do not need Properties class to store key and count value, you can return just a vector of int (with count of elements), but as I said, it is one of the possible solutions. So, by using unique_entries we get a vector with unique entries ( :) ), then we can iterate through the whole vector vec (get_properties, using std::count marked as (X)), and push_back Properties object to the vector ret.
The code from point 3, can be like this:
template <typename Type>
void show (const vector<Properties<Type>>& vec) {
for (const auto& v : vec) {
cout << v.key << " " << v.count << endl;
}
}
// usage below
vector<int> vec {1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
auto properties = get_properties (vec);
show (properties);
And the result looks like this:
1 5
3 1
4 3
5 2
It is worth noting that this example has been written using templates to provide flexibility in choosing the type of the elements in the vector. If you want to store values of type long, long long, short, etc. instead of int, all you have to do is change the definition of the source vector, for example:
vector<unsigned long long> vec2 {1, 3, 2, 3, 4, 4, 4, 4, 3, 3, 2, 3, 1, 7, 2, 2, 2, 1, 6, 5};
show (get_properties (vec2));
will produce:
1 3
3 5
2 5
4 4
7 1
6 1
5 1
which is the desired result.
One more note, you can do this with vector of string as well.
vector<string> vec_str {"Thomas", "Rick", "Martin", "Martin", "Carol", "Thomas", "Martin", "Josh", "Jacob", "Jacob", "Rick"};
show (get_properties (vec_str));
And result is:
Thomas 2
Rick 2
Martin 3
Carol 1
Josh 1
Jacob 2
I assume you're trying to get a list of unique values AND the number of their occurrences. If that's the case, then std::map provides the cleanest and simplest solution:
#include <iostream>
#include <map>
#include <vector>
int main()
{
//Always prefer std::vector (or at least std::array) over raw arrays if you can
std::vector<int> myInts {2,2,7,8,3,7,2,3,46,7,2,1};
std::map<int, unsigned> uniqueValues;
//Get unique values and their count
for (int val : myInts)
++uniqueValues[val];
//Output:
for (const auto & val : uniqueValues)
std::cout << val.first << " occurs " << val.second << " times." << std::endl;
}
In this case it doesn't have to be std::unordered_set.

Algorithm for finding the number which appears the most in a row - C++

I need help with an algorithm for the following problem: there is a row of numbers, each appearing a different number of times, and I need to find the number that appears the most and how many times it appears in the row, e.g.:
1-1-5-1-3-7-2-1-8-9-1-2
That would be 1 and it appears 5 times.
The algorithm should be fast (that's my problem).
Any ideas ?
What you're looking for is called the mode. You can sort the array, then look for the longest run of equal values.
You could keep a hash table and store a count of every element in that structure, like this:
h[1] = 5
h[5] = 1
...
You can't get it any faster than in linear time, as you need to at least look at each number once.
If you know that the numbers are in a certain range, you can use an additional array to sum up the occurrences of each number, otherwise you'd need a hashtable, which is slightly slower.
Both of these need additional space though and you need to loop through the counts again in the end to get the result.
Unless you really have a huge amount of numbers and absolutely require O(n) runtime, you could simply sort your array of numbers. Then you can walk once through the numbers, keeping the count of the current number and the number with the maximum number of occurrences in two variables. This saves a lot of space, trading it off for a little bit of time.
There is an algorithm that solves your problem in linear time (linear in the number of items in the input). The idea is to use a hash table to associate to each value in the input a count indicating the number of times that value has been seen. You will have to profile against your expected input and see if this meets your needs.
Please note that this uses O(n) extra space. If this is not acceptable, you might want to consider sorting the input as others have proposed. That solution will be O(n log n) in time and O(1) in space.
Here's an implementation in C++ using std::unordered_map (originally std::tr1::unordered_map):
#include <iostream>
#include <unordered_map>
typedef std::unordered_map<int, int> map;
int main() {
map m;
int a[12] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
for(int i = 0; i < 12; i++) {
int key = a[i];
map::iterator it = m.find(key);
if(it == m.end()) {
m.insert(map::value_type(key, 1));
}
else {
it->second++;
}
}
int count = 0;
int value = 0;
for(map::iterator it = m.begin(); it != m.end(); it++) {
if(it->second > count) {
count = it->second;
value = it->first;
}
}
std::cout << "Value: " << value << std::endl;
std::cout << "Count: " << count << std::endl;
}
The algorithm works using the input integers as keys in a hashtable to a count of the number of times each integer appears. Thus the key (pun intended) to the algorithm is building this hash table:
int key = a[i];
map::iterator it = m.find(key);
if(it == m.end()) {
m.insert(map::value_type(key, 1));
}
else {
it->second++;
}
So here we are looking at the ith element in our input list. Then what we do is we look to see if we've already seen it. If we haven't, we add a new value to our hash table containing this new integer, and an initial count of one indicating this is our first time seeing it. Otherwise, we increment the counter associated to this value.
Once we have built this table, it's simply a matter of running through the values to find one that appears the most:
int count = 0;
int value;
for(map::iterator it = m.begin(); it != m.end(); it++) {
if(it->second > count) {
count = it->second;
value = it->first;
}
}
Currently there is no logic to handle the case of two distinct values appearing the same number of times and that number of times being the largest amongst all the values. You can handle that yourself depending on your needs.
Here is a simple one that is O(n log n):
Sort the vector # O(n log n)
Create vars: int MOST = 0, CURRENT = 0, VAL, PREV (PREV starts unset)
for ELEMENT in LIST:
if ELEMENT == PREV:
CURRENT += 1
else:
CURRENT = 1
PREV = ELEMENT
if CURRENT >= MOST:
MOST = CURRENT
VAL = ELEMENT
return (VAL, MOST)
There are a few methods:
The universal method is "sort it and find the longest run of equal values", which is O(n log n). Quicksort is usually the fastest sort in practice (O(n log n) on average, but O(n^2) in the worst case). You can also use heapsort, which is somewhat slower on average but is O(n log n) in the worst case as well.
If you have some information about numbers then you can use some tricks. If numbers are from the limited range then you can use part of algorithm for counting sort. It is O( n ).
If this isn't your case, there are some other sort algorithms which can do it in linear time but no one is universal.
The best time complexity you can get here is O(n). You have to look through all elements, because the last element may be the one which determines the mode.
The solution depends on whether time or space is more important.
If space is more important, then you can sort the list then find the longest sequence of consecutive elements.
If time is more important, you can iterate through the list, keeping a count of the number of occurrences of each element (e.g. hashing element -> count). While doing this, keep track of the element with the max count, switching if necessary.
If you also happen to know that the mode is the majority element (i.e. there are more than n/2 elements in the array with this value), then you can get O(n) time and O(1) space.
Generic C++ solution:
#include <algorithm>
#include <iterator>
#include <map>
#include <utility>
template<class T, class U>
struct less_second
{
bool operator()(const std::pair<T, U>& x, const std::pair<T, U>& y)
{
return x.second < y.second;
}
};
template<class Iterator>
std::pair<typename std::iterator_traits<Iterator>::value_type, int>
most_frequent(Iterator begin, Iterator end)
{
typedef typename std::iterator_traits<Iterator>::value_type vt;
std::map<vt, int> frequency;
for (; begin != end; ++begin) ++frequency[*begin];
return *std::max_element(frequency.begin(), frequency.end(),
less_second<vt, int>());
}
#include <iostream>
int main()
{
int array[] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
std::pair<int, int> result = most_frequent(array, array + 12);
std::cout << result.first << " appears " << result.second << " times.\n";
}
Haskell solution:
import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)
count = foldl step Map.empty where
step frequency x = Map.alter next x frequency
next Nothing = Just 1
next (Just n) = Just (n+1)
most_frequent = maximumBy (compare `on` snd) . Map.toList . count
example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]
Shorter Haskell solution, with help from Stack Overflow:
import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)
most_frequent = maximumBy (compare `on` snd) . Map.toList .
Map.fromListWith (+) . flip zip (repeat 1)
example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]
The solution below gives you the count of each number. It is a better approach than using map in terms of time and space. If you need to get the number that appeared most number of times, then this is not better than previous ones.
EDIT: This approach is useful only for unsigned numbers starting from 1 and, because the string is parsed one character at a time, only for single-digit values (1 to 9).
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string row = "1,1,5,1,3,7,2,1,8,9,1,2";
// one counter per possible value 1..9
std::vector<int> arr(9, 0);
for (char c : row)
{
if (c != ',')
{
int val = c - '0';
arr[val - 1]++;
}
}
for (std::size_t i = 0; i < arr.size(); i++)
std::cout << i + 1 << "-->" << arr[i] << std::endl;
}
Since this is homework I think it's OK to supply a solution in a different language.
In Smalltalk something like the following would be a good starting point:
SequenceableCollection>>mode
| aBag maxCount mode |
aBag := Bag new
addAll: self;
yourself.
aBag valuesAndCountsDo: [ :val :count |
(maxCount isNil or: [ count > maxCount ])
ifTrue: [ mode := val.
maxCount := count ]].
^mode
As time goes by, the language evolves.
We now have many more language constructs that make life simpler:
namespace aliases
CTAD (Class Template Argument Deduction)
more modern containers like std::unordered_map
range based for loops
the std::ranges library
projections
using alias declarations
structured bindings
more modern algorithms
We could now come up with the following code:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
namespace rng = std::ranges;
int main() {
// Demo data
std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
// Count values
using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;
Counter counter{}; for (const auto& d : data) counter[d]++;
// Get max
const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);
// Show output
std::cout << '\n' << value << " found " << count << " times\n";
}