Finding Frequency of numbers in a given group of numbers

Suppose we have a vector/array in C++ and we wish to find which of these N elements occurs most often, and output that highest count. Which algorithm is best suited for this job?
Example:
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
The output is 4 because 2 occurs 4 times, which is more often than any other element.

Sort the array and then do a quick pass to count each number. The algorithm has O(N*logN) complexity.
Alternatively, create a hash table, using the number as the key. Store in the hash table a counter for each element you've keyed. You'll be able to count all elements in one pass; however, the complexity of the algorithm now depends on the complexity of your hashing function.
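The sort-and-scan approach described above could be sketched roughly as follows. This is only an illustration: the function name max_frequency_sorted is made up here, and the vector is taken by value on the assumption that the caller's data should stay unsorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the highest number of times any single value occurs.
// Takes the vector by value so the caller's data is left untouched.
std::size_t max_frequency_sorted(std::vector<int> values)
{
    std::sort(values.begin(), values.end());   // O(N log N)
    std::size_t best = 0;
    std::size_t run = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Extend the current run of equal values, or start a new one.
        run = (i > 0 && values[i] == values[i - 1]) ? run + 1 : 1;
        best = std::max(best, run);
    }
    return best;
}
```

Called with the array from the question, max_frequency_sorted({2, 456, 34, 3456, 2, 435, 2, 456, 2}) returns 4. The scan after the sort is a single linear pass, so the sort dominates the running time.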

Optimized for space:
Quicksort (for example) then iterate over the items, keeping track of the largest count only.
At best O(N log N).
Optimized for speed:
Iterate over all elements, keeping track of the separate counts.
With a hash table for the counts this is O(N) on average; with an ordered map such as std::map it is O(N log N).

If you have the RAM and your values are not too large, use counting sort.

A possible C++ implementation that makes use of STL could be:
#include <iostream>
#include <algorithm>
#include <map>
// functor that remembers the most frequent value seen so far
struct maxoccur
{
// (names starting with an underscore and a capital letter are reserved, so avoid _M_val)
int val;
int rep;
maxoccur()
: val(0),
rep(0)
{}
// std::map<int,int>::value_type is std::pair<const int,int>
void operator()(const std::pair<const int,int> &e)
{
std::cout << "pair: " << e.first << " " << e.second << std::endl;
if ( rep < e.second ) {
val = e.first;
rep = e.second;
}
}
};
int
main(int argc, char *argv[])
{
int a[] = {2,456,34,3456,2,435,2,456,2};
std::map<int,int> m;
// load the map
for(unsigned int i=0; i< sizeof(a)/sizeof(a[0]); i++)
m [a[i]]++;
// find the max occurrence...
maxoccur ret = std::for_each(m.begin(), m.end(), maxoccur());
std::cout << "value:" << ret.val << " max repetition:" << ret.rep << std::endl;
return 0;
}

A bit of pseudo-code:
// split the string into an array first
array = strsplit(numbers) // PHP-style function that splits a string into its components
i = 0
while (i < count(array))
{
if (isset(list[array[i]]))
{
list[array[i]]['count'] = list[array[i]]['count'] + 1
}
else
{
list[array[i]]['count'] = 1
list[array[i]]['number'] = array[i]
}
i = i + 1
}
usort(list) // usort is a PHP function that sorts an array by its values, not its keys (here: by 'count', descending); I'm assuming that you have something in C++ that does this
print list[0]['number'] // should contain the most used number

The hash algorithm (build count[i] = #occurrences(i) in basically linear time) is very practical, but is theoretically not strictly O(n) because there could be hash collisions during the process.
An interesting special case of this question is the majority algorithm, where you want to find an element which is present in more than n/2 of the array entries, if any such element exists.
Here is a quick explanation, and a more detailed explanation of how to do this in linear time, without any sort of hash trickiness.
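For illustration, here is a minimal sketch of one well-known linear-time approach to the majority problem, the Boyer-Moore majority vote. The explanations linked above are not reproduced here, so treat this as an assumption about what they describe rather than a copy of them. The function name is made up, and std::optional requires C++17.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the value that occupies more than half of the entries, if there is one.
std::optional<int> majority_element(const std::vector<int>& values)
{
    // Pass 1: cancel out pairs of differing elements; only a true majority can survive.
    int candidate = 0;
    std::size_t balance = 0;
    for (int v : values) {
        if (balance == 0) { candidate = v; balance = 1; }
        else if (v == candidate) ++balance;
        else --balance;
    }
    // Pass 2: the survivor is only a candidate, so count it to verify.
    std::size_t occurrences = 0;
    for (int v : values)
        if (v == candidate) ++occurrences;
    if (occurrences * 2 > values.size()) return candidate;
    return std::nullopt;
}
```

Note that this only answers the majority question. For the array in the original question, 2 occurs 4 times out of 9, which is not a majority, so the function returns an empty optional there.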

If the range of elements is large compared with the number of elements, I would, as others have said, just sort and scan. This is time n*log n and no additional space (maybe log n additional).
The problem with the counting sort is that, if the range of values is large, it can take more time to initialize the count array than to sort.

Here's my complete, tested version, using a std::unordered_map (std::tr1::unordered_map before C++11).
I make this approximately O(n). First it iterates through the n input values to insert/update the counts in the unordered_map, then it does a partial_sort_copy, which is O(n) when only one output element is requested. 2*O(n) ~= O(n).
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <utility>
#include <cassert>
namespace {
// Only used in most_frequent but can't be a local class because of the member template
struct second_greater {
// Need to compare two (slightly) different types of pairs
template <typename PairA, typename PairB>
bool operator() (const PairA& a, const PairB& b) const
{ return a.second > b.second; }
};
}
template <typename Iter>
std::pair<typename std::iterator_traits<Iter>::value_type, unsigned int>
most_frequent(Iter begin, Iter end)
{
typedef typename std::iterator_traits<Iter>::value_type value_type;
typedef std::pair<value_type, unsigned int> result_type;
std::unordered_map<value_type, unsigned int> counts;
for(; begin != end; ++begin)
// This is safe because new entries in the map are defined to be initialized to 0 for
// built-in numeric types - no need to initialize them first
++ counts[*begin];
// Only need the top one at this point (could easily expand to top-n)
std::vector<result_type> top(1);
std::partial_sort_copy(counts.begin(), counts.end(),
top.begin(), top.end(), second_greater());
return top.front();
}
int main(int argc, char* argv[])
{
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
std::pair<int, unsigned int> m = most_frequent(a, a + (sizeof(a) / sizeof(a[0])));
std::cout << "most common = " << m.first << " (" << m.second << " instances)" << std::endl;
assert(m.first == 2);
assert(m.second == 4);
return 0;
}

This will be O(n), but the catch is that it needs a second array, as large as the range of the values, to hold the counts.
for(i=0;i
mar=count[o];
index=o;
for(i=0;i
The output is then index: the element that occurs the maximum number of times in this array.
Here a[] is the data array in which we search for the most frequent number,
and count[] holds the count of each element.
Note: we already know the range of the data in the array.
For example, if the data in the array ranges from 1 to 100, use a count array of 100 elements to keep track; each time a value occurs, increment the entry it indexes by one.
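Since the loops above were cut off, here is a rough sketch of the idea as described in the text, not a reconstruction of the lost code. It assumes every value lies in 1..100; the names kMaxValue and most_frequent_in_range are made up for the example.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: every value in the input lies in the range 1..100.
constexpr int kMaxValue = 100;

// Returns {value, count} for the most frequent value, or {0, 0} for empty input.
std::pair<int, std::size_t> most_frequent_in_range(const std::vector<int>& a)
{
    std::array<std::size_t, kMaxValue + 1> count{};  // zero-initialised, index 0 unused
    for (int v : a)
        ++count[v];              // no bounds check: relies on the range assumption above
    int index = 0;
    std::size_t max = 0;
    for (int v = 1; v <= kMaxValue; ++v) {
        if (count[v] > max) { max = count[v]; index = v; }
    }
    return {index, max};
}
```

The first loop is O(n) in the number of elements and the second is O(range), which is why this only pays off when the range is known and small.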

Now, in the year 2022 we have
namespace aliases
more modern containers like std::unordered_map
CTAD (Class Template Argument Deduction)
range based for loops
using statement
the std::ranges library
more modern algorithms
projections
structured bindings
With that we can now write:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
namespace rng = std::ranges;
int main() {
// Demo data
std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
// Count values
using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;
Counter counter{}; for (const auto& d : data) counter[d]++;
// Get max
const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);
// Show output
std::cout << '\n' << value << " found " << count << " times\n";
}

Related

Finding Duplicates in an array using a Set in C++

I am currently practicing for coding interviews and am working on a function that takes in an array and the size of that array and prints out which numbers in it are duplicates. I have gotten this to work using the two for loop method but want an optimized solution using sets. Snippet of the code I have is below,
#include <iostream>
#include <set>
using namespace std;
void FindDuplicate(int integers[], int n){
set<int>setInt;
for(int i = 0; i < n; i++){
//if this num is not in the set then it is not a duplicate
if(setInt.find(integers[i]) != setInt.end()){
setInt.insert({integers[i]});
}
else
cout << integers[i] << " is a duplicate";
}
}
int main() {
int integers [] = {1,2,2,3,3};
int n = sizeof(integers)/sizeof(integers[0]);
FindDuplicate(integers, n);
}
Any helpful advice is appreciated, thanks

I think your comparison is not needed; insert does it for you:
https://en.cppreference.com/w/cpp/container/set/insert
Returns a pair consisting of an iterator to the inserted element (or
to the element that prevented the insertion) and a bool value set to
true if the insertion took place.
Just insert the element and check what the insert function returns (the second member of the pair is false in case of duplication) :)
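A minimal sketch of that suggestion follows. The function name find_duplicates and the second set, used so that each duplicate is reported only once, are additions for this example and not part of the original code.

```cpp
#include <set>
#include <vector>

// Returns each duplicated value once, in the order its first repeat is seen.
std::vector<int> find_duplicates(const std::vector<int>& input)
{
    std::set<int> seen;
    std::set<int> reported;
    std::vector<int> duplicates;
    for (int value : input) {
        // insert() returns {iterator, bool}; the bool is false if the value was already present.
        if (!seen.insert(value).second && reported.insert(value).second)
            duplicates.push_back(value);
    }
    return duplicates;
}
```

For the input {1, 2, 2, 3, 3} this returns {2, 3}. If you want every repeat reported, drop the reported set and push on every failed insert.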

My proposed solution is:
count the frequency of each element (algorithms for counting frequencies are explained here: frequency)
display the elements with a frequency greater than 1 (those are the duplicates)
Neither step needs nested loops.
#include <iostream>
#include <unordered_map>
using namespace std;
void FindDuplicate(int integers[], int n)
{
unordered_map<int, int> mp;
// Traverse through array elements and
// count frequencies
for (int i = 0; i < n; i++)
{
mp[integers[i]]++;
}
cout << "The repeating elements are : " << endl;
for (int i = 0; i < n; i++) {
if (mp[integers[i]] > 1)
{
cout << integers[i] << endl;
mp[integers[i]] = -1;
}
}
}
int main()
{
int integers [] = {1,1,0,0,2,2,3,3,3,6,7,7,8};
int n = sizeof(integers)/sizeof(integers[0]);
FindDuplicate(integers, n);
}

This is my feedback:
#include <iostream>
#include <vector>
#include <set>
// don't do this; in big projects it's avoided (name clash issues)
// using namespace std;
// pass vector by const reference you're not supposed to change the input
// the reference will prevent data from being copied.
// naming is important, do you want to find one duplicate or more...
// renamed to FindDuplicates because you want them all
void FindDuplicates(const std::vector<int>& input)
{
std::set<int> my_set;
// don't use index based for loops if you don't have to
// range based for loops are more safe
// const auto is more refactorable than const int
for (const auto value : input)
{
//if (!my_set.contains(value)) C++ 20 syntax
if (my_set.find(value) == my_set.end())
{
my_set.insert(value);
}
else
{
std::cout << "Array has a duplicate value : " << value << "\n";
}
}
}
int main()
{
// int integers[] = { 1,2,2,3,3 }; avoid "C" style arrays they're a **** to pass around safely
// int n = sizeof(integers) / sizeof(integers[0]); std::vector (or std::array) have size() methods
std::vector input{ 1,2,2,3,3 };
FindDuplicates(input);
}

You do not need to use a set.
To find the duplicates:
Sort array with numbers
Iterate over the array (start with second element) and copy elements where previous element equals
current element into a new vector "duplicates"
(Optional) use unique on the "duplicates" if you like to know which number is a duplicate and do not care if it is 2, 3 or 4 times in the numbers array
Example Implementation:
#include <algorithm>
#include <iostream>
#include <vector>
void
printVector (std::vector<int> const &numbers)
{
for (auto const &number : numbers)
{
std::cout << number << ' ';
}
std::cout << std::endl;
}
int
main ()
{
auto numbers = std::vector<int>{ 1, 2, 2, 42, 42, 42, 3, 3, 42, 42, 1, 2, 3, 4, 5, 6, 7, 7 };
std::sort (numbers.begin (), numbers.end ());
auto duplicates = std::vector<int>{};
std::for_each (numbers.begin () + 1, numbers.end (), [prevElement = numbers.begin (), &duplicates] (int currentElement) mutable {
if (currentElement == *prevElement)
{
duplicates.push_back (currentElement);
}
prevElement++;
});
duplicates.erase (std::unique (duplicates.begin (), duplicates.end ()), duplicates.end ());
printVector (duplicates);
}
edit:
If you have no problem with using more memory and more calculations but like it more expressive:
Sort numbers
Create a new array with unique numbers "uniqueNumbers"
Use "set_difference" to calculate (numbers-uniqueNumbers) which leads to an new array with all the duplicates
(Optional) use unique on the "duplicates" if you like to know which number is a duplicate and do not care if it is 2, 3 or 4 times in the numbers array
Example Implementation:
#include <algorithm>
#include <iostream>
#include <iterator> // std::back_inserter
#include <vector>
void
printVector (std::vector<int> const &numbers)
{
for (auto const &number : numbers)
{
std::cout << number << ' ';
}
std::cout << std::endl;
}
int
main ()
{
auto numbers = std::vector<int>{ 2, 2, 42, 42, 42, 3, 3, 42, 42, 1, 2, 3, 4, 5, 6, 7, 7 };
std::sort (numbers.begin (), numbers.end ());
auto uniqueNumbers = std::vector<int>{};
std::unique_copy (numbers.begin (), numbers.end (), std::back_inserter (uniqueNumbers));
auto duplicates = std::vector<int>{};
std::set_difference (numbers.begin (), numbers.end (), uniqueNumbers.begin (), uniqueNumbers.end (), std::back_inserter (duplicates));
std::cout << "duplicate elements: ";
printVector (duplicates);
std::cout << "unique duplicate elements: ";
printVector ({ duplicates.begin (), std::unique (duplicates.begin (), duplicates.end ()) });
}

Here's a quick solution: use an array of size N (pick a number larger than any value you expect)
and whenever a number is added to the other array, add 1 at that number's position in the large array, like:
array_of_repeated[user_input]++;
so if the program asks how many times (for example) the number 234 was repeated:
std::cout<<array_of_repeated[requested_number]<<std::endl;
This way you won't spend time searching for a number inside the other list.

Default value for the second element of the map STL?

what is the default value for second element in map STL if i am initializing it with an array?
for example:
#include <bits/stdc++.h>
using namespace std;
void countFreq(int arr[], int n)
{
unordered_map<int, int> mp;
// Traverse through array elements and
// count frequencies
for (int i = 0; i < n; i++)
mp[arr[i]]++;
// Traverse through map and print frequencies
for (auto x : mp)
cout << x.first << " " << x.second << endl;
}
int main()
{
int arr[] = { 10, 20, 20, 10, 10, 20, 5, 20 };
int n = sizeof(arr) / sizeof(arr[0]);
countFreq(arr, n);
return 0;
}
How can this program return the frequency of the element in the array by accessing the second element of map mp?

what is the default value for the second element in map STL if I am initializing it with an array?
When accessing a key-value pair (kvp) in a std::map with operator[], either the key already exists, or a new kvp is constructed and the mapped_type is value-initialised. A value-initialized int is always 0. This imposes a requirement that it must be default constructible. Note that you can also access entries in a map using the at member function, which throws if the key is not found.
How can this program return the frequency of the element in the array by accessing the second element of map mp?
You have done this correctly in your code snippet. You could have used a std::multiset or std::unordered_multiset, they provide a count member function, that is the frequency of the key.
#include <set>
#include <iostream>
int main()
{
int arr[] = { 10, 20, 20, 10, 10, 20, 5, 20 };
std::multiset<int> freq (std::begin(arr), std::end(arr));
for(auto elem = freq.begin();
elem != freq.end();
elem=freq.upper_bound(*elem)) // Traverse the unique elements
{
std::cout << *elem << " count: " << freq.count(*elem) << "\n";
}
}
Note that your question mentions std::map but the example you provided references std::unordered_map, much of this applies to both data-structures.

The second element of the map is, by default, initialized to 0 (if its type is int, as it is in the code) the first time its key is accessed with operator[]. So, when you access some element x for the first time, mp[x] becomes 0 and is then increased by 1 in your counting code.
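The behaviour described above can be checked with a small sketch like the following. The function names are made up for the example; the same calls exist on std::unordered_map.

```cpp
#include <map>
#include <stdexcept>
#include <vector>

// Reading a missing key through operator[] inserts it with a value-initialised int.
int value_of_fresh_entry()
{
    std::map<int, int> counts;
    return counts[42];   // key 42 did not exist; the new mapped value is 0
}

// at() never inserts: it throws std::out_of_range for a missing key.
bool at_throws_for_missing_key()
{
    const std::map<int, int> counts{};
    try { (void)counts.at(42); }
    catch (const std::out_of_range&) { return true; }
    return false;
}

// The counting idiom from the question: ++ on a fresh entry goes from 0 to 1.
std::map<int, int> count_frequencies(const std::vector<int>& arr)
{
    std::map<int, int> mp;
    for (int x : arr)
        ++mp[x];
    return mp;
}
```

With the array from the question, count_frequencies({10, 20, 20, 10, 10, 20, 5, 20}) maps 10 to 3, 20 to 4 and 5 to 1.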

Keep track of highest 5 numbers during file input

So lets say i have a struct
struct largestOwners
{
string name;
double amountOwned;
};
And i am reading it in from a file using ifstream with 300 names and amounts.
How can i go about keeping track of the highest 5 numbers during input? So i dont have to sort after, but rather track it during ifstream input.
My goal is to keep track of the 5 highest amounts during input so i can easily print it out later. And save time/processing rather than to do it in the future
I get i can store this in an array or another struct, but whats a good algorithm for tracking this during ifstream input to the struct?
Lets say the text file looks like this, when im reading it in.
4025025 Tony
66636 John
25 Tom
23693296 Brady
363 Bradley
6200 Tim
Thanks!

To keep track of the highest 5 numbers in a stream of incoming numbers, you could use a min-heap of size 5 (C++ STL set can be used as a min-heap).
First fill the min-heap with the first 5 numbers. After that, for each incoming element, compare that with the smallest of the largest 5 numbers that you have (root of the min-heap). If the current number is smaller than that, do nothing, otherwise remove the 5th largest (pop from min-heap) and insert the current number to the min-heap.
Deleting and inserting in the min-heap will take O(log k) time, where k is the size of the heap (here k = 5, so effectively constant).
For example, consider the following stream of numbers:
1 2 5 6 3 4 0 10 3
The min-heap will have 1 2 3 5 6 initially.
On encountering 4, 1 gets removed and 4 gets inserted.
Min heap now looks like this: 2 3 4 5 6
On encountering 0, nothing happens.
On encountering 10, 2 gets removed and 10 gets inserted.
Min heap now looks like this: 3 4 5 6 10
On encountering 3, nothing happens.
So your final set of 5 largest elements are contained in the heap (3 4 5 6 10)
You can even tweak this to keep track of the k highest elements in an incoming stream of numbers. Just change the size of the min-heap to k.
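As a rough sketch of the min-heap idea applied to the file format from the question ("amount name" per line): the code below uses std::priority_queue with std::greater. The Owner alias and the function names are made up for the example, and a std::pair stands in for the asker's struct so that the default comparison orders by amount first.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (amount, name) pairs compare by amount first, so std::greater gives a min-heap on amount.
using Owner = std::pair<double, std::string>;

// Reads "amount name" records and returns the k largest, biggest first.
std::vector<Owner> top_k(std::istream& in, std::size_t k)
{
    std::priority_queue<Owner, std::vector<Owner>, std::greater<Owner>> heap;
    Owner o;
    while (in >> o.first >> o.second) {
        if (heap.size() < k) {
            heap.push(o);
        } else if (k > 0 && o.first > heap.top().first) {
            heap.pop();          // evict the smallest of the current top k
            heap.push(o);
        }
    }
    std::vector<Owner> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {   // the heap pops smallest first
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}

// Convenience wrapper for trying it out on an in-memory string.
std::vector<Owner> top_k_from_text(const std::string& text, std::size_t k)
{
    std::istringstream in(text);
    return top_k(in, k);
}
```

Passing a std::ifstream to top_k works the same way, since it only needs a std::istream. On the sample data from the question with k = 2, the result holds Brady and then Tony.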

While reading the file, keep a sorted list of the 5 largest numbers seen (and their owners).
Whenever you read a value higher than the lowest of the 5, remove the lowest and insert the new number in your sorted list.
The list can be stored in an array or in any other data structure that has an order and where you can implement a sort and insert. (Or where this is already implemented.)
Instead of sorting the list you can also simply go through the 5 entries every time you read a new one (should not be too bad, because 5 entries is a very small number)

You can use the standard library function std::nth_element() for this.
It should be fairly easy to implement a comparison function (or overload the comparison operator) for your struct. Then you'd just parse the file into a vector of those and be done with it. The algorithm uses a partial sort, linear time on average.
Here's the example given on the documentation site I've linked below:
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
int main()
{
std::vector<int> v{5, 6, 4, 3, 2, 6, 7, 9, 3};
std::nth_element(v.begin(), v.begin() + v.size()/2, v.end());
std::cout << "The median is " << v[v.size()/2] << '\n';
std::nth_element(v.begin(), v.begin()+1, v.end(), std::greater<int>());
std::cout << "The second largest element is " << v[1] << '\n';
}
For reference:
http://en.cppreference.com/w/cpp/algorithm/nth_element
Out of curiosity, I have implemented some approaches:
#include <algorithm>
#include <deque>
#include <functional>
#include <queue>
#include <set>
#include <vector>
std::vector<int> filter_nth_element(std::vector<int> v, int n) {
auto target = v.begin()+n;
std::nth_element(v.begin(), target, v.end(), std::greater<int>());
std::vector<int> result(v.begin(), target);
return result;
}
std::vector<int> filter_pqueue(std::vector<int> v, int n) {
std::vector<int> result;
std::priority_queue<int, std::vector<int>, std::greater<int>> q;
for (auto i: v) {
q.push(i);
if (q.size() > n) {
q.pop();
}
}
while (!q.empty()) {
result.push_back(q.top());
q.pop();
}
return result;
}
std::vector<int> filter_set(std::vector<int> v, int n) {
std::set<int> s;
for (auto i: v) {
s.insert(i);
if (s.size() > n) {
s.erase(s.begin());
}
}
return std::vector<int>(s.begin(), s.end());
}
std::vector<int> filter_deque(std::vector<int> v, int n) {
std::deque<int> q;
for (auto i: v) {
q.push_back(i);
if (q.size() > n) {
q.erase(std::min_element(q.begin(), q.end()));
}
}
return std::vector<int>(q.begin(), q.end());
}
std::vector<int> filter_vector(std::vector<int> v, int n) {
std::vector<int> q;
for (auto i: v) {
q.push_back(i);
if (q.size() > n) {
q.erase(std::min_element(q.begin(), q.end()));
}
}
return q;
}
And I have made up some tests:
#include <random>
#include <iostream>
#include <chrono>
std::vector<int> filter_nth_element(std::vector<int> v, int n);
std::vector<int> filter_pqueue(std::vector<int> v, int n);
std::vector<int> filter_set(std::vector<int> v, int n);
std::vector<int> filter_deque(std::vector<int> v, int n);
std::vector<int> filter_vector(std::vector<int> v, int n);
struct stopclock {
typedef std::chrono::high_resolution_clock high_resolution_clock;
std::chrono::time_point<high_resolution_clock> start, end;
stopclock() : start(high_resolution_clock::now()) {}
~stopclock() {
using namespace std::chrono;
auto elapsed = high_resolution_clock::now() - start;
auto elapsed_ms = duration_cast<milliseconds>(elapsed);
std::cout << elapsed_ms.count() << " ";
}
};
int main() {
// randomly initialize input array
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dist;
std::vector<int> v(10000000);
for (auto &i: v)
i = dist(gen);
// run tests
for (std::vector<int>::size_type x = 5; x <= 100; x+=5) {
// keep this many values
std::cout << x << " ";
{
stopclock t;
auto result = filter_nth_element(v, x);
}
{
stopclock t;
auto result = filter_pqueue(v, x);
}
{
stopclock t;
auto result = filter_set(v, x);
}
{
stopclock t;
auto result = filter_deque(v, x);
}
{
stopclock t;
auto result = filter_vector(v, x);
}
std::cout << "\n";
}
}
And I found it quite interesting to see the relative performance of these approaches (compiled with -O3 - I think I have to think a bit about these results):

A binary search tree could be a suitable data structure for this problem. Maybe you can find a suitable Tree class in STL or Boost or so (try to look for that). Otherwise simply use a struct if you insist.
The struct would be like that:
struct tnode {        /* the tree node: */
    char *word;           /* points to the text */
    int count;            /* number of occurrences */
    struct tnode *left;   /* left child */
    struct tnode *right;  /* right child */
};
Taken from The C Programming Language, chapter 6.5 - Self-referential Structures. Just adapt it to your needs.
Though, I think if you want to program in C++ properly, try to create a Tree data structure (class) or try to use an existing one.
Considering that you only have 300 entries, that should do it.
In theory when the input data is random it is supposed to be efficient. But that is theory and does not really play a role in your case. I think it is a good solution.

You can use sorted buffer of 5 elements and on each step if item is higher than lowest item of the buffer, put item in buffer and evict lowest
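One possible way to write such a sorted buffer, shown here for plain doubles to keep it short. The names offer and highest are made up, and the buffer is kept in descending order so the lowest item is always at the back.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Keeps `buffer` sorted in descending order and at most `capacity` long.
void offer(std::vector<double>& buffer, double item, std::size_t capacity)
{
    if (buffer.size() == capacity && (capacity == 0 || item <= buffer.back()))
        return;                                   // not higher than the current lowest
    // Find the insertion point that keeps the buffer sorted (descending).
    auto pos = std::upper_bound(buffer.begin(), buffer.end(), item, std::greater<double>());
    buffer.insert(pos, item);
    if (buffer.size() > capacity)
        buffer.pop_back();                        // evict the lowest
}

// Feeds every input value through offer() and returns the final buffer.
std::vector<double> highest(const std::vector<double>& input, std::size_t capacity)
{
    std::vector<double> buffer;
    for (double x : input)
        offer(buffer, x, capacity);
    return buffer;
}
```

For the stream 1 2 5 6 3 4 0 10 3 and a capacity of 5, the buffer ends up as 10 6 5 4 3. With only five entries, the insert into the vector is cheap enough that a heap brings little benefit.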

Use a map of elements
First Create a class
class Data {
public:
std::string name;
int number;
};
typedef std::map< int, Data > DataItems;
DataItems largest;
// If largest holds fewer than 5 items, you haven't read five elements yet.
if( largest.size() < 5 ) {
largest[ dt.number] = dt;
} else {
// Otherwise, if the new item is larger than the smallest of the largest five, the largest five has changed.
DataItems::iterator it = largest.begin(); // lowest current item.
if( it->second.number < dt.number ) { // is this item bigger? - yes
largest[ dt.number ] = dt; // add it (largest.size() == 6)
largest.erase( largest.begin() );// remove smallest item
}
}

You can use a set to keep track of the highest values. If you want to track non-unique numbers use a multiset instead:
vector<int> nums{10,11,12,1,2,3,4,5,6,7,8,9}; //example data
int k=5; // number of values to track
set<int> s; // this set will hold the result
for(auto a: nums)
{
if(s.size()<k)s.insert(a);
else if(a>*s.begin())
{
s.erase(s.begin());
s.insert(a);
}
}
Of course you will have to provide a custom comparison function for your struct.
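A sketch of what that custom comparison might look like for the struct from the question. The comparator name ByAmount and the function keep_highest are made up for the example, and a multiset is used on the assumption that equal amounts should all be kept.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct largestOwners {
    std::string name;
    double amountOwned;
};

// Orders owners by amount, so *s.begin() is always the smallest one tracked.
struct ByAmount {
    bool operator()(const largestOwners& a, const largestOwners& b) const
    { return a.amountOwned < b.amountOwned; }
};

// Returns the k owners with the highest amounts, in ascending order of amount.
std::vector<largestOwners> keep_highest(const std::vector<largestOwners>& owners, std::size_t k)
{
    std::multiset<largestOwners, ByAmount> s;   // multiset: equal amounts are all kept
    for (const auto& o : owners) {
        if (s.size() < k) {
            s.insert(o);
        } else if (k > 0 && o.amountOwned > s.begin()->amountOwned) {
            s.erase(s.begin());
            s.insert(o);
        }
    }
    return std::vector<largestOwners>(s.begin(), s.end());
}
```

The same loop as above applies unchanged; only the element type and the comparator differ.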

I'm surprised nobody has mentioned priority queue data-structure that's made exactly for this
https://en.cppreference.com/w/cpp/container/priority_queue

Counting number of distinct integers in array

To find the number of distinct numbers in an array from the lth to the rth index, I wrote a code block like:
int a[1000000];
//statements to input n number of terms from user in a.. along with l and r
int count=r-l+1; //assuming all numbers to be distinct
for(; l<=r; l++){
for(int i=l+1; i<=r; i++){
if(a[l]==a[i]){
count--;
break;
}
}
}
cout<<count<<'\n';
Explanation
For an array say, a=5 6 1 1 3 2 5 7 1 2 of ten elements. If we wish to check the number of distinct numbers between a[1] and a[8] that is the second and the 9th elements (including both), The logic I tried to implement would first take count=8 (no. of elements to be considered) and then it starts from a[1] that is 6 and checks for any other 6 after it, if it does find, it decreases the count by one and goes on for the next number in the row. So that if there are any more occurrence of 6 after that one, it would not be included twice.
Problem I tried small test cases and it works. But when I tried with bigger data, it did not work, so I wanted to know where would my logic fail?
Bigger data, as in integrated with other parts of the program and then used. Which gave incorrect output

You can try to use std::set
Basic idea is to add all the elements into your new set, and just output the size of your set.
#include <iostream>
#include <vector>
#include <set>
using namespace std;
int main()
{
int l = 1, r = 6;
int arr[] = {1, 1, 2, 3, 4, 5, 5, 5, 5};
set<int> s(&arr[l], &arr[r + 1]);
cout << s.size() << endl;
return 0;
}

Here is an answer that does not use std::set, although that solution is probably simpler.
#include <algorithm>
#include <vector>
int main()
{
int input[10]{5, 6, 1, 1, 3, 2, 5, 7, 1, 2}; //because you like raw arrays, I guess?
std::vector<int> result(std::cbegin(input), std::cend(input)); //result now contains all of input
std::sort(std::begin(result), std::end(result)); //result now holds 1 1 1 2 2 3 5 5 6 7
result.erase(std::unique(std::begin(result), std::end(result)), std::end(result)); //result now holds 1 2 3 5 6 7
result.size(); //gives the count of distinct integers in the given array
}
Here it is live on Coliru if you're into that.
--
EDIT: Here, have a short version of the set solution, too.
#include <set>
int main()
{
int input[10]{5, 6, 1, 1, 3, 2, 5, 7, 1, 2}; //because you like raw arrays, I guess?
std::set<int> result(std::cbegin(input), std::cend(input));
result.size();
}

The first question to ask with this type of problem is what is the possible range of the values. if the range of numbers N is "reasonably small", then you can use a boolean array of size N to indicate whether the number corresponding to the index is present. You iterate from l to r, setting the flag, and if the flag was not already set increment a counter.
count = 0;
for(int i=l; i<=r; i++) {
if (! isthere[arr[i]]) {
count++;
isthere[arr[i]] = true;
}
}
In terms of complexity, this approach is O(n); the one based on std::set is O(n log n), and one based on std::unordered_set is O(n) on average. This one is faster in practice as there is no hashing or tree balancing involved. For small N, for example for numbers between 0-255, this is also likely to be less memory intensive. For larger N, for example if any 32-bit integer is allowed, the set-based approach is more suitable.
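A self-contained version of the flag-array loop might look like this. It is a sketch under the stated assumption that every value lies in [0, range); the function name and the range parameter are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Counts distinct values in arr[l..r] (inclusive).
// Assumption: every value lies in [0, range), so it can index the flag array directly.
std::size_t count_distinct(const std::vector<int>& arr, std::size_t l, std::size_t r,
                           std::size_t range)
{
    std::vector<char> isthere(range, 0);   // one flag per possible value
    std::size_t count = 0;
    for (std::size_t i = l; i <= r && i < arr.size(); ++i) {
        if (!isthere[arr[i]]) {
            ++count;
            isthere[arr[i]] = 1;
        }
    }
    return count;
}
```

For the array 5 6 1 1 3 2 5 7 1 2 from the question with l = 1 and r = 8, this gives 6 (the values 6, 1, 3, 2, 5 and 7).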

You said you didn't mind another solution. So here it is. It uses set, a structure that stores only unique elements. By the way, on bigger data it will be much faster than the solution with two nested loops.
set<int> a1;
for (int i = l; i <= r; i++)
{
a1.insert(a[i]);
}
cout << a1.size();

Below is a way to count the unique numbers with std::unique. Note that std::unique only removes consecutive duplicates, so the array must be sorted (or already grouped, as in this example) for the count to be correct. The call moves the distinct elements to the front and leaves unspecified values behind them, so you can't rely on the original contents of the array afterwards. The array itself is not resized; the returned pointer marks the end of the distinct elements.
#include <stdio.h>
#include <iostream>
#include <algorithm> // for using unique (library function)
int main(){
int arr[] = {1, 1, 2, 2, 3, 3};
int len = sizeof(arr)/sizeof(*arr); // finding size of arr (array)
int unique_sz = std:: unique(arr, arr + len)-arr; // Counting unique elements in arr (Array).
std:: cout << unique_sz << '\n'; // Printing number of unique elements in this array.
return 0;
}
If you want to avoid that problem (the one described above), copy your array into another array first.
#include <stdio.h>
#include <iostream>
#include <algorithm> // for using copy & unique (library functions)
#include <string.h> // for using memcpy (library function)
int main(){
int arr[] = {1, 1, 2, 2, 3, 3};
int brr[100]; // we will copy arr (Array) to brr (Array)
int len = sizeof(arr)/sizeof(*arr); // finding size of arr (array)
std:: copy(arr, arr+len, brr); // the C++ way (needs #include <algorithm>)
memcpy(brr, arr, len*(sizeof(int))); // the C way, also valid in C++ (either of these two lines is enough on its own)
int unique_sz = std:: unique(arr, arr+len)-arr; // Counting unique elements in arr (Array).
std:: cout << unique_sz << '\n'; // Printing number of unique elements in this array.
for(int i=0; i<len; i++){ // Here is your old array, that we store to brr (Array) from arr (Array).
std:: cout << brr[i] << " ";
}
return 0;
}

Personally, I'd just use standard algorithms
#include<algorithm>
#include <iostream>
int main()
{
int arr[] = {1, 1, 2, 3, 4, 5, 5, 5, 5};
int *end = arr + sizeof(arr)/sizeof(*arr);
std::sort(arr, end);
int *p = std::unique(arr, end);
std::cout << (int)(p - arr) << '\n';
}
This obviously relies on being allowed to modify the array (it gets sorted, and after std::unique the elements past p are left with unspecified values). But it is easy to create a copy of an array if needed and work on the copy.

TL;DR: Use this:
template<typename InputIt>
std::size_t countUniqueElements(InputIt first, InputIt last) {
using value_t = typename std::iterator_traits<InputIt>::value_type;
return std::unordered_set<value_t>(first, last).size();
}
There are two approaches:
1. Insert everything into a set, count the set. Because you don't care about the order you can use a std::unordered_set which will be faster than std::set. std::set is implemented as a tree which does a lot of allocations so it can be slow.
2. Use std::sort. If you want to preserve the original array you'll need to make a copy of it.
Here is a complete example.
#include <algorithm>
#include <cstdint>
#include <vector>
#include <unordered_set>
#include <iostream>
template<typename RandomIt>
std::size_t countUniqueElementsSort(RandomIt first, RandomIt last) {
if (first == last)
return 0;
std::sort(first, last);
std::size_t count = 1;
auto val = *first;
while (++first != last) {
if (*first != val) {
++count;
}
val = *first;
}
return count;
}
template<typename InputIt>
std::size_t countUniqueElementsSet(InputIt first, InputIt last) {
using value_t = typename std::iterator_traits<InputIt>::value_type;
return std::unordered_set<value_t>(first, last).size();
}
int main() {
std::vector<int> v = {1, 3, 4, 4, 3, 6};
std::cout << countUniqueElementsSet(v.begin(), v.end()) << "\n";
std::cout << countUniqueElementsSort(v.begin(), v.end()) << "\n";
int v2[] = {1, 3, 4, 4, 3, 6};
std::cout << countUniqueElementsSet(v2, v2 + 6) << "\n";
std::cout << countUniqueElementsSort(v2, v2 + 6) << "\n";
}
Using that loop in the sort version should be faster than std::unique.
The complexity of 2. is worse than 1. - the average case is O(N) vs O(N log N). But it avoids allocation so may end up being faster for small arrays or ones that are already sorted or mostly already sorted.
You should definitely not use std::set, and probably not use std::unique (though it does lead to fewer lines of code, and won't make that much difference to performance so up to you).
In any case, you should usually go with the set version: it's a lot simpler and should be faster in almost all cases.
As other people have mentioned, if you know your input domain is small you can use a bool array instead of an unordered_set.

How to Create All Permutations of Variables from a Variable Number of STL Vectors [duplicate]

I have a variable number of std::vector<int>s, let's say I have 3 vectors in this example:
std::vector<int> vect1 {1,2,3,4,5};
std::vector<int> vect2 {1,2,3,4,5};
std::vector<int> vect3 {1,2,3,4,5};
The values of the vectors are not important here. Also, the lengths of these vectors will be variable.
From these vectors, I want to create every permutation of vector values, so:
{1, 1, 1}
{1, 1, 2}
{1, 1, 3}
...
...
...
{3, 5, 5}
{4, 5, 5}
{5, 5, 5}
I will then insert each combination into a key-value pair map for further use with my application.
What is an efficient way to accomplish this? I would normally just use a for loop, and iterate across all parameters to create all combinations, but the number of vectors is variable.
Thank you.
Edit: I will include more specifics.
So, first off, I'm not really dealing with ints, but rather a custom object. ints are just for simplicity. The vectors themselves exist in a map like so std::map<std::string, std::vector<int> >.
My ultimate goal is to have an std::vector< std::map< std::string, int > >, which is essentially a collection of every possible combination of name-value pairs.

Many (perhaps most) problems of the form "I need to generate all permutations of X" can be solved by creative use of simple counting (and this is no exception).
Let's start with the simple example: 3 vectors of 5 elements apiece. For our answer we will view an index into these vectors as a 3-digit, base-5 number. Each digit of that number is an index into one of the vectors.
So, to generate all the combinations, we simply count from 0 to 5^3 (125). We convert each number into 3 base-5 digits, and use those digits as indices into the vectors to get a permutation. When we reach 125, we've enumerated all the permutations of those vectors.
Assuming the vectors are always of equal length, changing the length and/or number of vectors is just a matter of changing the number of digits and/or number base we use.
If the vectors are of unequal lengths, we simply produce a result in which not all of the digits are in the same base. For example, given three vectors of lengths 7, 4 and 10, we'd still count from 0 to 7x4x10 = 280. We'd generate the least significant digit as N%10, the next least significant as (N/10)%4, and the most significant as (N/40)%7.
Presumably that's enough to make it fairly obvious how to extend the concept to an arbitrary number of vectors, each of arbitrary size.
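As a rough sketch of that extension, the function below decodes a counter N into one index per vector by treating N as a mixed-radix number whose digit bases are the vector lengths. The name decodeIndices is made up for illustration, and the last vector is assumed to be the least significant digit, as in the 7, 4, 10 example.

```cpp
#include <cstddef>
#include <vector>

// Decodes n into one index per vector. lengths[i] is the size of vector i
// and serves as the base of digit i; the last vector is least significant.
// Assumes every length is non-zero and n is below the product of the lengths.
std::vector<std::size_t> decodeIndices(std::size_t n,
                                       const std::vector<std::size_t>& lengths) {
    std::vector<std::size_t> idx(lengths.size());
    for (std::size_t i = lengths.size(); i-- > 0;) {
        idx[i] = n % lengths[i];  // current digit in base lengths[i]
        n /= lengths[i];          // shift to the next more significant digit
    }
    return idx;
}
```

With lengths {7, 4, 10}, N = 123 decodes to the indices {3, 0, 3}, since 3*40 + 0*10 + 3 = 123. Looping N from 0 up to (but not including) 280 and decoding each value visits every combination exactly once.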

0 -> 0,0,0
1 -> 0,0,1
2 -> 0,1,0
3 -> 0,1,1
4 -> 1,0,0
...
7 -> 1,1,1
8 -> 1,1,2
...
The map should translate a linear integer into a combination (ie: a1,a2,a3...an combination) that allows you to select one element from each vector to get the answer.
There is no need to copy any of the values from the initial vectors. You can use a mathematical formula to arrive at the right answer for each of the vectors. That formula will depend on some of the properties of your input vectors (how many are there? are they all the same length? how long are they? etc...)

The following may help: (https://ideone.com/1Xmc9b)
#include <cstddef>
#include <iostream>
#include <vector>
template <typename T>
bool increase(const std::vector<std::vector<T>>& v, std::vector<std::size_t>& it)
{
for (std::size_t i = 0, size = it.size(); i != size; ++i) {
const std::size_t index = size - 1 - i;
++it[index];
if (it[index] == v[index].size()) {
it[index] = 0;
} else {
return true;
}
}
return false;
}
template <typename T>
void do_job(const std::vector<std::vector<T>>& v, std::vector<std::size_t>& it)
{
// Print example.
for (std::size_t i = 0, size = v.size(); i != size; ++i) {
std::cout << v[i][it[i]] << " ";
}
std::cout << std::endl;
}
template <typename T>
void iterate(const std::vector<std::vector<T>>& v)
{
std::vector<std::size_t> it(v.size(), 0);
do {
do_job(v, it);
} while (increase(v, it));
}

This is an explicit implementation of what Lother and Jerry Coffin are describing, using the useful ldiv function in a for loop to iterate through vectors of varying length.
#include <cstdlib> // ldiv
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;
vector<int> vect1 {100,200};
vector<int> vect2 {10,20,30};
vector<int> vect3 {1,2,3,4};
typedef map<string,vector<int> > inputtype;
inputtype input;
vector< map<string,int> > output;
int main()
{
// init vectors
input["vect1"] = vect1;
input["vect2"] = vect2;
input["vect3"] = vect3;
long N = 1; // Total number of combinations
for( inputtype::iterator it = input.begin() ; it != input.end() ; ++it )
N *= it->second.size();
// Loop once for every combination to fill the output vector.
for( long i=0 ; i<N ; ++i )
{
ldiv_t d = { i, 0 };
output.emplace_back();
for( inputtype::iterator it = input.begin() ; it != input.end() ; ++it )
{
d = ldiv( d.quot, input[it->first].size() );
output.back()[it->first] = input[it->first][d.rem];
}
}
// Sample output
cout << output[0]["vect1"] << endl; // 100
cout << output[0]["vect2"] << endl; // 10
cout << output[0]["vect3"] << endl; // 1
cout << output[N-1]["vect1"] << endl; // 200
cout << output[N-1]["vect2"] << endl; // 30
cout << output[N-1]["vect3"] << endl; // 4
return 0;
}

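Use an array of vectors instead of separate variables, then use the following recursive algorithm:
#include <cstddef>
#include <cstdio>
#include <vector>
// i is the vector currently being chosen from, k is the number of vectors (assumed >= 1),
// and choices[i] holds the index selected in vectors[i].
void permutations(std::size_t i, std::size_t k,
const std::vector<std::vector<int>>& vectors,
std::vector<std::size_t>& choices) {
if (i < k) {
// Try every element of vector i, then recurse into the next vector.
for (std::size_t x = 0; x < vectors[i].size(); x++) {
choices[i] = x;
permutations(i + 1, k, vectors, choices);
}
} else {
// One index has been chosen for every vector: print the combination.
std::printf("\n %d", vectors[0][choices[0]]);
for (std::size_t j = 1; j < k; j++) {
std::printf(",%d", vectors[j][choices[j]]);
}
}
}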
