How to count duplicate entries of a vector in C++

I'm using Armadillo to do linear algebra calculations in C++.
For example, given the
vector a = (1,1,2,2,0,2,1,0)
I wish to return a matrix
(0, 2) // 0 appears 2 times in the vector
(1, 3) // 1 appears 3 times
(2, 3) // 2 appears 3 times
Is there a function that can do this?

As mentioned in comments you could use a std::map to collect the results. Then you can convert to a matrix as you see fit. You could skip the map step and use a matrix directly if it's already pre-initialised with the rows you're after.
As for a function to do this, you can use std::for_each from <algorithm> along with a lambda expression, although it seems overkill when a loop would be fine.
#include <algorithm>
#include <iostream>
#include <vector>
#include <map>
using namespace std;
int main()
{
vector<int> v{1,1,2,2,0,2,1,0};
map<int,int> dup;
for_each( v.begin(), v.end(), [&dup]( int val ){ dup[val]++; } );
for( auto p : dup ) {
cout << p.first << ' ' << p.second << endl;
}
return 0;
}
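To illustrate the "convert to a matrix as you see fit" step without depending on Armadillo, here is a minimal sketch that turns the map into one {value, count} row per distinct value. The helper name count_rows and the choice of std::array<int, 2> as the row type are assumptions made for this example, not part of the original answer.

```cpp
#include <array>
#include <map>
#include <vector>

// Hypothetical helper: returns one {value, count} row per distinct value,
// ordered by value (std::map iterates in ascending key order).
std::vector<std::array<int, 2>> count_rows(const std::vector<int>& v)
{
    std::map<int, int> counts;            // value -> number of occurrences
    for (int val : v) {
        ++counts[val];                    // a missing key is created with count 0
    }

    std::vector<std::array<int, 2>> rows;
    rows.reserve(counts.size());
    for (const auto& p : counts) {
        rows.push_back(std::array<int, 2>{{p.first, p.second}});
    }
    return rows;
}
```

For the vector in the question this yields the rows (0, 2), (1, 3) and (2, 3). Copying the rows into an Armadillo matrix is then a simple loop over the result.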

Here's another solution, using only Armadillo functions, and a C++11 compiler:
vec a = {1,1,2,2,0,2,1,0}; // vec holds elements of type 'double'
vec b = unique(a);
uvec c = hist(a,b); // uvec holds unsigned integers
mat X(b.n_rows, 2);
X.col(0) = b;
X.col(1) = conv_to<vec>::from(c);
X.print("X:");
Explanation:
vec b = unique(a) creates a vector containing the unique elements of a, sorted in ascending order
uvec c = hist(a,b) creates a histogram of counts of elements in a, using b as the bin centers
conv_to<vec>::from(c) converts c (vector with unsigned integers) to the same vector type as a

Related

How to store a sequence of integers in a C++ array?

Assume I have to store 100 integers in an array. Should I declare an array and write out all 100 integers from 1 to 100, or is there a better solution? I would then loop over them with a for loop or some other loop construct in C++.
Example:
int numbers[100] = { 1, 2, 3, ......10 };
for(){
//manipulate the array elements here
}
or
int numbers = {100};
What if I have more than 100 elements, i.e. the integers from 1 to 200 or more? What is a good way to achieve this?

You can use std::iota to generate a sequence.
For example, like this:
#include <iostream>
#include <numeric>
#include <vector>
int main() {
std::vector<int> vec(100);
std::iota(begin(vec), end(vec), 1);
for (auto i : vec)
std::cout << i << " ";
std::cout << "\n";
}
If you have to use a raw array, you can fill it like this:
int vec[100];
std::iota(vec, vec + 100, 1);
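For the "what if I have more than 100 elements" part of the question, the same std::iota call works for any length if the size is a run-time value. The following is a small sketch; the function name make_sequence and its parameters are made up for this example.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the sequence first, first + 1, ...
// containing `count` integers.
std::vector<int> make_sequence(int first, std::size_t count)
{
    std::vector<int> seq(count);              // count value-initialised ints
    std::iota(seq.begin(), seq.end(), first); // fill with consecutive values
    return seq;
}
```

Calling make_sequence(1, 200) gives the integers 1 to 200. Nothing has to change in the code when the count grows.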

C++ Armadillo: Get the ranks of the elements in a vector

Suppose I have a vector, and I want to get the ranks of the elements if they were sorted.
So if I have the vector:
0.5
1.5
3.5
0.1
and I need returned the ranks of each element:
2
3
4
1
Is there a way to do this in Armadillo? This is different than the previous post since we are getting the ranks and not the indices before sorting.

Here, check this out:
#include<iostream>
#include<vector> // std:: vector
#include <algorithm> // std::sort
#include <map> // std::map
using namespace std;
int main() {
vector<double> myVector = { 0.5, 1.5, 3.5, 0.1 };
vector<double> Sorted = myVector;
std::sort(Sorted.begin(), Sorted.end());
map<double, int> myMap;
for (int i = 0; i < Sorted.size() ; i++)
{
myMap.insert(make_pair(Sorted[i],i));
}
for (int i = 0; i < myVector.size() ; i++)
{
auto it = myMap.find(myVector[i]);
cout << it->second + 1 << endl;
}
return 0;
}
Output:
2
3
4
1

Here is my code using the STL to get the rankings in a concise way:
#include <algorithm> // std::sort
#include <cstddef> // std::size_t
#include <numeric> // std::iota
#include <vector>
using namespace std;
template <typename T>
vector<size_t> calRank(const vector<T> & var) {
vector<size_t> result(var.size());
// indices that would sort var in ascending order
vector<size_t> indx(var.size());
iota(indx.begin(),indx.end(),size_t(0));
sort(indx.begin(),indx.end(),[&var](size_t i1, size_t i2){return var[i1]<var[i2];});
// the element at sorted position iter gets rank iter+1 (ranks are 1-based)
for(size_t iter=0;iter<var.size();++iter){
result[indx[iter]]=iter+1;
}
return result;
}

Here is a solution in pure Armadillo code, using the function arma::sort_index().
The function arma::sort_index() calculates the permutation index to sort a given vector into ascending order.
Applying the function arma::sort_index() twice:
arma::sort_index(arma::sort_index()), calculates the inverse permutation index, which maps the vector from ascending order back into its original unsorted order. The ranks of the elements are equal to the inverse permutation index (plus one, if the ranks should start at 1).
Below is Armadillo code wrapped in RcppArmadillo that defines the function calc_ranks(). The function calc_ranks() calculates the ranks of the elements of a vector, and it can be called from R.
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// Define the function calc_ranks(), to calculate
// the ranks of the elements of a vector.
//
// [[Rcpp::export]]
arma::uvec calc_ranks(const arma::vec& da_ta) {
return (arma::sort_index(arma::sort_index(da_ta)) + 1);
} // end calc_ranks
The above code can be saved to the file calc_ranks.cpp, so it can be compiled in R using the function Rcpp::sourceCpp().
Below is R code to test the function calc_ranks() (after it's been compiled in R):
# Compile Rcpp functions
Rcpp::sourceCpp(file="C:/Develop/R/Rcpp/calc_ranks.cpp")
# Create a vector of random data
da_ta <- runif(7)
# Calculate the ranks of the elements
calc_ranks(da_ta)
# Compare with the R function rank()
all.equal(rank(da_ta), drop(calc_ranks(da_ta)))
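The "sort index applied twice" idea can also be sketched with the standard library alone, which may help to see why it works. The names argsort and ranks_by_double_argsort are invented for this illustration. std::stable_sort is used here as an assumption about how ties should be broken, so equal elements get consecutive ranks in order of appearance rather than averaged ranks like R's rank().

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices that would sort v in ascending order.
template <typename T>
std::vector<std::size_t> argsort(const std::vector<T>& v)
{
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Ranks (1-based) obtained by applying argsort twice: the second argsort
// inverts the permutation produced by the first one.
std::vector<std::size_t> ranks_by_double_argsort(const std::vector<double>& v)
{
    std::vector<std::size_t> ranks = argsort(argsort(v));
    for (std::size_t& r : ranks) {
        ++r;                              // shift from 0-based to 1-based
    }
    return ranks;
}
```

For the vector (0.5, 1.5, 3.5, 0.1) from the question, the first argsort gives (3, 0, 1, 2), the second gives (1, 2, 3, 0), and adding one yields the ranks (2, 3, 4, 1).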

Best way to split a vector into two smaller arrays?

What I'm trying to do:
I am trying to split a vector into two separate arrays. The current int vector contains an element per line in a text file. The text file is a list of random integers.
How I'm planning to do it:
My current idea is to create two regular int arrays, then iterate over the entire vector and copy n/2 elements to each of the arrays.
What I would like to know:
What is the most elegant way of accomplishing my task? I have a feeling that I can do this without iterating over the vector multiple times.
Code:
#include <vector>
#include <fstream>
#include <iterator>
#include <iostream>
using namespace std;
vector<int> ifstream_lines(ifstream& fs)
{
vector<int> out;
int temp;
while(fs >> temp)
{
out.push_back(temp);
}
return out;
}
vector<int> MergeSort(vector<int>& lines)
{
int split = lines.size() / 2;
int arrayA[split];
int arrayB[split];
}
int main(void)
{
ifstream fs("textfile.txt");
vector<int> lines;
lines = ifstream_lines(fs);
return 0;
}
Thank you :)

Use iterators.
std::vector<int> lines;
// fill
std::size_t const half_size = lines.size() / 2;
std::vector<int> split_lo(lines.begin(), lines.begin() + half_size);
std::vector<int> split_hi(lines.begin() + half_size, lines.end());
Since iterator ranges represent half open ranges [begin, end), you don't need to add 1 to the second begin iterator: lines.begin() + half_size isn't copied to the first vector.
Note that things like
int split = lines.size() / 2;
int arrayA[split];
int arrayB[split];
are not standard C++ (and as such not portable). These are so-called variable-length arrays (VLAs for short) and are a C99 feature. Some compilers (GCC, Clang) support them as an extension when compiling C++ code. Compile with -pedantic to get a warning. VLAs behave oddly for non-POD types and aren't generally useful, since you can't even return them.
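As a self-contained illustration of the half-open ranges described above, the split can be wrapped in a function that returns both halves. The name split_half is an assumption made for this sketch. For an odd number of elements, the extra element ends up in the second half.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: copies the first size()/2 elements into one vector
// and the remaining elements into another.
std::pair<std::vector<int>, std::vector<int>>
split_half(const std::vector<int>& lines)
{
    const auto half = static_cast<std::ptrdiff_t>(lines.size() / 2);
    // [begin, begin + half) and [begin + half, end) do not overlap,
    // because the end iterator of a range is never copied.
    std::vector<int> lo(lines.begin(), lines.begin() + half);
    std::vector<int> hi(lines.begin() + half, lines.end());
    return std::make_pair(std::move(lo), std::move(hi));
}
```

With the input {1, 2, 3, 4, 5} the first half is {1, 2} and the second half is {3, 4, 5}.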

If you can't use the code from Xeo's answer due to strict compiler rules, or you want a more generic way, try std::advance:
#include <cstddef>
#include <vector>
#include <iterator>
// input is the std::vector<int> to split
std::size_t middle = input.size()/2;
std::vector<int>::const_iterator middleIter(input.cbegin());
std::advance(middleIter, middle);
// both iterators of each range must have the same type, hence cbegin()/cend()
std::vector<int> leftHalf(input.cbegin(), middleIter);
std::vector<int> rightHalf(middleIter, input.cend());

If you only need a reference to the numbers without manipulating them, then you can do:
int *array_1 = &lines[0];
int *array_2 = &lines[lines.size() / 2];
array_1 and array_2 are actually pointers to the start and the middle of the vector. This works because the standard guarantees that std::vector stores its elements in contiguous memory. The pointers stay valid only as long as the vector is not resized, and the vector must not be empty.
Note that lines.begin() can't be used for this directly: it returns an iterator, which is not guaranteed to be a raw pointer.

A solution that splits a vector into a variable number of parts using iterators:
#include <iostream>
#include <vector>
int main()
{
// Original vector of data
std::vector<double> mainVec{1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0};
// Result vectors
std::vector<std::vector<double>> subVecs{};
// Start iterator
auto itr = mainVec.begin();
// Number of elements not yet assigned to a part
unsigned fullSize = mainVec.size();
// Number of parts to split into
unsigned partsCount = 4U;
for(unsigned i = 0; i < partsCount; ++i)
{
// Size of this part: the remaining elements divided evenly over the remaining parts
auto partSize = fullSize / (partsCount - i);
fullSize -= partSize;
// Copy the next partSize elements into a new sub-vector
subVecs.emplace_back(itr, itr + partSize);
itr += partSize;
}
// Print out result
for (const auto& elemOuter : subVecs)
{
std::cout << std::fixed;
for (const auto& elemInner : elemOuter)
{
std::cout << elemInner << " ";
}
std::cout << "\n";
}
}

C++ sort on vector using function object

I'm trying to sort a vector v1 using another vector v2. I can't wrap my head around this error:
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
Abort trap
while running this code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
class Comp
{
public:
Comp(vector<double>& inVec): _V(inVec) {}
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
private:
vector<double> _V;
};
int main(int argc, char** argv)
{
double x1[] = {90.0, 100.0, 80.0};
double x2[] = {9.0, 3.0, 1.0};
vector<double> v1(x1,x1+3);
vector<double> v2(x2,x2+3);
sort(v1.begin(), v1.end(), Comp(v2)); // sort v1 according to v2
for(unsigned int i=0; i<v1.size(); i++)
{
cout << v1.at(i) << " " << v2.at(i) << endl;
}
return 0;
}
v1 and v2 are of the same size. Why the out_of_range error?
Thanks in advance for any pointers.

I believe that your problem is in this line:
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
The problem is that when the std::sort algorithm uses a custom callback, it passes in the actual values stored in the vector at particular locations, not the indices of those locations within the vector. As a result, when you call
sort(v1.begin(), v1.end(), Comp(v2)); // sort v1 according to v2
The Comp comparator you've written will be getting passed as parameters the values stored in the v1 vector and will then try indexing at those positions into the v2 vector. Since the values in v1 are larger than the size of v2, the call to _V.at(i) will cause an out_of_range exception to be thrown.
If you want to sort the two ranges with respect to one another, you'll need to adopt a different approach. I'm not aware of a straightforward way of doing this, but I'll let you know if I think of one.

The size of v2 is just 3, but the comparator uses each value of v1 as an index into its copy of v2. Since v1 contains values such as 90, which is far greater than the size of v2, this is what throws std::out_of_range here:
bool operator()(int i, int j) {return (_V.at(i)<_V.at(j));}
std::vector::at throws a std::out_of_range exception if the index passed to it is greater than or equal to the size of the vector. That is, the index must be less than vector::size().

OK, by now you're probably aware that i and j are the actual values held in the vector rather than indices. There is a good reason for that: sorting is all about values, not indices. Note that you're passing iterators to the sort function, so there is no way for it to hand you an index. Of course, it could compute an index relative to the first iterator, but there is no reason for it to do so.
However, let's be insane for a while and imagine you got indices rather than values in your comparator. Assume that your code does what you want, and consider the following scenario:
v1 = {20,10}; v2 = {2,1}
I secretly assume you want the following output:
v1 = {10, 20}
right? Now imagine I'm a sorting function you're calling and I do following steps:
v2[0] < v2[1] is false, so swap(&v1[0], &v1[1])
It's sorted, isn't it? But wait, I'm a crazy sorting function, so I want to make sure it's sorted, so I do the following:
v2[0] < v2[1] is false, swap(&v1[0], &v1[1])
And again:
v2[0] < v2[1] is false, swap(&v1[0], &v1[1])
and again, again, again...
Can you see the problem? A sorting function places requirements on its comparator, and you would be breaking a fundamental one: the result of the comparison would depend on positions rather than on the elements themselves, so it would never change no matter how often the elements are swapped.
I suspect you need a completely different container (maybe std::map with keys from vec1 and values from vec2) or at least something like vector< pair<double, double> >, so you can easily sort by either the first or the second value. If not, consider creating a vector with values in the range [0, v2.size()), sorting it using your comparator (the values are equal to indices, so it will be all right) and then printing the corresponding values from v1. This code works fine:
vector<size_t> indices;
for(size_t i =0; i < v1.size(); ++i)
{
indices.push_back(i);
}
// yes, it works using your original comparator
sort(indices.begin(), indices.end(), Comp(v2));
for(size_t i =0; i < indices.size(); ++i)
{
cout << v1.at(indices[i]) << " " << v2.at(indices[i]) << endl;
}
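The same index-sorting idea can be written more compactly with std::iota and a lambda on a C++11 compiler. This is a sketch rather than the answerer's code: the function name sorted_by and the decision to throw on mismatched sizes are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the elements of `values` reordered so that
// the corresponding elements of `keys` are in ascending order.
std::vector<double> sorted_by(const std::vector<double>& values,
                              const std::vector<double>& keys)
{
    if (values.size() != keys.size()) {
        throw std::invalid_argument("values and keys must have the same size");
    }

    // Sort the indices 0..n-1 by the keys they refer to. Here the comparator
    // really does receive indices, because indices are what is being sorted.
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    std::vector<double> out;
    out.reserve(values.size());
    for (std::size_t i : idx) {
        out.push_back(values[i]);
    }
    return out;
}
```

With v1 = {90, 100, 80} and v2 = {9, 3, 1} from the question, sorted_by(v1, v2) returns {80, 100, 90}.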

As said in the other answers, the problem is that the sort algorithm passes the actual values to compare rather than indices.
Here is how you can solve it:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
typedef pair<double, double> Zipped; // Represent an element of two lists
// "zipped" together
// Compare the second values of two pairs
bool compareSeconds ( Zipped i, Zipped j )
{
return i.second < j.second;
}
int main ( int argc, char **argv )
{
double x1[] = { 90, 100, 80 };
double x2[] = { 9, 3, 1 };
vector<double> v1(x1, x1 + 3);
vector<double> v2(x2, x2 + 3);
vector<Zipped> zipped(v1.size()); // This will be the zipped form of v1
// and v2
for ( int i = 0; i < zipped.size(); ++i )
{
zipped[i] = Zipped(v1[i], v2[i]);
}
sort(zipped.begin(), zipped.end(), &compareSeconds);
for ( int i = 0; i < zipped.size(); ++i )
{
cout << zipped[i].first << " " << zipped[i].second << endl;
}
for ( int i = 0; i < v1.size(); ++i )
{
v1[i] = zipped[i].first;
}
// At this point, v1 is sorted according to v2
return 0;
}

Finding Frequency of numbers in a given group of numbers

Suppose we have a vector/array in C++ and we wish to find which of its N elements occurs most often and output that highest count. Which algorithm is best suited for this job?
Example:
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
The output is 4 because 2 occurs 4 times, which is the maximum number of occurrences of any element.

Sort the array and then do a quick pass to count each number. The algorithm has O(N*logN) complexity.
Alternatively, create a hash table, using the number as the key, and store a counter for each key. You'll be able to count all elements in one pass; however, the complexity of the algorithm now depends on the complexity of your hashing function.
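The sort-then-scan approach from the first paragraph could look roughly like this. The function name max_repetition_sorted is invented for the sketch, and the input is taken by value so the caller's data is not reordered, which is a design choice rather than a requirement.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns how often the most frequent element occurs.
// Sorting costs O(N log N); the scan afterwards is a single O(N) pass.
std::size_t max_repetition_sorted(std::vector<int> v)   // sorts a copy
{
    std::sort(v.begin(), v.end());

    std::size_t best = 0;   // longest run of equal values seen so far
    std::size_t run = 0;    // length of the current run
    for (std::size_t i = 0; i < v.size(); ++i) {
        run = (i > 0 && v[i] == v[i - 1]) ? run + 1 : 1;
        best = std::max(best, run);
    }
    return best;
}
```

For the array in the question the function returns 4, and for an empty input it returns 0.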

Optimized for space:
Quicksort (for example) then iterate over the items, keeping track of largest count only.
At best O(N log N).
Optimized for speed:
Iterate over all elements, keeping track of the separate counts.
This algorithm will always be O(n).

If you have the RAM and your values are not too large, use counting sort.
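Only the counting step of counting sort is needed to answer the question, so a sketch can stop after building the count array. It assumes the values are known to lie in [0, max_value]. The function name and the choice to throw on out-of-range values are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns how often the most frequent element occurs,
// assuming every value lies in [0, max_value].
// Uses O(max_value) extra memory and O(N + max_value) time.
std::size_t max_repetition_counting(const std::vector<int>& v, int max_value)
{
    if (max_value < 0) {
        throw std::invalid_argument("max_value must be non-negative");
    }
    std::vector<std::size_t> count(static_cast<std::size_t>(max_value) + 1, 0);

    std::size_t best = 0;
    for (int x : v) {
        if (x < 0 || x > max_value) {
            throw std::out_of_range("value outside [0, max_value]");
        }
        // Increment the counter for x and keep the largest counter seen.
        best = std::max(best, ++count[static_cast<std::size_t>(x)]);
    }
    return best;
}
```

For the array in the question, max_repetition_counting(a, 3456) returns 4, at the price of a count array with 3457 entries for only 9 inputs. That is the trade-off the answer alludes to.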

A possible C++ implementation that makes use of STL could be:
#include <iostream>
#include <algorithm>
#include <map>
// functor
struct maxoccur
{
int _M_val;
int _M_rep;
maxoccur()
: _M_val(0),
_M_rep(0)
{}
void operator()(const std::pair<int,int> &e)
{
std::cout << "pair: " << e.first << " " << e.second << std::endl;
if ( _M_rep < e.second ) {
_M_val = e.first;
_M_rep = e.second;
}
}
};
int
main(int argc, char *argv[])
{
int a[] = {2,456,34,3456,2,435,2,456,2};
std::map<int,int> m;
// load the map
for(unsigned int i=0; i< sizeof(a)/sizeof(a[0]); i++)
m [a[i]]++;
// find the max occurrence...
maxoccur ret = std::for_each(m.begin(), m.end(), maxoccur());
std::cout << "value:" << ret._M_val << " max repetition:" << ret._M_rep << std::endl;
return 0;
}

A bit of pseudo-code:
// split the string into an array first
array = strsplit(numbers) // PHP-style function that splits a string into its components
i = 0
while (i < count(array))
{
if (isset(list[array[i]]))
{
// number seen before: increment its counter
list[array[i]]['count'] = list[array[i]]['count'] + 1
}
else
{
// first occurrence of this number
list[array[i]]['count'] = 1
list[array[i]]['number'] = array[i]
}
i = i + 1
}
usort(list) // sort by 'count', descending; usort is a PHP function that sorts an array by its values, not its keys. I'm assuming you have something in C++ that does this
print list[0]['number'] // should contain the most used number

The hash algorithm (build count[i] = #occurrences(i) in basically linear time) is very practical, but is theoretically not strictly O(n) because there could be hash collisions during the process.
An interesting special case of this question is the majority algorithm, where you want to find an element which is present in at least n/2 of the array entries, if any such element exists.
Here is a quick explanation, and a more detailed explanation of how to do this in linear time, without any sort of hash trickiness.
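One well-known linear-time technique for the majority problem is the Boyer-Moore majority vote. It is shown here on the assumption that it is the algorithm the lost links referred to. Note that it only guarantees a result for an element occurring in strictly more than half of the entries, so a second pass is used to verify the candidate. The function name and the bool-plus-output-parameter interface are choices made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: if some value occurs in more than half of the
// entries of v, stores it in `majority` and returns true; otherwise
// returns false and leaves `majority` untouched.
// Two O(n) passes, O(1) extra memory, no hashing.
bool find_majority(const std::vector<int>& v, int& majority)
{
    int candidate = 0;
    std::size_t balance = 0;
    // Pass 1: pair off each occurrence of the candidate against one
    // differing element; a true majority survives the pairing.
    for (int x : v) {
        if (balance == 0) {
            candidate = x;
            balance = 1;
        } else if (x == candidate) {
            ++balance;
        } else {
            --balance;
        }
    }
    // Pass 2: the survivor is only a candidate, so count it for real.
    const auto occurrences = static_cast<std::size_t>(
        std::count(v.begin(), v.end(), candidate));
    if (occurrences * 2 > v.size()) {
        majority = candidate;
        return true;
    }
    return false;
}
```

For {2, 456, 2, 34, 2} the function reports 2 as the majority. For the array in the question it returns false, because 2 occurs only 4 times out of 9.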

If the range of elements is large compared with the number of elements, I would, as others have said, just sort and scan. This is time n*log n and no additional space (maybe log n additional).
The problem with counting sort is that, if the range of values is large, it can take more time to initialize the count array than to sort.

Here's my complete, tested version, using a std::unordered_map (std::tr1::unordered_map on pre-C++11 compilers).
I make this approximately O(n). First it iterates through the n input values to insert/update the counts in the unordered_map, then it does a partial_sort_copy, which is O(n) when only the top element is kept. 2*O(n) ~= O(n).
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator> // std::iterator_traits
#include <utility> // std::pair
#include <cassert> // assert
namespace {
// Only used in most_frequent but can't be a local class because of the member template
struct second_greater {
// Need to compare two (slightly) different types of pairs
template <typename PairA, typename PairB>
bool operator() (const PairA& a, const PairB& b) const
{ return a.second > b.second; }
};
}
template <typename Iter>
std::pair<typename std::iterator_traits<Iter>::value_type, unsigned int>
most_frequent(Iter begin, Iter end)
{
typedef typename std::iterator_traits<Iter>::value_type value_type;
typedef std::pair<value_type, unsigned int> result_type;
std::unordered_map<value_type, unsigned int> counts;
for(; begin != end; ++begin)
// This is safe because new entries in the map are defined to be initialized to 0 for
// built-in numeric types - no need to initialize them first
++ counts[*begin];
// Only need the top one at this point (could easily expand to top-n)
std::vector<result_type> top(1);
std::partial_sort_copy(counts.begin(), counts.end(),
top.begin(), top.end(), second_greater());
return top.front();
}
int main(int argc, char* argv[])
{
int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
std::pair<int, unsigned int> m = most_frequent(a, a + (sizeof(a) / sizeof(a[0])));
std::cout << "most common = " << m.first << " (" << m.second << " instances)" << std::endl;
assert(m.first == 2);
assert(m.second == 4);
return 0;
}

It will be O(n), but the catch is that it needs a second array, which for a large range of values can be as large as the data array itself.
for(i=0;i<n;i++)
count[a[i]]++;
mar=count[0];
index=0;
for(i=1;i<range;i++)
if(count[i]>mar){ mar=count[i]; index=i; }
The output is then index: the element that occurs the maximum number of times in the array, with mar holding its count.
Here a[] is the data array of n elements in which we search for the number with the most occurrences, and count[] holds the count of each value (range entries, all initially zero).
Note: we already know the range of the data in the array.
For example, if the data in the array ranges from 1 to 100, keep a count array with one entry per possible value, and each time a value occurs, increment the entry at its index by one.

Now, in the year 2022 we have
namespace aliases
more modern containers like std::unordered_map
CTAD (Class Template Argument Deduction)
range based for loops
using alias declarations
the std::ranges library
more modern algorithms
projections
structured bindings
With that we can now write:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
namespace rng = std::ranges;
int main() {
// Demo data
std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };
// Count values
using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;
Counter counter{}; for (const auto& d : data) counter[d]++;
// Get max
const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);
// Show output
std::cout << '\n' << value << " found " << count << " times\n";
}
