Counting Duplicates in C++ - multiset?

UPD:-
Value Instances
 2   3
 3   2
 5   1
I want to limit the count to 1 for all the instances present in the multiset.
#include<bits/stdc++.h>
using namespace std;
int main() {
multiset<int> p1;
p1.insert(5);
p1.insert(2);
p1.insert(3);
p1.insert(3);
p1.insert(2);
p1.insert(2);
for(auto itr : p1) {
if(p1.count(itr) > 1)
p1.erase(itr);
cout << itr;
}
}
How can I fix this?

My comment:
In that case, you should use a std::set<int> because that is actually what matches your requirement. You could use also a std::map<int, int> to map the key to the number of occurrences if you like.
OPs reply:
Can you add this to a full-fledged answer so that I can accept it for this question?
Here we go:
Just filtering duplicates:
#include <iostream>
#include <set>
int main()
{
int sample[] = { 5, 2, 3, 3, 2, 2 };
// add all values at most once
using Table = std::set<int>;
Table table;
for (int value : sample) table.insert(value);
// output the result
for (const Table::value_type& entry : table) {
std::cout << "Value " << entry << "\n";
}
}
Output:
Value 2
Value 3
Value 5
Counting the number of occurrences:
#include <iostream>
#include <map>
int main()
{
int sample[] = { 5, 2, 3, 3, 2, 2 };
// add all values at most once but count the number of occurrences
using Table = std::map<int, unsigned>;
Table table;
for (int value : sample) ++table[value];
// output the result
for (const Table::value_type& entry : table) {
std::cout << "Value " << entry.first << " (" << entry.second << " times)\n";
}
}
Output:
Value 2 (3 times)
Value 3 (2 times)
Value 5 (1 times)
The trick:
std::map::operator[] inserts an element if the key is not yet there. The mapped value of this element (in this case the unsigned in std::pair<const int, unsigned>) is value-initialized, which guarantees that the element starts as { key, 0 }.
So, there are two cases:
The key is not yet there:
The element is created as { key, 0 } and the value (.second of the element) is incremented immediately which results in { key, 1 }.
The key is already there:
The value (.second of the element) is incremented again.
A variation on filtering duplicates:
This keeps the original input order but removes repetitions (by book-keeping in a separate std::set).
#include <iostream>
#include <set>
#include <vector>
int main()
{
using Sample = std::vector<int>;
Sample sample = { 5, 2, 3, 3, 2, 2 };
// remove duplicates
using Table = std::set<int>;
Table table;
Sample::iterator iterRead = sample.begin();
Sample::iterator iterWrite = sample.begin();
for (; iterRead != sample.end(); ++iterRead) {
if (table.insert(*iterRead).second) *iterWrite++ = *iterRead;
}
sample.erase(iterWrite, sample.end());
// output the result
for (const Sample::value_type& entry : sample) {
std::cout << "Value " << entry << "\n";
}
}
Output:
Value 5
Value 2
Value 3
The trick:
std::set::insert() returns a pair of iterator and bool.
The iterator points to the key in the set (inserted or already been there).
The bool denotes if the key was inserted (true) or was already there (false).
The other trick:
Just erasing every found duplicate from the std::vector would result in the worse complexity O(n²).
Hence, two iterators are used, one for reading and one for writing. Every input value which is not yet in the bookkeeping table (and hence occurs for the first time) is written back; all others are skipped.
So, every value which occurs for the first time is shifted towards the beginning and appended to the previously kept values. Additionally, iterWrite points past the last written element after the loop and can be used to erase the rest (which contains only leftover input values that are all duplicates).
The vector is traversed only once, and each std::set::insert() costs O(log n), so the complexity of this algorithm is O(n log n) – much better than the naive approach.
Btw. the standard algorithms std::remove() and std::remove_if() work the same way.
Thus, the same algorithm could be achieved with std::remove_if():
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>
int main()
{
using Sample = std::vector<int>;
Sample sample = { 5, 2, 3, 3, 2, 2 };
// remove duplicates
using Table = std::set<int>;
Table table;
Sample::iterator last
= std::remove_if(sample.begin(), sample.end(),
[&](int value) { return !table.insert(value).second; });
sample.erase(last, sample.end());
// output the result
for (const Sample::value_type& entry : sample) {
std::cout << "Value " << entry << "\n";
}
}
Output:
like above

#include <iostream>
#include <set>
using namespace std;
int main()
{
multiset<int> p1;
p1.insert(5);
p1.insert(2);
p1.insert(3);
p1.insert(4);
p1.insert(2);
p1.insert(2);
for (auto iter = p1.begin(); iter != p1.end();)
{
p1.count(*iter) > 1 ? iter = p1.erase(iter) : iter++;
}
for (auto & iter : p1)
{
cout << iter << ", ";
}
return 0;
}

Related

Prevent memory allocation in recursive combination generation

(Sorry about the title, it's not the best descriptive)
I am playing with graph theory, and generating all possible combinations of a given set of input numbers. Given the input set {2,3,4}, my possible combinations (of which there are 3!), are:
The following recursive solution works, but I don't like the fact that I have to "copy" the input vector in order to "remove" the element that represents the node I am following in order to prevent including it for output again. Elements I am going to output are stored in vecValues whereas the elements I can currently choose from are stored in vecInput:
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues)
{
// When hit 0 input size, output.
if (vecInput.size() == 0)
{
for (int i : vecValues) cout << i << " ";
cout << endl;
}
size_t nSize = vecInput.size();
for (vector<int>::iterator iter = begin(vecInput); iter != end(vecInput); ++iter)
{
auto vecCopy = vecInput;
vecCopy.erase(find(begin(vecCopy), end(vecCopy), *iter));
vecValues.push_back(*iter);
OutputCombos(vecCopy, vecValues);
vecValues.pop_back();
}
}
void OutputCombos(vector<int>& vecInput)
{
vector<int> vecValues;
OutputCombos(vecInput, vecValues);
}
int main()
{
vector<int> vecInput{ 2,3,4 };
OutputCombos(vecInput);
return 0;
}
As expected from my state space tree, the output is
2 3 4
2 4 3
3 2 4
3 4 2
4 2 3
4 3 2
How can I get around this without having to make a copy of the vector for each recursive call please?

You could always just use std::next_permutation from <algorithm>
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> input {2, 3, 4};
do {
for (auto i : input) std::cout << i << " ";
std::cout << std::endl;
} while(std::next_permutation(input.begin(), input.end()));
return 0;
}
This gives you the same output. You might want to check out a possible implementation of next_permutation, which involves swaps within the vector rather than copying the vector several times.

I think this might be closer to what you're looking for. A version without std::next_permutation that doesn't involve copying any vectors, and allows the input to remain const. However, it does this at the cost of checking the output in each iteration to make sure it doesn't add the same number twice.
#include<vector>
#include<iostream>
#include<algorithm>
template<typename T>
void OutputCombinations(
const std::vector<T>& input,
std::vector<typename std::vector<T>::const_iterator >& output)
{
for(auto it = input.begin(); it != input.end(); ++it)
{
if (std::find(output.begin(), output.end(), it) == output.end())
{
output.push_back(it);
if (output.size() == input.size())
{
for(auto node : output) std::cout << *node << " ";
std::cout << std::endl;
}
else OutputCombinations(input, output);
output.pop_back();
}
}
}
int main()
{
std::vector<int> nodes{ 2, 3, 4, 2 };
std::vector<std::vector<int>::const_iterator> result{};
OutputCombinations(nodes, result);
return 0;
}

After much studying I found inspiration in this article which gave me the ultimate solution. The idea is that we keep a vector of Boolean values which indicates whether or not a particular value has been used in the combination; that way we don't need to remove the element that we have already used hence there is no memory allocation overhead.
So, when building the branch {2,4,3}, if we get to {2,4}, vecTaken will be {true, false, true} and nNumBoolsSet will be 2. So when we loop, we will only "use" the element at index 1 of vecInput since that is the only element that has not been used as dictated by vecTaken.
void OutputCombos(vector<int>& vecInput, vector<int>& vecValues, vector<bool>& vecTaken, int& nNumBoolsSet)
{
size_t nSize = vecInput.size();
if (nNumBoolsSet == nSize)
{
for (int i : vecValues) cout << i << " ";
cout << endl;
return;
}
for (vector<int>::size_type i = 0; i < nSize; ++i)
{
if (vecTaken[i] == false)
{
vecValues.push_back(vecInput[i]);
vecTaken[i] = true;
++nNumBoolsSet;
OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
vecTaken[i] = false;
vecValues.pop_back();
--nNumBoolsSet;
}
}
}
void OutputCombos(vector<int>& vecInput)
{
vector<int> vecValues;
vector<bool> vecTaken(vecInput.size(), false);
int nNumBoolsSet = 0;
OutputCombos(vecInput, vecValues, vecTaken, nNumBoolsSet);
}
int main()
{
vector<int> vecInput{ 2,3,4 };
OutputCombos(vecInput);
}

How to use find_first_not_of with a vector of string?

Let's say I have the following object:
vector<string> data = {"12","12","12","12","13","14","15", "15", "15", "15", "18"};
I'm trying to find the first non-repeating entry in the data object.
For example, data.find_first_not_of(data.at(0)); would work if data were a plain string (not a container).
How can I achieve the same thing with an object of type vector?
I looked at adjacent_find and find_if_not from the algorithm library, but to no avail.
Your suggestions are much appreciated.

What problem did you have with adjacent_find? You should be able to use that with an inverse predicate:
std::vector<std::string> data = {"12","12","12","12","13","14","15", "15", "15", "15", "18"};
// Sort data here if necessary
auto itr = std::adjacent_find(data.cbegin(), data.cend(), std::not_equal_to<std::string>{});
if (itr != data.cend()) {
std::cout << "First mismatch: " << *itr << " " << *std::next(itr) << std::endl;
} else {
std::cout << "All elements equal" << std::endl;
}

Since you have to go through the list at least once, and you don't know when or where you will encounter the duplicate of a number (if there is one), one way to solve this is to first gather "statistics" and then from what you've gathered you can determine the first non-duplicate.
Here is an example using std::unordered_map:
#include <algorithm>
#include <unordered_map>
#include <iostream>
#include <vector>
#include <string>
// struct to hold some information on the numbers
struct info
{
std::string number;
int count;
int position;
info(const std::string n, int c, int p) : number(n), count(c), position(p) {}
};
int main()
{
std::vector<std::string> data = {"12","12","12","12","13","14","15", "15", "15", "15", "18"};
std::unordered_map<std::string, info> infoMap;
std::vector<info> vInfo;
int pos = 0;
// loop for each data element
std::for_each(data.begin(), data.end(), [&](const std::string& n)
{
// insert entry into the map
auto pr = infoMap.insert(std::make_pair(n, info(n, 0, pos)));
// bump up the count for this entry.
++pr.first->second.count;
// bump up the position number
++pos;
});
// create a vector of the information with a count of 1 item.
std::for_each(infoMap.begin(), infoMap.end(), [&](std::unordered_map<std::string, info>::value_type& vt) { if (vt.second.count == 1) vInfo.push_back(vt.second); });
// sort this by position
std::sort(vInfo.begin(), vInfo.end(), [&](const info& pr1, const info &pr2){return pr1.position < pr2.position; });
// output the results
if ( vInfo.empty() )
std::cout << "All values are duplicated\n";
else
std::cout << "The first number that isn't repeated is " << vInfo.front().number << "\n";
}
First, we just simply go through all the entries in the vector and just tally up the count for each item. In addition, we store the position in the original list of where the item was found.
After that we filter out the ones with a count of exactly 1 and copy them to a vector. We then sort this vector based on the position they were found in the original list.

Using Functor to make a custom sort method in sets

I am trying to make a custom sorting method for set but the compiler doesn't run it.
For example we are inserting the following numbers: 2 3 5 4 , set will sort them from the lowest value to the highest value, but what if I want it from the highest to the lowest?
For : 2 3 5 4, instead of sorting them like this 2 3 4 5, I want 5 4 3 2.
I wrote the following code:
#include <iostream>
#include <set>
#include <iterator>
using namespace std;
struct SortOrder {
bool operator()(const int &first,const int &last) {
if (first < last);
return (first < last);
return (first > last);
}
};
int main(){
set<int,SortOrder> date;
set<int>::iterator it;
date.insert(2);
date.insert(3);
date.insert(5);
date.insert(4);
for (it = date.begin(); it != date.end(); ++it) {
cout << *it <<" ";
}
return 0;
}

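Your implementation of the comparator is incorrect, as deepmax has explained: without the stray semicolon after the if, your operator returns false only when the two items are equal to each other; in all other cases it returns true, effectively making it a "not equal" operator. (With the semicolon, as posted, the if has no effect and the operator always returns first < last, which is just the default ascending order.)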
You don't have to write your own implementation, because C++ Standard Library provides an implementation for you:
set<int,std::greater<int>> date;
std::greater<int> is the comparator that you want (demo).
set<int,std::greater<int>> date {2, 3, 5, 4};
ostream_iterator<int> out_it (cout, " ");
copy (date.begin(), date.end(), out_it );

How to use vector and struct?

I need to count letters from the string, sort them by count and cout results. For this purpose I'm trying to use vector and struct. Here is part of my code, but it's not working, because I don't know how to implement something:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
struct int_pair{
int key;
int value;
};
bool sort_by_value(int_pair left, int_pair right){
return left.value < right.value;
}
int main() {
string characters = "aasa asdfs dfh f ukjyhkh k wse f sdf sdfsdf";
vector<int_pair> most_frequent;
for (string::size_type i = 0; i <= characters.length(); i++) {
int int_char = (int)characters[i];
most_frequent[int_char]++; <-- I want to do something like this, but it's not working
}
sort(most_frequent.begin(), most_frequent.end(), sort_by_value);
for (vector<int_pair>::iterator it = most_frequent.begin(); it != most_frequent.end(); ++it) <-- is this call correct?
cout << " " << it->key << ":" << it->value << endl;
return 0;
}
In this code there are two parts that I don't know how to deal with:
most_frequent[int_char]++; <-- I want to do something like this, but it's not working
and
for (vector<int_pair>::iterator it = most_frequent.begin(); it != most_frequent.end(); ++it) <-- is this call correct?
Maybe you can also spot other mistakes and potential issues in this code.

I would use a std::map to determine the frequency of each letter, then copy that into a multimap while reversing the key and value to get them in order.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
template<class T, class U>
std::pair<U,T> flip_pair(const std::pair<T,U>& p) {
return std::make_pair(p.second,p.first);
}
int main(){
std::string characters = "zxcvopqiuweriuzxchajksdui";
std::map<char,int> freq;
std::multimap<int,char> rev_freq;
// Calculate the frequency of each letter.
for(char c: characters){
freq[c]++;
}
// Copy the results into a multimap with the key and value flipped
std::transform(std::begin(freq), std::end(freq),
std::inserter(rev_freq, rev_freq.begin()),
flip_pair<char,int>);
// Print out the results in order.
for(std::pair<int,char> p : rev_freq){
std::cout << p.first << ": " << p.second << std::endl;
}
}

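This should do what you need, provided the vector has first been given enough elements to index into (e.g. vector<int_pair> most_frequent(256);), since using operator[] on an empty vector is undefined behaviour:
most_frequent[int_char].key = int_char;
most_frequent[int_char].value++;
Yes, it sets the key many times, even though it doesn't need to.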

When accessing the container with the key (a vector is indexed with an integer, which is "the key" in your case), you don't have to store the key in the value field of the container again.
So you don't need your struct, since you only need the value field and can store the number of occurrences directly in the vector.
The idea is to fill the vector with 256 integers in the beginning, all initialized to zero. Then, use the vector index as your "key" (character code) to access the elements (number of occurrences).
This will result in a code similar to this:
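// initialize with 256 entries, one for each character:
vector<int> counts(256);
// note: i < length(), not <=, so the terminating '\0' is not counted
for (string::size_type i = 0; i < characters.length(); i++) {
// for each occurrence of a character, increase the value in the vector
// (cast via unsigned char so the index is never negative):
int int_char = (unsigned char)characters[i];
counts[int_char]++;
}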
Once filling of the vector is done, you can find the maximum value (not only the value but also the key where it is stored) using the std::max_element algorithm:
vector<int>::iterator most_frequent =
std::max_element(counts.begin(), counts.end());
// getting the character (index within the container, "key"):
std::cout << (char)(most_frequent - counts.begin());
// the number of occurrences ("value"):
std::cout << (*most_frequent);
Here is your example with the changes (only printing the most frequent character, here it is the space so you don't see it): http://ideone.com/94GfZz
You can sort this vector; however, you will of course lose the key, since the elements will move and change their indices. There is a nice trick to process statistics like that: use a reversed (multi)map (key and value swapped):
multimap<int,int> keyForOccurrence;
for (vector<int>::iterator i = counts.begin(); i != counts.end(); ++i) {
int occurrences = *i;
int character = i - counts.begin();
keyForOccurrence.insert(std::pair<int,int>(occurrences, character));
}
Updated code: http://ideone.com/Ub5rnL
The last thing you should now sort out by yourself is how to access and process the data within this map. The fancy thing about this reversed map is that it is now automatically sorted by occurrence, since maps are sorted by key.

I find it more natural to use a std::map container to store each character's occurrences. The character is the map's key, its occurrence count is the map's value.
It's easy to scan the source string and build this map using std::map::operator[], and ++ to increase the occurrence count.
Then, you can build a second map from the above map, with key and value inverted: so this map will be sorted by occurrences, and then you can print this second map.
Note that you have to use a std::multimap as this second map, since its keys (i.e. the occurrences) can be repeated.
Sample code follows (I tested it with VS2010 SP1/VC10):
#include <stddef.h> // for size_t
#include <algorithm> // for std::transform
#include <functional> // for std::greater
#include <iostream> // for std::cout
#include <iterator> // for std::inserter
#include <map> // for std::map, std::multimap
#include <ostream> // for std::endl
#include <string> // for std::string
#include <utility> // for std::pair
using namespace std;
int main()
{
string str = "aasa asdfs dfh f ukjyhkh k wse f sdf sdfsdf";
// Build the occurrences map (char -> occurrences)
map<char, size_t> freq;
for (size_t i = 0; i < str.length(); ++i)
freq[ str[i] ]++;
// Build a new map from previous map with inverted <key, value> pairs,
// so this new map will be sorted by old map's value (i.e. char's
// occurrences), which is new map's key.
// Use the std::greater comparator to sort in descending order.
multimap<size_t, char, greater<size_t>> sorted_freq;
transform(
freq.begin(), freq.end(), // source
inserter(sorted_freq, sorted_freq.begin()), // destination
[](const pair<char, size_t>& p) // invert key<->value
{
return pair<size_t, char>(p.second, p.first);
}
);
// Print results
for (auto it = sorted_freq.begin(); it != sorted_freq.end(); ++it)
cout << it->second << ": " << it->first << endl;
}
Output:
: 9
s: 7
f: 7
d: 5
a: 4
k: 3
h: 3
u: 1
w: 1
y: 1
j: 1
e: 1
If you don't want to print the space character occurrences, you can easily filter that out.
Note that using std::map/std::multimap will also scale better than std::vector for non-ASCII characters, e.g. if you use Unicode UTF-32 (since there are far more than 256 Unicode characters).

How to search in a given interval of a vector container

I want to search in a given interval (not a whole vector) of a vector container to see whether a user-specified integer is present or not. But I couldn't make it work. Thank you very much.
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
int main ()
{
int myints[] = { 10, 20, 30 ,40 };
vector<int> myvector (myints,myints+4);
vector<int>::iterator it, itLower, itUpper;
itLower = myvector.begin();
itUpper = myvector.begin();
advance(itLower, 1);
advance(itUpper, 2);
// iterator to vector element:
it = find (itLower, itUpper, 50);
if (it != myvector.end())
{
// Found
cout << "Found it";
}
else
{
// Not found
cout << "Not found it";
}
return 0;
}

You may just replace the line
if (it != myvector.end()) ...
by
if (it != itUpper) ...
If find does not find the item, it returns its second argument, i.e. the end of the searched range (which in your case is itUpper and not the end of the complete vector). Note that the range is defined with itUpper exclusive, so with your iterators only the element at index 1 is searched.
