Get distinct vector of vector with count - c++

What is the best way to return only the unique elements, together with their counts, from a vector of vectors?
std::vector<std::vector<std::string>> vec_vec{{"a","a","b","c"},{"a","c","c"}};
The result should be:
{a, b, c} // This is the vector that contains the unique items.
{3, 1, 3} // a appears three times, b only once, and c three times.
To solve this, I do the following:
1- Copy all the items of the vector of vectors into a single vector, so the result will be:
vec_vec{{a,a,b,c},{a,c,c}} -> vec{a,a,b,c,a,c,c}
2- Now I'm dealing with a single vector (not a vector of vectors), so it's much easier to sort it, get the unique items and count them (I may use the code here1 and here2).
Is converting the vector of vectors into one vector a good idea? Is there a better solution?
Can we find a better way with lower complexity than the current one (C++11, C++14)?
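For reference, the flatten-then-sort approach described in the question could look roughly like the sketch below (the function name `distinct_with_counts` is hypothetical). Its cost is O(N log N) for N strings in total, dominated by the sort, plus O(N) extra memory for the flattened copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: flattens a vector of vectors, sorts it, and collapses
// runs of equal items into two parallel vectors (unique items, counts).
std::pair<std::vector<std::string>, std::vector<std::size_t>>
distinct_with_counts(const std::vector<std::vector<std::string>>& vec_vec)
{
    // Step 1: copy everything into a single vector.
    std::vector<std::string> flat;
    for (const auto& inner : vec_vec)
        flat.insert(flat.end(), inner.begin(), inner.end());

    // Step 2: sort so that equal items become adjacent.
    std::sort(flat.begin(), flat.end());

    // Step 3: walk the sorted vector and measure each run of equal items.
    std::vector<std::string> items;
    std::vector<std::size_t> counts;
    for (auto it = flat.begin(); it != flat.end(); ) {
        // In a sorted range, upper_bound finds the end of the run of *it.
        auto run_end = std::upper_bound(it, flat.end(), *it);
        items.push_back(*it);
        counts.push_back(static_cast<std::size_t>(run_end - it));
        it = run_end;
    }
    return {items, counts};
}
```

Called with the question's data, this yields {a, b, c} and {3, 1, 3}; the hash-map answers below avoid both the copy and the sort.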

Off the top of my head:
std::unordered_map<std::string, std::size_t> counters;
for(auto const& inner : vec_vec)
for(auto const& v : inner)
counters[v]++;
for(auto const& cnt : counters)
std::cout << cnt.first << " appears " << cnt.second << std::endl;

Use hash maps.
std::unordered_map<string, int> result;
for (const auto& x : vec_vec)
for (const string& y : x)
result[y]++;

I would just use a map as "tally" structure:
std::map<std::string, unsigned int> tally;
for(const auto& subvector : vec_vec) { // subvector is a std::vector<std::string>; const& avoids copying it
for(const auto& item : subvector) { // item is a std::string
++tally[item];
}
}
If you insist on having the result as two parallel vectors (but why would you?) simply construct them from the map:
std::vector<std::string> unique_items;
unique_items.reserve(tally.size());
std::vector<unsigned int> counts;
counts.reserve(tally.size());
for(const auto& item : tally) { // const& avoids copying each map entry
unique_items.push_back(item.first);
counts.push_back(item.second);
}
If you don't need the result vectors to be sorted, you can use an unordered_map instead, as suggested in other answers.
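As a sketch of the unordered_map variant (the function name `tally_unordered` is hypothetical): counting is expected O(N) for N strings in total and no flattened copy is made, but the order of the two output vectors is unspecified.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts every string across all inner vectors with a
// hash map, then splits the tally into two parallel vectors.
// The order of the output is unspecified (it follows the hash map).
std::pair<std::vector<std::string>, std::vector<std::size_t>>
tally_unordered(const std::vector<std::vector<std::string>>& vec_vec)
{
    std::unordered_map<std::string, std::size_t> tally;
    for (const auto& inner : vec_vec)
        for (const auto& item : inner)
            ++tally[item];   // default-inserts 0 on first sight, then increments

    std::vector<std::string> unique_items;
    std::vector<std::size_t> counts;
    unique_items.reserve(tally.size());
    counts.reserve(tally.size());
    for (const auto& entry : tally) {
        unique_items.push_back(entry.first);
        counts.push_back(entry.second);
    }
    return {unique_items, counts};
}
```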

Related

How to get the top 100 names in a collection

I am creating a program that summarizes a data file. The data file has information such as first names, stored as fields in a CSV file. The fields in the data file are included as instance variables in the class. I created setter and getter methods to return the data for one person, and I created vectors to hold the collection of variables.
I am having trouble understanding how to create a list of the 100 most common first names among all people in the collection. The list must be in descending order of occurrence.
I was able to print all the common names and their frequencies, but I am unable to print the 100 most common names. When I tried to sort, I got the following error:
class std::pair<const std::string, int> has no member begin and end
Please help me resolve this issue. All processing of data in the vector must be done with iterators. I am not sure how to fix this since I am a beginner.
std::vector<std::string> commonNamesFirst; //vector
for (auto x : census) {
commonNamesFirst.push_back(x.getFirstName()); //populate vector
}
std::map<std::string, int> frequencies;
for (auto& x : census) { ++frequencies[x.getFirstName()]; }
for (auto& freq : frequencies) {
sort(freq.begin(), freq.end(), greater <>()); //error, need to sort in descending order
cout << freq.first << ": " << freq.second << endl; //print the 100 common names in descending order
}
std::map<std::string, int> frequencies;
This is generally the right direction. You're using this to count how many times each word occurs.
for (auto& freq : frequencies) {
This iterates over each individual word and a count of how many times it occurred. This no longer makes any logical sense. You are looking to find the 100 most common ones, the ones with the highest count values. Iterating, and looking at each one individually, in the manner that's done here, does not make any sense.
sort(freq.begin(), freq.end(), greater <>());
freq, here, is a single word and how many times it occurred. You are using freq to iterate over all of the frequencies, so it is just one of the words and its frequency value: a single std::pair value. A std::pair does not have anything called begin or end, and that's what your compiler is telling you, directly.
Furthermore, you cannot sort a std::map in the first place. It is not a sortable container. The simplest option is to extract the contents of the now-complete map into something that's sortable, for example a vector:
std::vector<std::pair<std::string, int>> vfrequencies{
frequencies.begin(),
frequencies.end()
};
So, you've now copied the contents of a map into a vector. Not the most efficient approach, but a workable one.
And now, you can sort this vector. Rather easily.
However, as one last detail, you can't just drop in std::greater<> and expect the right thing to happen.
You are looking to sort on the frequency count value only, which is the .second of these std::pairs. A plain std::greater is not going to do this for you: std::greater<> on a std::pair compares lexicographically, .first (the name) before .second, so it would put the names in reverse alphabetical order instead of ordering them by count.
You will need to provide your own custom lambda for the third parameter of std::sort, that compares the second value of the std::pairs in that vector.
And then, the first 100 most common words will be the first 100 values in the vector. Mission accomplished.
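A minimal sketch of that approach, assuming the counts are already in a std::map<std::string, int> as in the question. The helper name `top_names` and the alphabetical tie-break are assumptions, not part of the question. It also copes with having fewer than 100 distinct names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the n most frequent names, highest count first.
// Ties are broken alphabetically so the output is deterministic (an assumption).
std::vector<std::pair<std::string, int>>
top_names(const std::map<std::string, int>& frequencies, std::size_t n)
{
    // Copy the finished map into something sortable.
    std::vector<std::pair<std::string, int>> v(frequencies.begin(), frequencies.end());

    // Custom comparator: order by .second (the count), descending.
    std::sort(v.begin(), v.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });

    // Keep at most n entries; there may be fewer than n distinct names.
    if (v.size() > n) v.resize(n);
    return v;
}
```

With the question's data this would be called as `top_names(frequencies, 100)`, and the result printed with an iterator loop.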
You cannot (re-)sort a std::map, but you can copy the frequencies into a vector or a std::multimap as an intermediate:
std::map<std::string, int> frequencies;
for (auto& x : census) { ++frequencies[x.getFirstName()]; }
std::vector<std::pair<std::string, int>> freqs{frequencies.begin(), frequencies.end()};
// Guard against having fewer than 100 distinct names (std::min needs <algorithm>)
const std::size_t n = std::min<std::size_t>(100, freqs.size());
std::partial_sort(freqs.begin(), freqs.begin() + n, freqs.end(),
[](const auto& lhs, const auto& rhs){ return lhs.second > rhs.second; });
for (std::size_t i = 0; i != n; ++i)
{
std::cout << freqs[i].first << ": " << freqs[i].second << std::endl;
}
Building on @Michał Kaczorowski's answer: you are trying to sort the values inside each pair instead of the pairs in the map. However, as Sam mentioned, you cannot sort a std::map (its internal implementation keeps entries sorted by key, which is the name in this case). You'd have to get the values out of the map and sort them, or use a priority queue and heapsort (faster constant factor), or a monotonic queue (linear time but harder to implement). Here is an example heapsort implementation:
vector<string> commonNamesFirst; //vector
for (auto x : census) {
commonNamesFirst.push_back(x.getFirstName()); //populate vector
}
std::map<std::string, int> frequencies;
for (auto& x : census) { ++frequencies[x.getFirstName()]; }
std::priority_queue<std::pair<int, std::string>> top_names; // needs <queue>; put the frequency before the name to take advantage of the default pair comparison
for (auto& freq : frequencies) top_names.push(std::make_pair(freq.second, freq.first));
for (int i = 0; i < 100 && !top_names.empty(); ++i) // stop early if there are fewer than 100 names
{
outputFile << top_names.top().second << ": " << top_names.top().first << endl; //print the 100 common names in descending order
top_names.pop();
}
The error you get says it all: you are trying to sort an individual std::pair. I think the best way would be to transform your map into a std::vector of pairs and then sort that vector. Then just go through the first 100 elements in a loop and print the results.

How to traverse an unordered_map of unordered_map of unordered_map in C++

I want to traverse a nested data structure, unordered_map<int, unordered_map<int, unordered_map<int, int>>> myMap. More specifically, I want to get data elements like:
myMap[someVal1][someVal2]
{all second elements of this unordered map}
I am aware that the same could be done with a 3D array; however, a 3D array would not be efficient, as the data range is huge and the program would end up using far more space than required. I tried using iterators like unordered_map<int, unordered_map<int, unordered_map<int, int>>>::iterator i, and several others, but it always ends up in some error or other. Could someone help me understand how this map can be traversed? Thanks in advance!
You could traverse the map with a range-based for loop (it needs C++11, which I assume won't be a problem) if you don't want to use iterators explicitly.
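std::unordered_map<int, std::unordered_map<int, std::unordered_map<int, int>>> mapMapMap; // same type as the question's myMap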
for(auto& mapMap : mapMapMap){
for(auto& map : mapMap.second){
for(auto& key_value : map.second){
int key = key_value.first;
int value = key_value.second;
// ....
}
}
}
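Also, if you don't want to iterate over the whole map, but only over the values at the third level given the first two keys, this should do it:
int k1, k2; // set these to the first two keys before the loop
// at() throws std::out_of_range if k1 or k2 is not present
for(auto& key_value : myMap.at(k1).at(k2)){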
//...
}
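Putting both answers together, here is a sketch under the assumption that the type is exactly the one in the question (the alias `Map3` and the helpers `sum_all` and `values_at` are hypothetical names). The second helper uses find() instead of at(), so a missing key yields an empty result instead of an exception, and unlike operator[] it never inserts empty inner maps.

```cpp
#include <unordered_map>
#include <vector>

// The type from the question, given a hypothetical alias for readability.
using Map3 = std::unordered_map<int, std::unordered_map<int, std::unordered_map<int, int>>>;

// Walks all three levels with range-based for loops and sums the innermost values.
long long sum_all(const Map3& m)
{
    long long total = 0;
    for (const auto& level1 : m)                    // level1.first: outer key
        for (const auto& level2 : level1.second)    // level2.first: middle key
            for (const auto& kv : level2.second)    // kv.first / kv.second: inner key / value
                total += kv.second;
    return total;
}

// Collects the innermost values of m[k1][k2] without inserting anything and
// without throwing: find() returns end() when a key is missing.
std::vector<int> values_at(const Map3& m, int k1, int k2)
{
    std::vector<int> out;
    auto it1 = m.find(k1);
    if (it1 == m.end()) return out;
    auto it2 = it1->second.find(k2);
    if (it2 == it1->second.end()) return out;
    for (const auto& kv : it2->second)
        out.push_back(kv.second);
    return out;
}
```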

Sorting a vector by unordered map of the elements pointers as keys

I have a vector of elements std::vector<T> my_vec. At some point in my code I assign a score for each element of the vector using an unordered map. After that, I would like to sort the vector by the scores of its elements with the minimum code possible.
I came up with this solution: define the map as std::unordered_map<const T*, float> scores_map, and insert each score into the map as follows:
for (const auto& el : my_vec)
scores_map[&el] = calc_score(el);
Then I sort using:
std::sort(my_vec.begin(), my_vec.end(),
[&scores_map](const auto& a, const auto& b){return scores_map[&a] > scores_map[&b];});
Is this bug-free and good practice? If not, how can I make it so?
@fas wrote in a comment:
Elements in vector are moved during sort, so their pointers also change and scores_map becomes invalid, isn't it?
That is correct. You should not use pointers as keys in scores_map.
Option 1
If the vector contains unique items, you may use T itself as the key type (this requires std::hash<T> and operator== to be available for T).
for (const auto& el : my_vec)
scores_map[el] = calc_score(el);
Then sort using:
std::sort(my_vec.begin(), my_vec.end(),
[&scores_map](const auto& a, const auto& b){return scores_map[a] > scores_map[b];});
Option 2
If the vector does not contain unique elements, you may use the following strategy.
Use indices as the key of scores_map.
Create a helper std::vector<size_t> object that contains just indices.
Sort the vector of indices.
Use the sorted indices vector to fetch the elements from my_vec.
for (size_t i = 0; i < my_vec.size(); ++i )
scores_map[i] = calc_score(my_vec[i]);
// Create the vector of indices
std::vector<size_t> indices_vec(my_vec.size());
for ( size_t i = 0; i < indices_vec.size(); ++i )
{
indices_vec[i] = i;
}
// Sort the vector of indices
std::sort(indices_vec.begin(), indices_vec.end(),
[&scores_map](size_t a, size_t b){return scores_map[a] > scores_map[b];});
for (auto index : indices_vec)
{
// Use my_vec[index]
}
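A variant of Option 2, assuming the scores can be kept in a std::vector<float> indexed like my_vec rather than in a map (the helper `sorted_by_score` is a hypothetical name). std::iota fills the indices, and std::stable_sort keeps elements with equal scores in their original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns a copy of my_vec ordered by descending score, where scores[i] is the
// precomputed score of my_vec[i] (assumes scores.size() == my_vec.size()).
// Only the indices are sorted, so no element addresses are relied upon.
template <typename T>
std::vector<T> sorted_by_score(const std::vector<T>& my_vec,
                               const std::vector<float>& scores)
{
    std::vector<std::size_t> indices(my_vec.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});   // 0, 1, 2, ...

    std::stable_sort(indices.begin(), indices.end(),
                     [&scores](std::size_t a, std::size_t b) {
                         return scores[a] > scores[b];
                     });

    // Gather the elements in the sorted index order.
    std::vector<T> result;
    result.reserve(my_vec.size());
    for (std::size_t index : indices)
        result.push_back(my_vec[index]);
    return result;
}
```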
No, this is not bug-free. std::sort moves elements around, so the address of each element changes during the sort.
You could store the score with each element in a pair:
std::pair<float, T>
and sort the vector
std::vector<std::pair<float, T> > my_vec
with
std::sort(my_vec.begin(), my_vec.end(),
[](const auto& a, const auto& b){return a.first > b.first;});
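A sketch of this decorate-sort-undecorate idea that leaves the result in the original vector, assuming T is movable. `sort_by_score` is a hypothetical name, and `score_fn` stands in for the question's `calc_score`; it is called once per element.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts my_vec in place by descending score. Each element is paired with its
// score, so the score moves together with the element during the sort.
template <typename T, typename ScoreFn>
void sort_by_score(std::vector<T>& my_vec, ScoreFn score_fn)
{
    std::vector<std::pair<float, T>> decorated;
    decorated.reserve(my_vec.size());
    for (auto& el : my_vec) {
        float score = score_fn(el);                 // compute the score once
        decorated.emplace_back(score, std::move(el));
    }

    std::stable_sort(decorated.begin(), decorated.end(),
                     [](const std::pair<float, T>& a, const std::pair<float, T>& b) {
                         return a.first > b.first;
                     });

    // Move the elements back into my_vec, now in sorted order.
    for (std::size_t i = 0; i < decorated.size(); ++i)
        my_vec[i] = std::move(decorated[i].second);
}
```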

Check for common members in vector c++

What is the best way to verify if there are common members within multiple vectors?
The vectors aren't necessarily of equal size and they may contain custom data (such as structures containing two integers that represent a 2D coordinate).
For example:
vec1 = {(1,2); (3,1); (2,2)};
vec2 = {(3,4); (1,2)};
How to verify that both vectors have a common member?
Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data.
For non-trivial data sets, the most efficient method is probably to sort both vectors and then use the std::set_intersection function defined in <algorithm>, as follows:
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>
using namespace std;
typedef vector<pair<int, int>> tPointVector;
int main()
{
tPointVector vec1 {{1,2}, {3,1}, {2,2}};
tPointVector vec2 {{3,4}, {1,2}};
std::sort(begin(vec1), end(vec1));
std::sort(begin(vec2), end(vec2));
tPointVector vec3; // receives the common elements, here {1,2}
vec3.reserve(std::min(vec1.size(), vec2.size()));
set_intersection(begin(vec1), end(vec1), begin(vec2), end(vec2), back_inserter(vec3));
}
You may get better performance with a nonstandard algorithm if you only need the number of common elements rather than the elements themselves, because then you can avoid creating copies of the common elements.
In any case, it seems to me that starting by sorting both containers will give you the best performance for data sets with more than a few dozen elements.
Here's an attempt at writing an algorithm that just gives you the count of matching elements (untested):
auto it1 = begin(vec1);
auto it2 = begin(vec2);
const auto end1 = end(vec1);
const auto end2 = end(vec2);
sort(it1, end1);
sort(it2, end2);
size_t numCommonElements = 0;
while (it1 != end1 && it2 != end2) {
bool oneIsSmaller = *it1 < *it2;
if (oneIsSmaller) {
it1 = lower_bound(it1, end1, *it2);
} else {
bool twoIsSmaller = *it2 < *it1;
if (twoIsSmaller) {
it2 = lower_bound(it2, end2, *it1);
} else {
// none of the elements is smaller than the other
// so it's a match
++it1;
++it2;
++numCommonElements;
}
}
}
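The loop above, wrapped into a function so it can be tested, is sketched below under the assumption that T has operator< (std::pair<int, int> does). The name `count_common` is hypothetical. Duplicates are matched pairwise, so the result equals the size of the output std::set_intersection would produce.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the common elements of two vectors without building an output vector.
// Takes the vectors by value because they must be sorted first.
template <typename T>
std::size_t count_common(std::vector<T> vec1, std::vector<T> vec2)
{
    std::sort(vec1.begin(), vec1.end());
    std::sort(vec2.begin(), vec2.end());
    auto it1 = vec1.begin();
    auto it2 = vec2.begin();
    std::size_t numCommonElements = 0;
    while (it1 != vec1.end() && it2 != vec2.end()) {
        if (*it1 < *it2) {
            // Skip ahead in vec1 to the first element not less than *it2.
            it1 = std::lower_bound(it1, vec1.end(), *it2);
        } else if (*it2 < *it1) {
            it2 = std::lower_bound(it2, vec2.end(), *it1);
        } else {
            // Neither element is smaller than the other, so it's a match.
            ++it1;
            ++it2;
            ++numCommonElements;
        }
    }
    return numCommonElements;
}
```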
Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data.
You need to go through all elements at least once; I assume you mean you don't want to check every combination. Indeed, you don't want to do this:
for all elements in vec1, go through the entire vec2 to check whether the element is there. This won't be efficient if your vectors have a large number of elements.
If you prefer an (expected) linear-time solution and you don't mind using extra memory, here is what you can do:
You need a hash function to insert elements into an unordered_map or unordered_set.
See https://stackoverflow.com/a/13486174/2502814
#include <cstddef> // std::size_t
#include <functional> // std::hash
#include <iostream> // std::cout
#include <unordered_set> // std::unordered_set
#include <utility> // std::pair
#include <vector> // std::vector
using namespace std;
// Hash functor passed to unordered_set as a template argument. Specializing
// std::hash for std::pair<int, int> (which involves no user-defined type) is
// not permitted by the standard. Unsigned arithmetic avoids signed overflow.
struct PairHash
{
size_t operator()(const pair<int, int> & t) const
{
size_t h = static_cast<size_t>(t.second) * 6495227u + static_cast<size_t>(t.first);
return hash<size_t>{}(h);
}
};
int main () {
vector<pair<int, int>> vec1 {{1,2}, {3,1}, {2,2}};
vector<pair<int, int>> vec2 {{3,4}, {1,2}};
// Copy all elements from vec2 into an unordered_set
unordered_set<pair<int, int>, PairHash> in_vec2;
in_vec2.insert(vec2.begin(),vec2.end());
// Traverse vec1 and check if elements are here
for (auto& e : vec1)
{
if(in_vec2.find(e) != in_vec2.end()) // Searching in an unordered_set is faster than going through all elements of vec2 when vec2 is big.
{
//Here are the elements in common:
cout << "{" << e.first << "," << e.second << "} is in common!" << endl;
}
}
return 0;
}
Output: {1,2} is in common!
You can either do that, or copy all elements of vec1 into an unordered_set, and then traverse vec2.
Depending on the sizes of vec1 and vec2, one solution might be faster than the other.
Keep in mind that picking the smaller vector to insert in the unordered_set also means you will use less extra memory.
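Since the question only asks whether a common member exists, a sketch can stop at the first hit and, as suggested above, put the smaller vector into the set. `has_common` and `PairHash` are hypothetical names, and the hash mixing constant is simply reused from the answer.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hash functor for std::pair<int, int>, passed as a template argument.
struct PairHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        std::size_t h = static_cast<std::size_t>(p.second) * 6495227u
                      + static_cast<std::size_t>(p.first);   // unsigned: wraps, no UB
        return std::hash<std::size_t>{}(h);
    }
};

// Returns true as soon as one common element is found. The smaller vector goes
// into the set to keep the extra memory down.
bool has_common(const std::vector<std::pair<int, int>>& a,
                const std::vector<std::pair<int, int>>& b)
{
    const auto& small = a.size() <= b.size() ? a : b;
    const auto& large = a.size() <= b.size() ? b : a;
    std::unordered_set<std::pair<int, int>, PairHash> seen(small.begin(), small.end());
    for (const auto& e : large)
        if (seen.count(e) != 0)
            return true;   // early exit: no need to scan the rest
    return false;
}
```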
I believe you could use a 2-d tree to search in 2 dimensions. An optimal algorithm for the problem you specified would fall under the class of geometric algorithms. Maybe this link is of use to you: http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/geosearch.pdf .

std::sort that also keeps track of number of unique entries at each level

Say I have a std::vector<int> containing numbers, for example:
1,3,5,4,3,4,5,1,6,3
Sorting it with std::sort and std::less<int> gives
1,1,3,3,3,4,4,5,5,6,
How would I amend sort so that, at the same time as it is sorting, it also counts how many numbers are at each level? So, in addition to sorting, it would also compile the following dictionary [level is also int]:
std::map<level, int>
<1, 2>
<3, 3>
<4, 2>
<5, 2>
<6, 1>
so there are 2 1's, 3 3's, 2 4's, and so on.
The reason I [think] I need this is that I don't want to sort the vector and THEN, once again, compute the number of duplicates at each level. It seems faster to do both in one pass?
Thank you all! bjskishore123 is the closest thing to what I was asking, but all the responses educated me. Thanks again.
As stated by @bjskishore123, you can use a map to guarantee the correct order of your values. As a bonus, you will have an optimized structure to search (the map, of course).
Inserting into or searching a map takes O(log n) time, and you do it once for each of the n elements of the vector, so the algorithm is O(n log n). That is the same complexity as any sort algorithm that needs to compare elements, such as merge sort or quick sort.
Here is a sample code for you:
int tmp[] = {5,5,5,5,5,5,2,2,2,2,7,7,7,7,1,1,1,1,6,6,6,2,2,2,8,8,8,5,5};
std::vector<int> values(tmp, tmp + sizeof(tmp) / sizeof(tmp[0]));
std::map<int, int> map_values;
std::for_each(values.begin(), values.end(), [&](int value)
{
map_values[value]++;
});
for(std::map<int, int>::iterator it = map_values.begin(); it != map_values.end(); it++)
{
std::cout << it->first << ": " << it->second << " times" << std::endl;
}
Output:
1: 4 times
2: 7 times
5: 8 times
6: 3 times
7: 4 times
8: 3 times
I don't think you can do this in one pass. Let's say you provide your own custom comparator for sorting which somehow tries to count the duplicates.
However, the only thing the comparator can see is the value (or a reference, but that doesn't matter) of the two elements currently being compared. It has no other information, because std::sort doesn't pass anything else to it.
Because of the way std::sort works, it keeps moving elements until they reach their proper location in the sorted vector. That means a single element can be passed to the comparator many times, making it impossible to count exactly. You can count how many times a certain value (and all values equal to it) has been compared, but not exactly how many copies of it there are.
Instead of using a vector, store the numbers one by one in a std::multiset container, which keeps them internally in sorted order.
While storing each number, also use a map to keep track of the number of occurrences of each number.
map<int, int> m;
Each time a number is added, do:
m[num]++;
So there is no need for another pass to calculate the number of occurrences, although you need to iterate over the map to get each occurrence count.
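A sketch of this suggestion, assuming the numbers are ints that arrive one at a time (the struct `SortedWithCounts` and the function `store_all` are hypothetical). Note that the map alone already keeps its keys sorted; the multiset is only needed if you also want every copy of each number in order.

```cpp
#include <map>
#include <set>
#include <vector>

// Sorted storage plus an occurrence tally, both updated on every insert.
struct SortedWithCounts {
    std::multiset<int> values;   // all numbers, duplicates included, in sorted order
    std::map<int, int> m;        // number -> how many times it was stored

    void add(int num) {
        values.insert(num);      // O(log n)
        m[num]++;                // O(log k), k = number of distinct values so far
    }
};

SortedWithCounts store_all(const std::vector<int>& input)
{
    SortedWithCounts s;
    for (int num : input)
        s.add(num);
    return s;
}
```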
=============================================================================
THE FOLLOWING IS AN ALTERNATIVE SOLUTION, WHICH IS NOT RECOMMENDED.
I AM GIVING IT BECAUSE YOU ASKED FOR A WAY THAT USES STD::SORT.
The code below uses the comparison function to count the occurrences.
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
using namespace std;
struct Elem
{
int index;
int num;
};
std::map<int, int> countMap; //Count map
std::map<int, bool> visitedMap;
bool compare(Elem a, Elem b)
{
if(visitedMap[a.index] == false)
{
visitedMap[a.index] = true;
countMap[a.num]++;
}
if(visitedMap[b.index] == false)
{
visitedMap[b.index] = true;
countMap[b.num]++;
}
return a.num < b.num;
}
int main()
{
vector<Elem> v;
Elem e[5] = {{0, 10}, {1, 20}, {2, 30}, {3, 10}, {4, 20} };
for(size_t i = 0; i < 5; i++)
v.push_back(e[i]);
std::sort(v.begin(), v.end(), compare);
for(map<int, int>::iterator it = countMap.begin(); it != countMap.end(); it++)
cout<<"Element : "<<it->first<<" occurred "<<it->second<<" times"<<endl;
}
Output:
Element : 10 occurred 2 times
Element : 20 occurred 2 times
Element : 30 occurred 1 times
If you have lots of duplicates, the fastest way to accomplish this task is probably to first count duplicates using a hash map, which is O(n) on average, and then to sort the (value, count) pairs, which is O(m log m) where m is the number of unique values.
Something like this (in c++11):
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>
std::vector<std::pair<int, int>> uniqsort(const std::vector<int>& v) {
std::unordered_map<int, int> count;
for (auto& val : v) ++count[val];
std::vector<std::pair<int, int>> result(count.begin(), count.end());
std::sort(result.begin(), result.end());
return result;
}
There are lots of variations on the theme, depending on what exactly you need. For example, perhaps you don't even need the result to be sorted; maybe it's enough to just have the count map. Or maybe you would prefer the result to be a sorted map from int to int, in which case you could just build a regular std::map instead. (That would be O(n log m).) Or maybe you know something about the values that makes them faster to sort (like the fact that they are small integers in a known range). And so on.
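For the last case, a counting sort sketch, assuming every value is a non-negative int no greater than a known `max_value` (the function name and parameter are hypothetical). It produces the sorted vector and the per-value counts in O(n + max_value) without any comparisons, which is close to the single pass the question asks for.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counting sort for values in [0, max_value]: one pass over v to count, one pass
// over the value range to emit. Returns the sorted values and the
// (value, count) pairs for the values that occur.
std::pair<std::vector<int>, std::vector<std::pair<int, int>>>
counting_sort_with_counts(const std::vector<int>& v, int max_value)
{
    std::vector<int> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int x : v)
        ++counts[static_cast<std::size_t>(x)];   // assumes 0 <= x <= max_value

    std::vector<int> sorted;
    std::vector<std::pair<int, int>> levels;
    sorted.reserve(v.size());
    for (int value = 0; value <= max_value; ++value) {
        int c = counts[static_cast<std::size_t>(value)];
        if (c == 0) continue;                     // value does not occur
        levels.emplace_back(value, c);
        sorted.insert(sorted.end(), static_cast<std::size_t>(c), value);
    }
    return {sorted, levels};
}
```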