Efficient way to re-order a C++ map-based collection - c++

I have a large(ish - >100K) collection mapping a user identifier (an int) to the count of different products that they've bought (also an int.) I need to re-organise the data as efficiently as possible to find how many users have different numbers of products. So for example, how many users have 1 product, how many users have two products etc.
I have achieved this by reversing the original data from a std::map into a std::multimap (where the key and value are simply swapped). I can then pick out the number of users having N products using count(N) (although I also stored the unique values in a set so I could be sure of the exact number of values I was iterating over and their order).
Code looks like this:
// uc is a std::map<int, int> containing the original
// mapping of user identifier to the count of different
// products that they've bought.
std::set<int> uniqueCounts;
std::multimap<int, int> cu; // This maps count to user.
for ( std::map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
cu.insert( std::pair<int, int>( it->second, it->first ) );
uniqueCounts.insert( it->second );
}
// Now write this out
for ( std::set<int>::const_iterator it = uniqueCounts.begin();
it != uniqueCounts.end(); ++it )
{
std::cout << "==> There are "
<< cu.count( *it ) << " users that have bought "
<< *it << " products(s)" << std::endl;
}
I just can't help feeling that this is not the most efficient way of doing this. Anyone know of a clever method of doing this?
I'm limited in that I can't use Boost or C++11 to do this.
Oh, also, in case anyone is wondering, this is neither homework, nor an interview question.

Assuming you know the maximum number of products that a single user could have bought, you might see better performance just using a vector to store the results of the operation. As it is you're going to need an allocation for pretty much every entry in the original map, which likely isn't the fastest option.
It would also cut down on the lookup overhead on a map, gain the benefits of memory locality, and replace the call to count on the multimap (which is not a constant time operation) with a constant time lookup of the vector.
So you could do something like this:
// Slot n holds the number of users who bought n products,
// so the vector needs MAX_PRODUCTS_PER_USER + 1 slots.
std::vector< int > uniqueCounts( MAX_PRODUCTS_PER_USER + 1 );
for ( std::map<int, int>::const_iterator it = uc.begin();
      it != uc.end(); ++it )
{
    uniqueCounts[ it->second ]++;
}
// Now write this out
for ( std::size_t i = 0; i < uniqueCounts.size(); ++i )
{
    std::cout << "==> There are "
              << uniqueCounts[ i ] << " users that have bought "
              << i << " products(s)" << std::endl;
}
Even if you don't know the maximum number of products, it seems like you could just guess a maximum and adapt this code to increase the size of the vector if required. It's sure to result in fewer allocations than your original example anyway.
All this assumes, of course, that you don't actually need the user ids after you've processed this data, and (as pointed out in the comments below) that the number of products bought per user forms a relatively small and contiguous set. Otherwise you might be better off using a map in place of a vector: you'll still avoid calling multimap::count, but you may lose some of the other benefits.
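To make the "grow the vector if required" idea concrete, here is a minimal sketch that stays within C++03. The function name count_users_by_products is made up for illustration, and the code assumes product counts are never negative.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: result[n] is the number of users who bought
// exactly n products. The vector grows on demand, so no maximum has to
// be known up front. Assumes product counts are non-negative.
std::vector<int> count_users_by_products(const std::map<int, int>& uc)
{
    std::vector<int> histogram;
    for (std::map<int, int>::const_iterator it = uc.begin();
         it != uc.end(); ++it)
    {
        const std::size_t n = static_cast<std::size_t>(it->second);
        if (n >= histogram.size())
            histogram.resize(n + 1, 0); // new slots start at zero
        ++histogram[n];
    }
    return histogram;
}
```

A single user with a huge count makes the vector correspondingly large, which is the case where a map is probably the better fit.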

It depends on what you mean by "more efficient". First off, is this really a bottleneck? Sure, 100k entries is a lot, but if you only have to do this every few minutes, it's OK if the algorithm takes a couple of seconds.
The only area for improvement I see is memory usage. If this is a concern, you can skip the generation of the multimap and just keep a counter map around, something like this (beware, my C++ is a little rusty):
std::map<int, int> countFrequency; // count => how many customers with that count
for ( std::map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
// If it->second is not yet in countFrequency,
// the default constructor initializes it to 0.
countFrequency[it->second] += 1;
}
// Now write this out
for ( std::map<int, int>::const_iterator it = countFrequency.begin();
it != countFrequency.end(); ++it )
{
std::cout << "==> There are "
<< it->second << " users that have bought "
<< it->first << " products(s)" << std::endl;
}
If a user is added and buys count items, you can update countFrequency with
countFrequency[count] += 1;
If an existing user goes from oldCount to newCount items, you can update countFrequency with
countFrequency[oldCount] -= 1;
countFrequency[newCount] += 1;
Now, just as an aside, I recommend using an unsigned int for count (unless there's a legitimate reason for negative counts) and typedef'ing a userID type, for added readability.

If you can, I would recommend keeping both pieces of data current all the time. In other words, I would maintain a second map which is mapping number of products bought to number of customers who bought that many products. This map contains the exact answer to your question if you maintain it. Each time a customer buys a product, let n be the number of products this customer has now bought. Subtract one from the value at key n-1. Add one to the value at key n. If the range of keys is small enough this could be an array instead of a map. Do you ever expect a single customer to buy hundreds of products?
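As a rough sketch of that bookkeeping (C++03, no Boost): the class name PurchaseStats and its member names are invented for this example, and it assumes every purchase is a new distinct product for that user.

```cpp
#include <map>

// Hypothetical wrapper that keeps both maps in step.
class PurchaseStats
{
public:
    // Record that `user` bought one more distinct product.
    void record_purchase(int user)
    {
        int& n = products_per_user_[user]; // starts at 0 for a new user
        if (n > 0)
        {
            // The user leaves the bucket for their old count.
            std::map<int, int>::iterator old = users_per_count_.find(n);
            if (--old->second == 0)
                users_per_count_.erase(old); // drop empty buckets
        }
        ++n;
        ++users_per_count_[n]; // the user joins the bucket for the new count
    }

    // Number of users who have bought exactly n products.
    int users_with(int n) const
    {
        std::map<int, int>::const_iterator it = users_per_count_.find(n);
        return it == users_per_count_.end() ? 0 : it->second;
    }

private:
    std::map<int, int> products_per_user_; // user id -> product count
    std::map<int, int> users_per_count_;   // product count -> number of users
};
```

Each purchase costs a few map lookups, and the answer to "how many users have N products" is then available at any time without a rebuild.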

Just for larks, here's a mixed approach that uses a vector if the data is smallish, and a map to cover the case where one user has bought a truly absurd number of products. I doubt you'll really need the latter in a store app, but a more general version of the problem might benefit from it.
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

typedef std::map<int, int> Map;
typedef Map::const_iterator It;

template <typename Container>
void get_counts(const Map &source, Container &dest) {
    for (It it = source.begin(); it != source.end(); ++it) {
        ++dest[it->second];
    }
}

// The contains() overloads are declared before print_counts so that
// the template can find them.
// As an alternative to this overloaded contains(), you could write
// an overloaded print_counts -- after all the one below is not an
// efficient way to iterate a sparsely-populated map.
// Or you might prefer a template function that visits
// each entry in the container, calling a specified functor that
// prints the output, and passing it the key and value.
// This is just the smallest point of customization I thought of.
bool contains(const Map &c, int key) {
    return c.count(key) != 0;
}
bool contains(const std::vector<int> &c, int key) {
    // also check 0 <= key < c.size() for a more general-purpose function
    return c[key] != 0;
}

template <typename Container>
void print_counts(Container &people, int max_count) {
    for (int i = 0; i <= max_count; ++i) {
        if (contains(people, i)) {
            std::cout << "==> There are "
                      << people[i] << " users that have bought "
                      << i << " products(s)" << std::endl;
        }
    }
}

void do_everything(const Map &uc) {
    // first get the max product count
    int max_count = 0;
    for (It it = uc.begin(); it != uc.end(); ++it) {
        max_count = std::max(max_count, it->second);
    }
    if (static_cast<Map::size_type>(max_count) > uc.size()) { // or some other threshold
        Map counts;
        get_counts(uc, counts);
        print_counts(counts, max_count);
    } else {
        std::vector<int> counts(max_count + 1);
        get_counts(uc, counts);
        print_counts(counts, max_count);
    }
}
From here you could refactor, to create a class template CountReOrderer, which takes a template parameter telling it whether to use a vector or a map for the counts.

Related

std::map iterate through keys with index of key

I need to iterate through the keys of a map, but looking ahead to future keys. For example:
map<int, int> m;
vector<int> v;
for(map<int,int>::iterator it = m.begin(); it != m.end(); ++it) {
cout << it->first << "\n";
//is the next element equal to 3?
auto next = std::next(it);
if (next != m.end())
std::cout << "equals 3: " << (next->first == 3) << std::endl;
}
but sometimes I don't want to see the next element (n+1), maybe I want to see the n+10 element, etc. How do I do this? If my list has 100 elements, and I arrive at element 99, then 99+10 is going to break everything. Is there a way to test whether my iterator can reach n+10?
The best solution I thought of is to keep track of an index i and see if I can call it + 10 (that is, if i+10<mapSize). But is there a more elegant way? Maybe testing whether the n+10 iterator exists or something?
Map does not sound like the appropriate data type for your use case. Try switching to a container that supports random access.
I think that you are looking for something like std::advance, but with an additional check for whether the advance operation went past the end.
We can use a small lambda to do this kind of check. Since it uses only an increment operation, it should work for all types of containers.
Please see the following example to illustrate the function:
#include <iostream>
#include <map>
#include <iterator>
using Type = std::map<int, int>;
using TypeIter = Type::iterator;
int main() {
// Lambda to advance a container iterator and check, if that was possible
auto advanceAndCheck = [](const Type& t, const TypeIter& ti, size_t advance) -> std::pair<bool, TypeIter>
{ TypeIter i{ ti }; while ((i != t.end()) && (advance--)) ++i; return { i != t.end(), i }; };
// Test data
Type m{ {1,1}, {2,2}, {3,3}, {4,4}, {5,5} , {6,6} };
// Iterate over container
for (TypeIter it = m.begin(); it != m.end(); ++it) {
// Show some values
std::cout << it->first << "\n";
// Test
{
// Advance and check
auto [OK, itn] = advanceAndCheck(m, it, 1);
if (OK && itn->first == 3) std::cout << "The next Element is 3\n";
}
{
// Advance and check
auto [OK, itn] = advanceAndCheck(m, it, 5);
if (OK && itn->first == 6) std::cout << "The 5th next Element is 6\n";
}
}
}
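If you would rather not tie the check to one map type, or you don't have C++17 for the structured bindings, the same idea can be written as a small function template. This is only a sketch: try_advance is a hypothetical name, not a standard library function.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical helper: steps `it` forward up to n times, stopping early
// at `last`. Returns (true, iterator) if the target element exists,
// otherwise (false, last).
template <typename Iter>
std::pair<bool, Iter> try_advance(Iter it, Iter last, std::size_t n)
{
    while (n > 0 && it != last)
    {
        ++it;
        --n;
    }
    return std::make_pair(it != last, it);
}
```

Inside the loop you would call try_advance(it, m.end(), 10) and only dereference the returned iterator when the bool is true. For a std::map the cost is linear in the look-ahead distance, since map iterators can only move one step at a time.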

Sorting of two vectors separately?

I have to make a program which uses the following two vectors:-
vector<double> age;
vector<string> name;
I take their input separately. I have to make a function sort() such that it sorts name alphabetically and then reorganizes age accordingly to match name.
Please help!!
If you can't group them within a struct or equivalent, you may create an additional vector of indexes that you sort and use for indirection:
std::vector<double> ages = /**/;
std::vector<std::string> names = /**/;
// ages.size() == names.size()
std::vector<std::size_t> indexes(names.size());
std::iota(indexes.begin(), indexes.end(), 0u);
std::sort(indexes.begin(), indexes.end(), [&](std::size_t lhs, std::size_t rhs) {
return names[lhs] < names[rhs];
});
for (auto index : indexes) {
std::cout << names[index] << " has " << ages[index] << std::endl;
}
And with range-v3 you can do:
std::vector<double> ages = /**/;
std::vector<std::string> names = /**/;
auto zip = ranges::view::zip(names, ages);
ranges::sort(zip);
for (const auto& z : zip) {
std::cout << std::get<0>(z) << " " << std::get<1>(z) << std::endl;
}
If the sort function accepts both vectors, the easiest way is to copy everything into a std::set<std::pair<string,double>>, which sorts on name first, and then copy the sorted entries back into the input vectors. If you can't use sets, you can use a vector and sort it yourself.
The reason is that sorting changes the order so you lose the link between the entries of both vectors. If you can't or won't use the combined set method, you need to make sure that the link is maintained in another way, probably via a temporary container with references.
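A minimal sketch of the "use a vector and sort it yourself" variant, assuming both vectors have the same length. The function name sort_by_name is my own choice; it avoids clashing with std::sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sorts `names` alphabetically and reorders `ages` to match.
// Does nothing if the two vectors differ in length.
void sort_by_name(std::vector<std::string>& names, std::vector<double>& ages)
{
    if (names.size() != ages.size())
        return;

    // Pair each name with its age so the link survives the sort.
    std::vector<std::pair<std::string, double> > rows;
    rows.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
        rows.push_back(std::make_pair(names[i], ages[i]));

    // std::pair compares by first (name), then by second (age).
    std::sort(rows.begin(), rows.end());

    // Copy the sorted rows back into the original vectors.
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        names[i] = rows[i].first;
        ages[i] = rows[i].second;
    }
}
```

Unlike a set, the vector keeps entries that are exact duplicates of each other.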
Assuming you really need a function that takes two vectors and modifies them.
The sort function can be implemented as:
void sort ( vector<double>& ages, vector<string>& names)
{
if ( ages.size() != names.size() )
return;
std::multimap< string, double > helper_map; // multimap, so that people sharing a name are not dropped
for ( size_t id = 0; id < names.size(); ++id)
{
helper_map.emplace( names[id], ages[id] );
}
names.clear();
ages.clear();
for (const auto& helper : helper_map)
{
names.push_back( helper.first );
ages.push_back( helper.second );
}
}
Working example:
http://coliru.stacked-crooked.com/a/2457c832c0b612b2
However keep in mind that this problem should be solved using different approaches as pointed out in the comments. As homework those things don't always apply though.

How to count the number of elements stored in a vector which is stored in a map

I have four static vectors. In my .cpp file (not my .h file!) I define these vectors as such:
std::vector<Object*> ClassA::vecA;
std::vector<Object*> ClassA::vecB;
std::vector<Object*> ClassA::vecC;
std::vector<Object*> ClassA::vecD;
Then I populate each of these vectors with a number of objects of type Object.
Next I create a map:
std::map<std::string, std::vector<Object*> > cntr;
I populate this map with the vectors from above and a string as a Key for each vector.
The question is, how do I access the vectors in the map to find out the number of elements they contain. I have tried:
for (it = cntr.begin(); it != cntr.end(); it++)
{
if (it->first != token)
{
std::cout << it->first << std::endl;
int i = (it->second).size();
std::cout << "SIZE: " << i << std::endl;
}
}
However i always gives me the value of 1. What is the correct approach?
First off you need to set the iterator to point to a valid element of the map. When you do
std::map<std::string, std::vector<Object*>>::iterator Class::it;
int size = it->second.size();
it doesn't point to anything so using it is undefined behavior. What you can do though is use
std::map<std::string, std::vector<Object*>>::iterator Class::it;
it = cntr.begin();
int size = it->second.size();
Which now gives you the size of the first vector in the map.
If you want to get all of the sizes then you will need to iterate through the map. You can do this with a nice range-based for loop like
for (const auto & elem : cntr) // get a const reference to each pair
std::cout << elem.second.size();
NathanOliver's answer should work if you have C++11. If you don't, you can try this, with a typedef to make the code clear:
typedef std::vector<Object*> MypObjVec;
typedef std::map<std::string, MypObjVec> MyMap;
MyMap::iterator Class::it = cntr.begin();
const MyMap::iterator Class::it_end = cntr.end();
for(; it!=it_end ; ++it)
{
std::cout<< it->second.size() << std::endl;
}

Iterating Multiple Multimaps

I'm having problems while trying to iterate over some maps.
Basically I have a Deposit class. Each Deposit has a multimap containing a destination Deposit and a distance. (This will be used to create a graph.)
When I try to iterate over all the maps I get a segmentation fault.
Here's the code:
for (int j = 0; j < deposit.size(); j++) {
for (typename multimap< Deposit<Product>*, int>::iterator it = deposit.at(j)->getConnections().begin(); it != deposit.at(j)->getConnections().end(); it++) {
cout << "From the depo. " << deposit.at(j)->getKey() << " to " << it->first->getKey() << " with the distance " << it->second << endl;
}
}
EDIT:
Deposit Class:
template<class Product>
class Deposit {
private:
multimap <Deposit<Product>*, int> connections;
public:
void addConnection(Deposit<Product>* dep, int dist);
multimap <Deposit<Product>*, int> getConnections() const;
};
(...)
template<class Product>
void Deposit<Product> ::addConnection(Deposit<Product>* depKey, int dist) {
this->connections.insert(pair<Deposit<Product>*, int>(depKey, dist));
}
template<class Product>
multimap < Deposit<Product>*, int> Deposit<Product> ::getConnections() const {
return this->connections;
}
Storage Class - This is where I populate the multimaps.
(...)
ligs = rand() % 10;
do{
ligIdx = rand() % deposit.size();
dist = rand() % 100;
deposit.at(i)->addConnection(deposit.at(ligIdx), dist);
ligs--;
}while(ligs>0);
(...)
My Deposit class has 2 subclasses. I don't know why the error occurs. Is there any problem with the iterator?
Thank you very much!!!
The problem you have is pretty nasty: getConnections() returns a multimap by value.
This means that successive calls to deposit.at(j)->getConnections() refer to different temporary copies of the original multimap. Thus the iterator created from the begin() of the first temporary copy will never match the end() of the second copy, and the loop runs into invalid memory before it could ever terminate.
Two alternatives:
if you want to iterate on a copy, make one local copy auto cnx = deposit.at(j)->getConnections(); and change your inner loop to iterate on cnx.
if you intended to iterate on the original multimap, change the signature of getConnections() to return a reference.
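A sketch of the second alternative, trimmed down to the members shown in the question. The helper totalDistance is hypothetical and only there to show that begin() and end() now refer to the same container.

```cpp
#include <map>
#include <utility>

template <class Product>
class Deposit {
public:
    void addConnection(Deposit<Product>* dep, int dist) {
        connections.insert(std::make_pair(dep, dist));
    }
    // Returning a const reference instead of a copy: every call now
    // refers to the one multimap stored in this object.
    const std::multimap<Deposit<Product>*, int>& getConnections() const {
        return connections;
    }
private:
    std::multimap<Deposit<Product>*, int> connections;
};

// Hypothetical helper: sums the distances of all connections of `d`.
template <class Product>
int totalDistance(const Deposit<Product>& d) {
    typedef typename std::multimap<Deposit<Product>*, int>::const_iterator It;
    int total = 0;
    // Calling getConnections() twice is safe here because both calls
    // return a reference to the same container.
    for (It it = d.getConnections().begin(); it != d.getConnections().end(); ++it)
        total += it->second;
    return total;
}
```

The reference stays valid only as long as the Deposit itself is alive, so callers should not hold on to it beyond that.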
By the way, if you use c++11 or higher, you could consider defining the iterator in a more readable way: for (auto it = ....) or even better, using the range-for syntax as proposed by Norah Attkins in her answer.
If you have a c++11 (or 14) compiler (and you should - unless it's a work/company barrier involved) consider using range based for loops to make your code clearer
for (auto const& elem : deposit)
{
    for (auto const& connection : elem->getConnections())
    {
        // connection.first is the destination, connection.second the distance
    }
}
Apart from the style guidance, without information on what the containers actually hold, we'd just be guessing what's wrong when answering this question. My guess is that invalid reads happen and that the pointers you're accessing are not allocated (but that's a guess).

to compare sub-items in vector

Overview of problem: I am using std::vector to hold objects of type Subject. The vector holds 10-20 objects at most.
These objects have string member values like category and sub_category.
The category and sub_category strings of one object can be the same as those of other objects.
Issue: I want my std::vector to contain only objects whose sub_category is unique. If category is not unique, that's not a problem.
Secondly, if we find 2 objects with the same sub_category, we have to delete one of them from the vector, based on the following rules:
i) if one instance of Subject has category = "Land" or category = "Jungle", delete the other duplicate object;
ii) if the above condition doesn't match, delete either of them.
I am wondering how I would compare the sub-items from the vector. For example,
I have class say Subject
class Subject
{
public :
// some constructors,
// functions to get ., set category and sub category
std::string get_sub_category();
std::string get_category();
private:
std::string category;
std::string sub_category;
};
I have vector which stores object of Subjects. example
vector<Subject> sub_vec;
Now what I want is to delete the objects from the vector that have the same sub_category.
I am not looking for source code, but I need a starting point.
example
say
sub_vec[0] = Animal object that has sub_category Tiger
sub_vec [1] = Animal object with Lion as sub category
sub_vec[2] = Forest object with sub_category Tiger
so what I want is, based on some conditions (which I can do), to remove either the Forest or the Animal object containing Tiger.
But for that how would I do comparison?
Thanks everyone for the help. I have written the function and have checked it, but I am sure there is a lot of room for improvement. Could you please point out my pitfalls?
std::vector< Subject >copy_vector; // copy_vector contains all the objects of Subject with redundant sub_category
for( std::vector< Subject >::iterator ii = copy_vector.begin() ; ii != copy_vector.end() ; ++ii )
{
sub_category = ii->get_sub_category();
std::cout <<" sub_category-- in main for loop " << sub_category << std::endl ;
std::vector< Subject >::iterator it = ii+1;
while( it != copy_vector.end() )
{
std::cout <<" the size of copy _vector is = " << copy_vector.size() << std::endl ; // for debug purpose
if( it->get_sub_category() == sub_category )
{
std::cout <<" we got a match here" << std::endl ;
// since both are duplicates, we have to delete one of them. Rules for deleting are:
// i) if an instance of Subject has category "Land" or "Jungle", delete the other duplicate object,
// ii) if the above condition doesn't match, delete either of them.
if( ( it->get_category() == "Land" ) || ( it->get_category() == "Jungle" ) )
{
std::cout <<" we are deleting it reference value " << std::endl ;
it = copy_vector.erase(ii);
// increment the counter
++ii;
}
else if( ( ii->get_category() == "Land" ) || ( ii->get_category() == "Jungle" ) )
{
std::cout <<" we are deleting from copy_vector " << std::endl ;
it = copy_vector.erase(it);
}
else
{
std::cout <<" we are deleting from copy_vector when there is no match for rules " << std::endl ;
it = copy_vector.erase(it);
}
std::cout <<" the size of copy _vector is = " << copy_vector.size() << std::endl ;
}
else
{
std::cout <<" No Match" << std::endl;
// increase main iterator
if( it != copy_vector.end() )
{
++it;
}
}
}
}
//print value
for( std::vector< Subject >::iterator ii = copy_vector.begin() ; ii != copy_vector.end() ; ++ii )
{
std::cout <<" New list = " << ii->get_category() <<" \t " << ii->get_sub_category() << std::endl;
}
One way to do it is by using remove_if. To check whether an object has a duplicate sub_category, you can use a function or functor that stores the subcategories it finds in a set or an unordered_map, and then remove all objects whose sub_category already exists in the set/unordered_map.
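A minimal sketch of that approach, using a C++11 lambda as the predicate. The Subject struct is a stand-in with public members (the real class uses getters), and remove_duplicate_sub_categories is a made-up name. It ignores the "Land"/"Jungle" rule and simply keeps the first object seen for each sub_category.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Minimal stand-in for the question's Subject class (assumed layout).
struct Subject
{
    std::string category;
    std::string sub_category;
};

// Keeps the first Subject seen for each sub_category, removes later ones.
void remove_duplicate_sub_categories(std::vector<Subject>& subjects)
{
    std::set<std::string> seen;
    subjects.erase(
        std::remove_if(subjects.begin(), subjects.end(),
                       [&seen](const Subject& s) {
                           // insert() returns a pair whose .second is false
                           // when the key was already in the set.
                           return !seen.insert(s.sub_category).second;
                       }),
        subjects.end());
}
```

This leans on the predicate being called on the elements in order, which is how the common standard library implementations behave. The set is captured by reference so that copies of the lambda share the same state.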
Note, unordered_map is only available in c++11.
Your solution has time complexity O(n*n) but the problem can be solved with complexity O(n*log(n)) or even O(n).
First, let's define a category comparison function (if a category is "Land" or "Jungle" then it's greater than other categories):
bool CategoryLess(string sCategory1, string sCategory2){
return sCategory1 != "Land" && sCategory1 != "Jungle"
&& (sCategory2 == "Land" || sCategory2 == "Jungle");
}
Now iterate through the vector and store all found subcategories and the corresponding Subjects in a std::unordered_map (or a std::map if you don't use C++11). If the subcategory is already in the map, replace the corresponding Subject if the category of the already found Subject is less than the category of the new Subject:
unordered_map<string, Subject*> Subcategories;
for (int i=0; i<sub_vec.size(); ++i){
unordered_map<string, Subject*>::iterator
it = Subcategories.find(sub_vec[i].get_sub_category());
if (it != Subcategories.end()){
if (CategoryLess(it->second->get_category(), sub_vec[i].get_category()))
it->second = &sub_vec[i];
}
else
Subcategories[sub_vec[i].get_sub_category()] = &sub_vec[i];
}
Now you have the map of all subcategories and corresponding Subjects.
If we found two or more Subjects with the same subcategory then the map contains a pointer to the Subject with greater category.
Now iterate sub_vec once more and delete Subjects if
Subcategories[sub_vec[i].get_sub_category()] != &sub_vec[i];
Time complexity:
If we use std::unordered_map then expected time complexity is O(n) for the both cycles (O(n*n) in worst case).
If we use std::map then time complexity is O(n*log(n)) for the both cycles.
(I didn't take into account time complexities of string comparison and vector.erase as irrelevant)
Please note that when you delete a Subject from the vector, the addresses of the other Subjects can change. So you need to take care when comparing pointers to Subjects (for example, copy the needed Subjects to another vector instead of deleting the other Subjects from the vector). But that doesn't change the general idea of my solution.
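One way to sidestep the pointer problem is to build the result in a second vector and remember positions in that vector instead of addresses. The following is only a sketch of that variation: Subject is a stand-in struct with public members, and is_preferred and unique_by_sub_category are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal stand-in for the question's Subject class (assumed layout).
struct Subject
{
    std::string category;
    std::string sub_category;
};

// "Land" and "Jungle" win over every other category.
bool is_preferred(const std::string& category)
{
    return category == "Land" || category == "Jungle";
}

// Returns a new vector with one Subject per sub_category. Among
// duplicates, a "Land" or "Jungle" Subject replaces any other category.
std::vector<Subject> unique_by_sub_category(const std::vector<Subject>& input)
{
    std::vector<Subject> result;
    // sub_category -> index of the kept Subject in `result`
    std::unordered_map<std::string, std::size_t> slot;
    for (const Subject& s : input)
    {
        auto found = slot.find(s.sub_category);
        if (found == slot.end())
        {
            slot[s.sub_category] = result.size();
            result.push_back(s);
        }
        else if (!is_preferred(result[found->second].category)
                 && is_preferred(s.category))
        {
            result[found->second] = s; // replace the weaker duplicate
        }
    }
    return result;
}
```

Indexes into the result vector stay valid as it grows, so nothing is invalidated along the way, and the expected cost is still a single pass over the input.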
You could try to use BOOST_FOREACH to iterate thru vector elements
I'm doing something similar like this :
BOOST_FOREACH( Subject& f, sub_vec )
{
///TODO: do your filtering here
if(f.get_sub_category() == "<bla bla>")
{
// handle the matching element
}
}
What I like about using BOOST_FOREACH is that it makes very readable code and when you are dealing with many vector elements and many filtering possibilities, then that is certainly a factor to consider
Either you should use a lambda expression or define a functional object.
An example with using a lambda expression
#include <vector>
#include <string>
#include <algorithm>
// ...
std::string tiger = "Tiger";
sub_vec.erase( std::remove_if( sub_vec.begin(), sub_vec.end(),
[&]( const Subject &s ) { return ( s.sub_category == tiger ); } ),
sub_vec.end() );
Take into account that the code above removes all objects that have sub_category equal to "Tiger". If you need to remove only duplicates, then first find the first object of the sub category and then remove all other objects with the same subcategory. In this case the code could look like this:
#include <vector>
#include <string>
#include <algorithm>
// ...
std::string tiger = "Tiger";
auto equal_sb_category = [&]( const Subject &s ) { return ( s.sub_category == tiger ); };
auto it = std::find_if( sub_vec.begin(), sub_vec.end(), equal_sb_category );
if ( it != sub_vec.end() )
{
sub_vec.erase( std::remove_if( std::next( it ), sub_vec.end(), equal_sb_category ),
sub_vec.end() );
}