Iterating Multiple Multimaps


I'm having problems while trying to iterate over some maps.
Basically I have a Deposit class. Each Deposit has a multimap containing a destination Deposit and a distance. (This will be used to create a graph.)
When I try to iterate over all the maps I get a segmentation fault.
Here's the code:
for (int j = 0; j < deposit.size(); j++) {
for (typename multimap< Deposit<Product>*, int>::iterator it = deposit.at(j)->getConnections().begin(); it != deposit.at(j)->getConnections().end(); it++) {
cout << "From the depo. " << deposit.at(j)->getKey() << " to " << it->first->getKey() << " with the distance " << it->second << endl;
}
}
EDIT:
Deposit Class:
template<class Product>
class Deposit {
private:
multimap <Deposit<Product>*, int> connections;
public:
void addConnection(Deposit<Product>* dep, int dist);
multimap <Deposit<Product>*, int> getConnections() const;
};
(...)
template<class Product>
void Deposit<Product> ::addConnection(Deposit<Product>* depKey, int dist) {
this->connections.insert(pair<Deposit<Product>*, int>(depKey, dist));
}
template<class Product>
multimap < Deposit<Product>*, int> Deposit<Product> ::getConnections() const {
return this->connections;
}
Storage Class - This is where I populate the multimaps.
(...)
ligs = rand() % 10;
do{
ligIdx = rand() % deposit.size();
dist = rand() % 100;
deposit.at(i)->addConnection(deposit.at(ligIdx), dist);
ligs--;
}while(ligs>0);
(...)
My Deposit class has 2 subclasses. I don't know why the error occurs. Is there a problem with the iterator?
Thank you very much!!!

The problem you have is pretty nasty: getConnections() returns a multimap by value.
This means that successive calls to deposit.at(j)->getConnections() refer to different temporary copies of the original multimap. The iterator obtained from begin() on the first temporary copy will therefore never compare equal to end() of the second copy. The loop keeps incrementing it past the end of a copy that has already been destroyed, which is undefined behaviour and here shows up as a segmentation fault.
Two alternatives:
If you want to iterate over a copy, make one local copy, auto cnx = deposit.at(j)->getConnections();, and change your inner loop to iterate over cnx.
If you intended to iterate over the original multimap, change the signature of getConnections() to return a const reference (the loop then needs a const_iterator, or auto).
By the way, if you use C++11 or later, you could consider defining the iterator in a more readable way: for (auto it = ....) or, even better, using the range-for syntax as proposed by Norah Attkins in her answer.
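A minimal sketch of the second alternative, assuming a stripped-down, non-template Deposit with a hypothetical integer key (the real class is a template and holds more than this). The helper describeConnections is also made up for illustration; it collects the lines instead of printing them so the result is easy to check.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the question's Deposit<Product>; names are illustrative.
class Deposit {
public:
    explicit Deposit(int key) : key_(key) {}

    int getKey() const { return key_; }

    void addConnection(Deposit* dep, int dist) {
        connections_.insert(std::make_pair(dep, dist));
    }

    // Returning a const reference means every call refers to the same multimap.
    const std::multimap<Deposit*, int>& getConnections() const {
        return connections_;
    }

private:
    int key_;
    std::multimap<Deposit*, int> connections_;
};

// Builds one line per connection: "<from> -> <to> (<distance>)".
std::vector<std::string> describeConnections(const std::vector<Deposit*>& deposits) {
    std::vector<std::string> lines;
    for (const Deposit* from : deposits) {
        // begin() and end() of this range both come from the same container.
        for (const auto& connection : from->getConnections()) {
            lines.push_back(std::to_string(from->getKey()) + " -> " +
                            std::to_string(connection.first->getKey()) +
                            " (" + std::to_string(connection.second) + ")");
        }
    }
    return lines;
}
```

Because the multimap is keyed by pointer, the order of several connections from one deposit depends on the pointer values, not on insertion order.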

If you have a C++11 (or C++14) compiler (and you should, unless some work or company restriction is involved), consider using range-based for loops to make your code clearer:
for (auto const& elem : deposit)
{
for (auto const& connection : elem->getConnections())
{
// connection.first is the destination, connection.second the distance
}
}
Apart from the stylistic guidance, lacking information on what the containers actually hold, we'd just be guessing what's wrong when answering this question. My guess is that invalid reads happen and the pointers you're accessing are not allocated (but that's a guess).

Related

Moving through list starting at both ends and stopping at middle using iterators

I'm trying to make a while loop for some code that has two iterators, one that starts at the beginning of the list, and is incremented and another that started at the end and is decremented, wanting them to stop once the middle of the list is reached and the whole list has been covered. When doing a similar thing with vector iterators I was able to just do while (limit > first), but this doesn't work and gives a compiler error when done with list iterators. I'm working on a book problem that has a task requirement of not allocating any new memory in the code aside from the two given iterators and am having trouble figuring out how to move through the elements of the list properly.

Reverse iterators dereference to the element before the one they internally refer to (the "base"). (See this image for a visual explanation.)
What this means is that if you have:
auto forward = a_list.begin();
auto backward = a_list.rbegin();
If you increment forward and backward alternately, there will be a point at which forward == backward.base(). You must test this after incrementing each iterator, not after incrementing both; otherwise they could cross each other before you test them.
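A small sketch of that idea, assuming a std::list<int>. The function name visitFromBothEnds is hypothetical, and the returned vector exists only to make the visiting order observable; the traversal itself needs nothing beyond the two iterators.

```cpp
#include <list>
#include <vector>

// Visits elements alternately from the front and the back until the two
// iterators meet, and returns the order in which elements were visited.
std::vector<int> visitFromBothEnds(const std::list<int>& values) {
    std::vector<int> order;
    auto forward = values.begin();
    auto backward = values.rbegin();

    while (forward != backward.base()) {
        order.push_back(*forward);
        ++forward;
        // Test after this single increment, before touching the other iterator.
        if (forward == backward.base()) break;
        order.push_back(*backward);
        ++backward;
    }
    return order;
}
```

Each element is visited exactly once, for both odd and even sizes, because the meeting test runs after every individual increment.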

You can use std::list::size:
#include <iostream>
#include <list>
using namespace std;
void handle(int v) { cout << v << endl; }
int main() {
list<int> l = { 1,2,3,4,5,6,7,8,9 };
auto f = l.cbegin();
auto r = l.crbegin();
const int stepsCount = l.size() / 2;
for (int i = 0; i < stepsCount; ++i) {
handle(*f); handle(*r);
f++; r++;
}
if (l.size() % 2) handle(*r);
}

Static variables and functions that are called once for each choice of arguments

Here is a simple C++ question.
Description of the problem:
I have a function that takes as input an integer and returns a vector of zeros with length the input. Assume that I call the function many times with the same argument. What I want to avoid is that my function creates the vector of zeroes each time it is called. I want this to happen only the first time the function is called with the given input.
How I approached it: This brought to mind static variables. I thought of creating a static vector that holds the required zero vectors of each size, but wasn't able to figure out how to implement this. As an example I want something that "looks" like [ [0], [0,0], ...].
If there is a different way to approach such a problem please feel free to share! Also, my example with vectors is a bit specialised but replies that are more generic (concerning static variables that depend on the argument) would be greatly appreciated.
Side question:
To generalise further, is it possible to define a function that is only called once for each choice of arguments?
Thanks a lot.

You can have a map of sizes and vectors, one vector for each size:
#include <vector>
#include <map>
#include <cstddef>
std::vector<int>& get_vector(std::size_t size)
{
static std::map<size_t, std::vector<int> > vectors;
std::map<size_t, std::vector<int> >::iterator iter = vectors.find(size);
if (iter == vectors.end())
{
iter = vectors.insert(std::make_pair(size, std::vector<int>(size, 0))).first;
}
return iter->second;
}

If I understand correctly what you are trying to do, I don't think you will get the benefit you are expecting.
I wrote a quick benchmark to compare the performance of repeatedly creating a vector of zeros. The first benchmark uses the standard vector constructor. The second uses a function that only creates the vector the first time and stores it in a map:
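#include <chrono>
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <vector>
const std::vector<int>& zeros(std::size_t size) {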
static std::unordered_map<size_t, std::vector<int>> vectors;
auto find = vectors.find(size);
if (find != vectors.end())
return find->second;
auto insert = vectors.emplace(size, std::vector<int>(size));
return insert.first->second;
}
std::chrono::duration<float> benchmarkUsingMap() {
int sum = 0;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i != 10'000; ++i) {
auto zeros10k = zeros(10'000);
zeros10k[5342] = 1;
sum += zeros10k[5342];
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Sum: " << sum << "\n";
return end - start;
}
std::chrono::duration<float> benchmarkWithoutUsingMap() {
int sum = 0;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i != 10'000; ++i) {
auto zeros10k = std::vector<int>(10'000);
zeros10k[5342] = 1;
sum += zeros10k[5342];
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Sum: " << sum << "\n";
return end - start;
}
int main() {
std::cout << "Benchmark without map: " << benchmarkWithoutUsingMap().count() << '\n';
std::cout << "Benchmark using map: " << benchmarkUsingMap().count() << '\n';
}
Output:
Benchmark without map: 0.0188374
Benchmark using map: 0.134966
So, in this case, just creating the vector each time was about 7x faster. This is assuming you want to create a mutable copy of the vector of zeros.

If each vector needs to be a separate instance then you will have to have a construction for each instance. Since you will have to construct each instance you can make a simple make_int_vector function like:
std::vector<int> make_int_vector(std::size_t size, int fill = 0)
{
return std::vector<int>(size, fill);
}
The returned vector will either be moved or have its copy elided.

What you are asking for is a cache. The hard part is deciding how long an entry should live in the cache. Your current requirement seems to be an eternal cache, meaning that each entry persists forever. For such a simple use case, a static map is enough:
// requires <unordered_map>
// note: there is one cache per (T, U) pair, shared by every funct with that signature
template<typename T, typename U>
T cached(T (*funct)(U), U arg) {
static std::unordered_map<U, T> c;
if (c.count(arg) == 0) {
c[arg] = funct(arg);
}
return c[arg];
}
The above returns a value, which requires a copy. If you want to avoid the copy, just return a reference, but then, if you change one of the vectors, the next call will return the modified value.
template<typename T, typename U>
T& cached(T (*funct)(U), U arg) {
static std::unordered_map<U, T> c;
if (c.count(arg) == 0) {
c[arg] = funct(arg);
}
return c[arg];
}

Operator overloading in map/pair

I'm trying to understand operator overloading as used with STL class templates such as map or pair.
Let me introduce you to my code:
#include <iostream>
#include <iomanip> // left, setw
#include <string>
#include <map>
#include <utility> // pair
#include <algorithm> // count if
using namespace std;
typedef pair <string, int> Emp;
typedef map <string, Emp> MAP;
class Zakr{
int min, max;
public:
Zakr(int min, int max): min(min), max(max){}
bool operator()(const pair<const string, Emp> &p) const{
int wage = p.second.second;
return (min < wage) && (wage < max);
}
};
void print (const MAP& m) {
MAP::const_iterator it, fin = m.end();
for(it = m.begin(); it != fin; it++)
cout << "Key: " << left << setw(7) << it -> first
<< "Name: " << setw(10) << it->second.first
<< "Wage: " << it->second.second << endl;
}
int main(void){
MAP emp;
MAP object;
emp["John"] = Emp("John K.", 1900);
emp["Tania"] = Emp("Tania L.", 1900);
emp["Jeremy"] = Emp("Jeremy C", 2100);
emp["Susie"] = Emp("Susie W.", 3100);
emp["Toto"] = Emp("Toto T.", 9900);
emp["Adrian"] = Emp("Adrian N.", 1600);
emp["Germy"] = Emp("Germy P.", 2600);
print(emp);
int mn = 0, mx = 2000;
int how_much = count_if(emp.begin(), emp.end(), Zakr(mn, mx));
cout << how_much << " earn from "
<< mn << " to " << mx << endl;
}
I'm struggling to understand some bits, especially one in particular, i.e.:
class Zakr{
int min, max;
public:
Zakr(int min, int max): min(min), max(max){}
bool operator()(const pair<const string, Emp> &p) const{
int wage = p.second.second;
return (min < wage) && (wage < max);
}
};
So I built a class called Zakr so that I can use it as a functor in the count_if statement.
Am I right?
I initialize the private fields min and max in the constructor, and then the overloaded operator can return a boolean value based on them.
The most difficult part is understanding the overloading of bool operator().
bool operator()(const pair<const string, Emp> &p) const{
int wage = p.second.second;
Why the hell do I need to make a pair of some useless string value and Emp?
All I'm interested in is stored in Emp, i.e. the int value that the overloaded operator uses.
Why couldn't I just access the int stored in Emp like so:
bool operator()(Emp &p)
{
int wage = p.second;
return (min < wage) && (wage < max);
}
Why do I need to make another pair like so: (const pair &p), if all I'm interested in is the values stored in the pair called Emp?
Why do I need to make another pair with a useless first element, the string?
I'm not going to make use of it, so why is it needed to compile the code?
I did my best to explain my doubts as clearly as possible.
Hopefully someone will understand this rather long post.
Cheers!

This is because iterators over std::map return you a std::pair for each element. The first item in the pair is the map key, the second item is the map value. See the value_type in the documentation for std::map.
This question has some answers on how to get iterators over the map's values only.

1. Class Zakr:
Indeed, it's typically for use with count_if(). If you look at the link, you'll see count_if(first, last, pred) is equivalent to something like:
int ret = 0;
while (first!=last) {
if (pred(*first)) ++ret; // works as soon as object pred has operator() defined.
++first;
}
return ret;
2. Why is a pair needed in operator():
A map works with pairs, each made of a unique key and a corresponding value.
This is partly hidden. For example, when you use a map as an associative array with an expression such as emp["John"], the map finds the pair with the unique key "John" and returns a reference to the corresponding value.
However, as soon as you iterate through the map, your iterator addresses these pairs. Why? Because if it iterated through the values only, you would get each value but never know which key it corresponds to.
Consequence: count_if() iterates through the map, so the predicate is called with an iterator that addresses a pair.
3. Why make a useless pair:
First, the counting function does not create a dummy pair. It uses a reference to an existing pair (from a performance point of view, this costs no more than passing a pointer).
And, well, the map is there to address general problems. You might one day want to count not only on wage but also on the associated key (for example: all the wages not in the range for employees whose name starts with 'A').
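To make the point about pairs concrete, here is a sketch of the same predicate written against the map's value_type, so the parameter type is spelled by the map itself. The names EmpMap, WageInRange and countWagesBetween are hypothetical and only mirror the question's MAP and Zakr.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>

typedef std::pair<std::string, int> Emp;    // display name, wage
typedef std::map<std::string, Emp> EmpMap;

// Predicate over the map's element type: a pair of (const key, mapped value).
class WageInRange {
public:
    WageInRange(int min, int max) : min_(min), max_(max) {}

    bool operator()(const EmpMap::value_type& entry) const {
        const int wage = entry.second.second;  // entry.second is the Emp
        return min_ < wage && wage < max_;
    }

private:
    int min_;
    int max_;
};

// Counts employees whose wage lies strictly between min and max.
int countWagesBetween(const EmpMap& employees, int min, int max) {
    return static_cast<int>(
        std::count_if(employees.begin(), employees.end(), WageInRange(min, max)));
}
```

EmpMap::value_type is exactly pair<const string, Emp>, which is why the question's operator() has that parameter: count_if passes each dereferenced iterator to the predicate unchanged.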

Why does renaming my variable prevent a segfault?

I have a single depth-first search algorithm implemented to traverse my graph, given the iterator to a starting node.
Documentation summary:
GraphIter is a typedef for Graph::iterator
Graph class extends map<string, Node>
start->second.edges() returns set<string>
This code causes a segmentation fault if the size of start->second.edges() is 0:
(I've truncated the irrelevant pieces, including the recursive calls, for brevity.)
Bad Code
void Graph::dfs(GraphIter start)
{
cout << "EDGES SIZE: " << start->second.edges().size() << endl;
for (set<string>::iterator it = start->second.edges().begin();
it != start->second.edges().end(); ++it)
{
GraphIter iter = this->find(*it); // <--- SEGMENTATION FAULT
}
}
Now watch what happens when I pull start->second.edges() into a local variable: no more segfault!
Here's the code that doesn't generate a segfault:
Good Code
void Graph::dfs(GraphIter start)
{
set<string> edges = start->second.edges(); // <--- MAGIC TRICK
cout << "EDGES SIZE: " << edges.size() << endl;
for (set<string>::iterator it = edges.begin();
it != edges.end(); ++it)
{
GraphIter iter = this->find(*it);
}
}
So the difference is that in the good code, when the set of strings returned by edges() is empty, the for loop is never entered. But in the bad code, the for loop is still executed at least once, until it realizes that it can't dereference the it variable.
Why are these different? Don't they access the same parts of memory?

Because edges() returns a set by value, start->second.edges().begin() and start->second.edges().end() return iterators to different containers because each call to edges() results in a new set being returned.
By creating a single copy with a named variable you ensure that the iterators all come from the same container and you can validly iterate from begin() to end().

It could be that start->second.edges() returns an std::set<std::string> by value. That would render the iterators in the loop incompatible and lead to undefined behaviour.
Your "magic trick"
set<string> edges = start->second.edges();
ensures that you are iterating over the same container, "edges".
You could fix it by having Node::edges() return by reference:
const std::set<std::string>& edges() const { .... }
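The difference between the two return styles can be observed directly by comparing addresses. This is only a sketch with a hypothetical Node that offers both accessors side by side; the real Node has a single edges() method.

```cpp
#include <set>
#include <string>

// Hypothetical Node with both flavours of accessor, for comparison only.
class Node {
public:
    void addEdge(const std::string& name) { edges_.insert(name); }

    // Each call builds and returns a fresh copy.
    std::set<std::string> edgesByValue() const { return edges_; }

    // Each call refers to the one set stored in the node.
    const std::set<std::string>& edgesByReference() const { return edges_; }

private:
    std::set<std::string> edges_;
};

// True if two by-value calls yield the same object (they never do).
bool byValueCallsShareContainer(const Node& node) {
    // Binding to const references keeps both temporaries alive for comparison.
    const std::set<std::string>& first = node.edgesByValue();
    const std::set<std::string>& second = node.edgesByValue();
    return &first == &second;
}

// True if two by-reference calls yield the same object (they always do).
bool byReferenceCallsShareContainer(const Node& node) {
    return &node.edgesByReference() == &node.edgesByReference();
}
```

With the by-value accessor, begin() and end() in the loop header belong to two different sets, which is the incompatibility described above.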

Efficient way to re-order a C++ map-based collection

I have a large(ish - >100K) collection mapping a user identifier (an int) to the count of different products that they've bought (also an int.) I need to re-organise the data as efficiently as possible to find how many users have different numbers of products. So for example, how many users have 1 product, how many users have two products etc.
I have achieved this by reversing the original data from a std::map into a std::multimap (where the key and value are simply swapped). I can then pick out the number of users having N products using count(N) (although I also stored the unique values in a set so I could be sure of the exact number of values I was iterating over and their order).
Code looks like this:
// uc is a std::map<int, int> containing the original
// mapping of user identifier to the count of different
// products that they've bought.
std::set<int> uniqueCounts;
std::multimap<int, int> cu; // This maps count to user.
for ( map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
cu.insert( std::pair<int, int>( it->second, it->first ) );
uniqueCounts.insert( it->second );
}
// Now write this out
for ( std::set<int>::const_iterator it = uniqueCounts.begin();
it != uniqueCounts.end(); ++it )
{
std::cout << "==> There are "
<< cu.count( *it ) << " users that have bought "
<< *it << " products(s)" << std::endl;
}
I just can't help feeling that this is not the most efficient way of doing this. Anyone know of a clever method of doing this?
I'm limited in that I can't use Boost or C++11 to do this.
Oh, also, in case anyone is wondering, this is neither homework, nor an interview question.

Assuming you know the maximum number of products that a single user could have bought, you might see better performance just using a vector to store the results of the operation. As it is you're going to need an allocation for pretty much every entry in the original map, which likely isn't the fastest option.
It would also cut down on the lookup overhead on a map, gain the benefits of memory locality, and replace the call to count on the multimap (which is not a constant time operation) with a constant time lookup of the vector.
So you could do something like this:
// One slot per possible count, 0 .. MAX_PRODUCTS_PER_USER.
std::vector< int > uniqueCounts( MAX_PRODUCTS_PER_USER + 1 );
for ( map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
uniqueCounts[ it->second ]++;
}
// Now write this out
for ( std::vector< int >::size_type i = 0; i < uniqueCounts.size(); ++i )
{
std::cout << "==> There are "
<< uniqueCounts[ i ] << " users that have bought "
<< i << " products(s)" << std::endl;
}
Even if you don't know the maximum number of products, it seems like you could just guess a maximum and adapt this code to increase the size of the vector if required. It's sure to result in fewer allocations than your original example anyway.
All this assumes that you don't actually require the user ids after you've processed this data, of course, and (as pointed out in the comments below) that the number of products bought by each user is a relatively small and contiguous set. Otherwise you might be better off using a map in place of a vector: you'll still avoid calling the multimap::count function, but potentially lose some of the other benefits.

It depends on what you mean by "more efficient". First off, is this really a bottleneck? Sure, 100k entries is a lot, but if you only have to do this every few minutes, it's OK if the algorithm takes a couple of seconds.
The only area for improvement I see is memory usage. If this is a concern, you can skip the generation of the multimap and just keep a counter map around, something like this (beware, my C++ is a little rusty):
std::map<int, int> countFrequency; // count => how many customers with that count
for ( std::map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
// If it->second is not yet in countFrequency,
// the default constructor initializes it to 0.
countFrequency[it->second] += 1;
}
// Now write this out
for ( std::map<int, int>::const_iterator it = countFrequency.begin();
it != countFrequency.end(); ++it )
{
std::cout << "==> There are "
<< it->second << " users that have bought "
<< it->first << " products(s)" << std::endl;
}
If a user is added and buys count items, you can update countFrequency with
countFrequency[count] += 1;
If an existing user goes from oldCount to newCount items, you can update countFrequency with
countFrequency[oldCount] -= 1;
countFrequency[newCount] += 1;
Now, just as an aside, I recommend using an unsigned int for count (unless there's a legitimate reason for negative counts) and typedef'ing a userID type, for added readability.

If you can, I would recommend keeping both pieces of data current all the time. In other words, I would maintain a second map which is mapping number of products bought to number of customers who bought that many products. This map contains the exact answer to your question if you maintain it. Each time a customer buys a product, let n be the number of products this customer has now bought. Subtract one from the value at key n-1. Add one to the value at key n. If the range of keys is small enough this could be an array instead of a map. Do you ever expect a single customer to buy hundreds of products?
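A rough sketch of keeping both maps current, written without C++11 features since the question rules them out. The class name PurchaseStats and its members are hypothetical, and it assumes each call to recordPurchase means one more distinct product for that user.

```cpp
#include <map>

// Keeps per-user counts and the count histogram in sync. Names are illustrative.
class PurchaseStats {
public:
    // Records that `user` has bought one more distinct product.
    void recordPurchase(int user) {
        int& n = productsByUser_[user];  // value-initialized to 0 for a new user
        if (n > 0) {
            // The user leaves the bucket for the old count.
            std::map<int, int>::iterator old = usersByCount_.find(n);
            if (--old->second == 0) usersByCount_.erase(old);
        }
        ++n;
        ++usersByCount_[n];              // and joins the bucket for the new count
    }

    // Number of users who have bought exactly `count` products.
    int usersWith(int count) const {
        std::map<int, int>::const_iterator it = usersByCount_.find(count);
        return it == usersByCount_.end() ? 0 : it->second;
    }

private:
    std::map<int, int> productsByUser_;  // user id -> product count
    std::map<int, int> usersByCount_;    // product count -> number of users
};
```

Empty buckets are erased so that iterating usersByCount_ lists only counts that some user actually has.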

Just for larks, here's a mixed approach that uses a vector if the data is smallish, and a map to cover the case where one user has bought a truly absurd number of products. I doubt you'll really need the latter in a store app, but a more general version of the problem might benefit from it.
#include <algorithm> // std::max
#include <iostream>
#include <map>
#include <vector>
typedef std::map<int, int> Map;
typedef Map::const_iterator It;
template <typename Container>
void get_counts(const Map &source, Container &dest) {
for (It it = source.begin(); it != source.end(); ++it) {
++dest[it->second];
}
}
// As an alternative to this overloaded contains(), you could write
// an overloaded print_counts -- after all the one below is not an
// efficient way to iterate a sparsely-populated map.
// Or you might prefer a template function that visits
// each entry in the container, calling a specified functor that
// will print the output, and passing it the key and value.
// This is just the smallest point of customization I thought of.
// Both overloads are declared before print_counts so that it can find them.
bool contains(const Map &c, int key) {
return c.count(key) != 0;
}
bool contains(const std::vector<int> &c, int key) {
// also check 0 <= key < c.size() for a more general-purpose function
return c[key] != 0;
}
template <typename Container>
void print_counts(Container &people, int max_count) {
for (int i = 0; i <= max_count; ++i) {
if (contains(people, i)) {
std::cout << "==> There are "
<< people[i] << " users that have bought "
<< i << " products(s)" << std::endl;
}
}
}
void do_everything(const Map &uc) {
// first get the max product count
int max_count = 0;
for (It it = uc.begin(); it != uc.end(); ++it) {
max_count = std::max(max_count, it->second);
}
if (max_count > static_cast<int>(uc.size())) { // or some other threshold
Map counts;
get_counts(uc, counts);
print_counts(counts, max_count);
} else {
std::vector<int> counts(max_count+1);
get_counts(uc, counts);
print_counts(counts, max_count);
}
}
From here you could refactor, to create a class template CountReOrderer, which takes a template parameter telling it whether to use a vector or a map for the counts.
