Parallelism with C++ unordered_map

Parallelism with C++ unordered_map - c++

I have an unordered_map of type std::unordered_map<std::string, int64_t> sMap. This contains a number of strings and a 'weight' associated with each of them. I want to find the strings with the N largest weights.
If I wanted to do this using a single thread, I think I could create a priority queue of pairs like this
std::priority_queue<
std::pair<std::string, int64_t>,
std::vector<std::pair<std::string, int64_t>>,
std::function<bool(std::pair<std::string, int64_t>&,
std::pair<std::string, int64_t>&)>> prQ(comparePair);
and just go through the whole unordered_map, inserting elements to prQ while maintaining length N.
I want to achieve the same using multiple threads. I was thinking of assigning each thread to work on a few elements of the unordered_map to create a local priority queue of length N which can be merged into a global one at the end.
The problem I am facing right now is that the iterator which I get from unordered_map::begin() does not work with a + operator. At least that is the error that I am getting :
error: no match for ‘operator+’ (operand types are ‘std::unordered_map<std::basic_string<char>, long int>::iterator {aka std::__detail::_Node_iterator<std::pair<const std::b
asic_string<char>, long int>, false, true>}’ and ‘int’)
Thus, I cannot really specify a range of elements to be worked upon by a particular thread. The [] operator would take a key as expected and not an offset.
Essentially, I can't seem to find a way to have a data parallel loop that would work with only a few elements per thread. How can I solve this problem using multiple threads then?
EDIT : #Brian Vandberg asked me to supply a simplified example of the code that generates the error I was talking about.
std::unordered_map<std::string, int64_t> sMap;
//Initialize sMap values
int start = 0, end = 2;
for(auto i = sMap.begin() + start; sMap.begin() + end; ++i) {
std::cout<<i->first<<"\t"<<i->second<<"\n";
}

First, I'm not sure that I'd go with a priority queue for this problem (either single threaded, or as the part performed by a specific thread). The standard library has nth_element, which you can use to find the nth element in linear time. Following that, finding which elements are larger is also linear time.
You might consider this if speed is the problem, yours if size is a problem (nth_element will effectively force you to create a copy of the data). In this solution you iterate over the map (or part of it), and push_back only the weights into a vector, on which you perform nth_element. In the 2nd stage, loop again over the map, and choose those whose weight is higher.
Suppose you have the loop:
std::size_t j = 0;
for(const auto &e: sMap)
{
if(++j % k != i)
continue;
// Rest of code goes here.
}
Then if you use it for the ith thread out of k, it will partition the elements between the threads. Moreover, while all threads are iterating over the same elements (if only to skip most of them), it's happening in parallel.
Each thread can generate its candidates for the largest m elements, then choose the largest m elements from the km candidates using the method above (with the nth_element) or any other method.
It's interesting to ask what size of sMap will generate any speedup in practice.

Related

making a random paired vector from another single vector

// attender list, PID
std::vector<DWORD> m_vec_attender{1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119};
// duel list, PID <> PID
std::vector <DWORD, DWORD> m_vec_duelList;
I want to make a randomized vector pair (DWORD, DWORD) from another vector (DWORD) and iterate the paired one (duel list). In above example there are 9 player ID's. I want to make randomized pair these 8 players and leave the unpaired one in first vector (for later pairing; 'next round') and delete these 8 player ID's from first vector. As far as I know; std::make_pair and to get next element in vector std::next in C++11 but I'm so confused, it seems there are many ways to do this job but I couldn't find any reliable answer related to my question. Thanks in advance..

To solve your problem, you should begin by creating a copy of your original vector (if you still need it) and randomly shuffling it:
auto atten_rand = m_vec_attender;
std::shuffle(atten_rand.begin(), atten_rand.end());
Having randomly shuffled it, you can now just pair them sequentially. Your target vector should be declared as:
std::vector<std::pair<DWORD, DWORD>> m_vec_duelList;
And now you can do:
for (int i = 0; i != 4; ++i) {
m_vec_duelList.emplace_back(atten_rand[2*i], atten_rand[2*i+1]);
}
Note that by default, std::shuffle usually uses rand underlying for randomness, which is not a very high quality random number generator. This is probably fine to quickly generate a draw but if you have higher randomness needs you should investigate that in more detail.

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with huge(millions or even billions) elements, while the values are integers within a small range(1 to 100 or 1 to 1000), in such a case, is std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
actually I want to sort a vecotor of my own class with an integer member representing the processor index.
as there are other member inside the class, so, even if two data have same integer member that is used for comparing, they might not be regarded as same data.

Counting sort would be the right choice if you know that your range is so limited. If the range is [0,m) the most efficient way to do so it have a vector in which the index represent the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
if (counts.size() < i) {
counts.resize(i+1, 0);
}
counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
const int i = t.sort_field()
if (count_sorted.size() < i) {
count_sorted.resize(i+1, {});
}
count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the object themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.

You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Gionvanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.

You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen on some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(Nlog(N))). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
since we know the correct final position of the items, we can just swap them into place. And doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place we have to increment the value in the prefix sum so that the next element with same value doesn't remove the previous element from its place.
Second: either
keep track of how many elements of each value we have already put into place so that we dont keep moving elements of values that have already reached their place, this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array.
keep a copy of the prefix sums shifted over by one so that we stop moving elements once the stored position of the latest element
reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &){
return Person.id()-1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64

The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue=0;
for (int i : to_sort) {
if(i > maxvalue) maxvalue = i;
counts[i]++;
}
counts.resize(maxvalue+1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

C++ std::map creation taking too long?

UPDATED:
I am working on a program whose performance is very critical. I have a vector of structs that are NOT sorted. I need to perform many search operations in this vector. So I decided to cache the vector data into a map like this:
std::map<long, int> myMap;
for (int i = 0; i < myVector.size(); ++i)
{
const Type& theType = myVector[i];
myMap[theType.key] = i;
}
When I search the map, the results of the rest of the program are much faster. However, the remaining bottleneck is the creation of the map itself (it is taking about 0.8 milliseconds on average to insert about 1,500 elements in it). I need to figure out a way to trim this time down. I am simply inserting a long as the key and an int as the value. I don't understand why it is taking this long.
Another idea I had was to create a copy of the vector (can't touch the original one) and somehow perform a faster sort than the std::sort (it takes way too long to sort it).
Edit:
Sorry everyone. I meant to say that I am creating a std::map where the key is a long and the value is an int. The long value is the struct's key value and the int is the index of the corresponding element in the vector.
Also, I did some more debugging and realized that the vector is not sorted at all. It's completely random. So doing something like a stable_sort isn't going to work out.
ANOTHER UPDATE:
Thanks everyone for the responses. I ended up creating a vector of pairs (std::vector of std::pair(long, int)). Then I sorted the vector by the long value. I created a custom comparator that only looked at the first part of the pair. Then I used lower_bound to search for the pair. Here's how I did it all:
typedef std::pair<long,int> Key2VectorIndexPairT;
typedef std::vector<Key2VectorIndexPairT> Key2VectorIndexPairVectorT;
bool Key2VectorIndexPairComparator(const Key2VectorIndexPairT& pair1, const Key2VectorIndexPairT& pair2)
{
return pair1.first < pair2.first;
}
...
Key2VectorIndexPairVectorT sortedVector;
sortedVector.reserve(originalVector.capacity());
// Assume "original" vector contains unsorted elements.
for (int i = 0; i < originalVector.size(); ++i)
{
const TheStruct& theStruct = originalVector[i];
sortedVector.insert(Key2VectorIndexPairT(theStruct.key, i));
}
std::sort(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairComparator);
...
const long keyToSearchFor = 20;
const Key2VectorIndexPairVectorT::const_iterator cItorKey2VectorIndexPairVector = std::lower_bound(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairT(keyToSearchFor, 0 /* Provide dummy index value for search */), Key2VectorIndexPairComparator);
if (cItorKey2VectorIndexPairVector->first == keyToSearchFor)
{
const int vectorIndex = cItorKey2VectorIndexPairVector->second;
const TheStruct& theStruct = originalVector[vectorIndex];
// Now do whatever you want...
}
else
{
// Could not find element...
}
This yielded a modest performance gain for me. Before the total time for my calculations were 3.75 milliseconds and now it is down to 2.5 milliseconds.

Both std::map and std::set are built on a binary tree and so adding items does dynamic memory allocation. If your map is largely static (i.e. initialized once at the start and then rarely or never has new items added or removed) you'd probably be better to use a sorted vector and a std::lower_bound to look up items using a binary search.

Maps take a lot of time for two reasons
You need to do a lot of memory allocation for your data storage
You need to perform O(n lg n) comparisons for the sort.
If you are just creating this as one batch, then throwing the whole map out, using a custom pool allocator may be a good idea here - eg, boost's pool_alloc. Custom allocators can also apply optimizations such as not actually deallocating any memory until the map's completely destroyed, etc.
Since your keys are integers, you may want to consider writing your own container based on a radix tree (on the bits of the key) as well. This may give you significantly improved performance, but since there is no STL implementation, you may need to write your own.
If you don't need to sort the data, use a hash table, such as std::unordered_map; these avoid the significant overhead needed for sorting data, and also can reduce the amount of memory allocation needed.
Finally, depending on the overall design of the program, it may be helpful to simply reuse the same map instead of recreating it over and over. Just delete and add keys as needed, rather than building a new vector, then building a new map. Again, this may not be possible in the context of your program, but if it is, it would definitely help you.

I suspect it's the memory management and tree rebalancing that's costing you here.
Obviously profiling may be able to help you pinpoint the issue.
I would suggest as a general idea to just copy the long/int data you need into another vector and since you said it's almost sorted, use stable_sort on it to finish the ordering. Then use lower_bound to locate the items in the sorted vector.

std::find is a linear scan(it has to be since it works on unsorted data). If you can sort(std::sort guaranties n log(n) behavior) the data then you can use std::binary_search to get log(n) searches. But as pointed out by others it may be copy time is the problem.

If keys are solid and short, perhaps try std::hash_map instead. From MSDN's page on hash_map Class:
The main advantage of hashing over sorting is greater efficiency; a
successful hashing performs insertions, deletions, and finds in
constant average time as compared with a time proportional to the
logarithm of the number of elements in the container for sorting
techniques.

Map creation can be a performance bottleneck (in the sense that it takes a measurable amount of time) if you're creating a large map and you're copying large chunks of data into it. You're also using the obvious (but suboptimal) way of inserting elements into a std::map - if you use something like:
myMap.insert(std::make_pair(theType.key, theType));
this should improve the insertion speed, but it will result in a slight change in behaviour if you encounter duplicate keys - using insert will result in values for duplicate keys being dropped, whereas using your method, the last element with the duplicate key will be inserted into the map.
I would also look into avoiding a making a copy of the data (for example by storing a pointer to it instead) if your profiling results determine that it's the copying of the element that is expensive. But for that you'll have to profile the code, IME guesstimates tend to be wrong...
Also, as a side note, you might want to look into storing the data in a std::set using custom comparator as your contains the key already. That however will not really result in a big speed up as constructing a set in this case is likely to be as expensive as inserting it into a map.

I'm not a C++ expert, but it seems that your problem stems from copying the Type instances, instead of a reference/pointer to the Type instances.
std::map<Type> myMap; // <-- this is wrong, since std::map requires two template parameters, not one
If you add elements to the map and they're not pointers, then I believe the copy constructor is invoked and that will certainly cause delays with a large data structure. Use the pointer instead:
std::map<KeyType, ObjectType*> myMap;
Furthermore, your example is a little confusing since you "insert" a value of type int in the map when you're expecting a value of type Type. I think you should assign the reference to the item, not the index.
myMap[theType.key] = &myVector[i];
Update:
The more I look at your example, the more confused I get. If you're using the std::map, then it should take two template types:
map<T1,T2> aMap;
So what are you REALLY mapping? map<Type, int> or something else?
It seems that you're using the Type.key member field as a key to the map (it's a valid idea), but unless key is of the same type as Type, then you can't use it as the key to the map. So is key an instance of Type??
Furthermore, you're mapping the current vector index to the key in the map, which indicates that you're just want the index to the vector so you can later access that index location fast. Is that what you want to do?
Update 2.0:
After reading your answer it seems that you're using std::map<long,int> and in that case there is no copying of the structure involved. Furthermore, you don't need to make a local reference to the object in the vector. If you just need to access the key, then access it by calling myVector[i].key.

Your building a copy of the table from the broken example you give, and not just a reference.
Why Can't I store references in an STL map in C++?
Whatever you store in the map it relies on you not changing the vector.
Try a lookup map only.
typedef vector<Type> Stuff;
Stuff myVector;
typedef std::map<long, *Type> LookupMap;
LookupMap myMap;
LookupMap::iterator hint = myMap.begin();
for (Stuff::iterator it = myVector.begin(); myVector.end() != it; ++it)
{
hint = myMap.insert(hint, std::make_pair(it->key, &*it));
}
Or perhaps drop the vector and just store it in the map??

Since your vector is already partially ordered, you may want to instead create an auxiliary array referencing (indices of) the elements in your original vector. Then you can sort the auxiliary array using Timsort which has good performance for partially sorted data (such as yours).

I think you've got some other problem. Creating a vector of 1500 <long, int> pairs, and sorting it based on the longs should take considerably less than 0.8 milliseconds (at least assuming we're talking about a reasonably modern, desktop/server type processor).
To try to get an idea of what we should see here, I did a quick bit of test code:
#include <vector>
#include <algorithm>
#include <time.h>
#include <iostream>
int main() {
const int size = 1500;
const int reps = 100;
std::vector<std::pair<long, int> > init;
std::vector<std::pair<long, int> > data;
long total = 0;
// Generate "original" array
for (int i=0; i<size; i++)
init.push_back(std::make_pair(rand(), i));
clock_t start = clock();
for (int i=0; i<reps; i++) {
// copy the original array
std::vector<std::pair<long, int> > data(init.begin(), init.end());
// sort the copy
std::sort(data.begin(), data.end());
// use data that depends on sort to prevent it being optimized away
total += data[10].first;
total += data[size-10].first;
}
clock_t stop = clock();
std::cout << "Ignore: " << total << "\n";
clock_t ticks = stop - start;
double seconds = ticks / (double)CLOCKS_PER_SEC;
double ms = seconds * 1000.0;
double ms_p_iter = ms / reps;
std::cout << ms_p_iter << " ms/iteration.";
return 0;
}
Running this on my somewhat "trailing edge" (~5 year-old) machine, I'm getting times around 0.1 ms/iteration. I'd expect searching in this (using std::lower_bound or std::upper_bound) to be somewhat faster than searching in an std::map as well (since the data in the vector is allocated contiguously, we can expect better locality of reference, leading to better cache usage).

Thanks everyone for the responses. I ended up creating a vector of pairs (std::vector of std::pair(long, int)). Then I sorted the vector by the long value. I created a custom comparator that only looked at the first part of the pair. Then I used lower_bound to search for the pair. Here's how I did it all:
typedef std::pair<long,int> Key2VectorIndexPairT;
typedef std::vector<Key2VectorIndexPairT> Key2VectorIndexPairVectorT;
bool Key2VectorIndexPairComparator(const Key2VectorIndexPairT& pair1, const Key2VectorIndexPairT& pair2)
{
return pair1.first < pair2.first;
}
...
Key2VectorIndexPairVectorT sortedVector;
sortedVector.reserve(originalVector.capacity());
// Assume "original" vector contains unsorted elements.
for (int i = 0; i < originalVector.size(); ++i)
{
const TheStruct& theStruct = originalVector[i];
sortedVector.insert(Key2VectorIndexPairT(theStruct.key, i));
}
std::sort(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairComparator);
...
const long keyToSearchFor = 20;
const Key2VectorIndexPairVectorT::const_iterator cItorKey2VectorIndexPairVector = std::lower_bound(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairT(keyToSearchFor, 0 /* Provide dummy index value for search */), Key2VectorIndexPairComparator);
if (cItorKey2VectorIndexPairVector->first == keyToSearchFor)
{
const int vectorIndex = cItorKey2VectorIndexPairVector->second;
const TheStruct& theStruct = originalVector[vectorIndex];
// Now do whatever you want...
}
else
{
// Could not find element...
}
This yielded a modest performance gain for me. Before the total time for my calculations were 3.75 milliseconds and now it is down to 2.5 milliseconds.

boost::multi_index_container with random_access and ordered_unique

I have a problem getting boost::multi_index_container work with random-access and with orderd_unique at the same time. (I'm sorry for the lengthly question, but I think I should use an example..)
Here an example: Suppose I want to produce N objects in a factory and for each object I have a demand to fulfill (this demand is known at creation of the multi-index).
Well, within my algorithm I get intermediate results, which I store in the following class:
class intermediate_result
{
private:
std::vector<int> parts; // which parts are produced
int used_time; // how long did it take to produce
ValueType max_value; // how much is it worth
};
The vector parts descibes, which objects are produced (its length is N and it is lexicographically smaller then my coresp demand-vector!) - for each such vector I know the used_time as well. Additionally I get a value for this vector of produced objects.
I got another constraint so that I can't produce every object - my algorithm needs to store several intermediate_result-objects in a data-structure. And here boost::multi_index_container is used, because the pair of parts and used_time describes a unique intermediate_result (and it should be unique in my data-structure) but the max_value is another index I'll have to consider, because my algorithm always needs the intermediate_result with the highest max_value.
So I tried to use boost::multi_index_container with ordered_unique<> for my "parts&used_time-pair" and ordered_non_unique<> for my max_value (different intermediate_result-objects may have the same value).
The problem is: the predicate, which is needed to decide which "parts&used_time-pair" is smaller, uses std::lexicographical_compare on my parts-vector and hence is very slow for many intermediate_result-objects.
But there would be a solution: my demand for each object isn't that high, therefore I could store on each possible parts-vector the intermediate results uniquely by its used_time.
For example: if I have a demand-vector ( 2 , 3 , 1) then I need a data-structure which stores (2+1)*(3+1)*(1+1)=24 possible parts-vectors and on each such entry the different used_times, which have to be unique! (storing the smallest time is insufficient - for example: if my additional constraint is: to meet a given time exactly for production)
But how do I combine a random_access<>-index with an ordered_unique<>-index?
(Example11 didn't help me on this one..)

To use two indices you could write the following:
indexed_by<
random_access< >,
ordered_unique<
composite_key<
intermediate_result,
member<intermediate_result, int, &intermediate_result::used_time>,
member<intermediate_result, std::vector<int>, &intermediate_result::parts>
>
>
>
You could use composite_key for comparing used_time at first and vector only if necessary. Besides that, keep in mind that you could use member function as index.

(I had to use an own answer to write code-blocks - sorry!)
The composite_key with used_time and parts (as Kirill V. Lyadvinsky suggested) is basically what I've already implemented. I want to get rid of the lexicographical compare of the parts-vector.
Suppose I've stored the needed_demand somehow then I could write a simple function, which returns the correct index within a random-access data-structure like that:
int get_index(intermediate_result &input_result) const
{
int ret_value = 0;
int index_part = 1;
for(int i=0;i<needed_demand.size();++i)
{
ret_value += input_result.get_part(i) * index_part;
index_part *= (needed_demand.get_part(i) + 1);
}
}
Obviously this can be implemented more efficiently and this is not the only possible index ordering for the needed demand. But let's suppose this function exists as a member-function of intermediate_result! Is it possible to write something like this to prevent lexicographical_compare ?
indexed_by<
random_access< >,
ordered_unique<
composite_key<
intermediate_result,
member<intermediate_result, int, &intermediate_result::used_time>,
const_mem_fun<intermediate_result,int,&intermediate_result::get_index>
>
>
>
If this is possible and I initialized the multi-index with all possible parts-vectors (i.e. in my comment above I would've pushed 24 empty maps in my data-structure), does this find the right entry for a given intermediate_result in constant time (after computing the correct index with get_index) ?
I have to ask this, because I don't quite see, how the random_access<> index is linked with the ordered_unique<> index..
But thank you for your answers so far!!

std::map and performance, intersecting sets

I'm intersecting some sets of numbers, and doing this by storing a count of each time I see a number in a map.
I'm finding the performance be very slow.
Details:
- One of the sets has 150,000 numbers in it
- The intersection of that set and another set takes about 300ms the first time, and about 5000ms the second time
- I haven't done any profiling yet, but every time I break the debugger while doing the intersection its in malloc.c!
So, how can I improve this performance? Switch to a different data structure? Some how improve the memory allocation performance of map?
Update:
Is there any way to ask std::map or
boost::unordered_map to pre-allocate
some space?
Or, are there any tips for using these efficiently?
Update2:
See Fast C++ container like the C# HashSet<T> and Dictionary<K,V>?
Update3:
I benchmarked set_intersection and got horrible results:
(set_intersection) Found 313 values in the intersection, in 11345ms
(set_intersection) Found 309 values in the intersection, in 12332ms
Code:
int runIntersectionTestAlgo()
{
set<int> set1;
set<int> set2;
set<int> intersection;
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.insert(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.insert(value);
}
set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
return intersection.size();
}

You should definitely be using preallocated vectors which are way faster. The problem with doing set intersection with stl sets is that each time you move to the next element you're chasing a dynamically allocated pointer, which could easily not be in your CPU caches. With a vector the next element will often be in your cache because it's physically close to the previous element.
The trick with vectors, is that if you don't preallocate the memory for a task like this, it'll perform EVEN WORSE because it'll go on reallocating memory as it resizes itself during your initialization step.
Try something like this instaed - it'll be WAY faster.
int runIntersectionTestAlgo() {
vector<char> vector1; vector1.reserve(100000);
vector<char> vector2; vector2.reserve(1000);
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ ) {
int value = 1000000000 + i;
set1.push_back(value);
}
sort(vector1.begin(), vector1.end());
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ ) {
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.push_back(value);
}
sort(vector2.begin(), vector2.end());
// Reserve at most 1,000 spots for the intersection
vector<char> intersection; intersection.reserve(min(vector1.size(),vector2.size()));
set_intersection(vector1.begin(), vector1.end(),vector2.begin(), vector2.end(),back_inserter(intersection));
return intersection.size();
}

Without knowing any more about your problem, "check with a good profiler" is the best general advise I can give. Beyond that...
If memory allocation is your problem, switch to some sort of pooled allocator that reduces calls to malloc. Boost has a number of custom allocators that should be compatible with std::allocator<T>. In fact, you may even try this before profiling, if you've already noticed debug-break samples always ending up in malloc.
If your number-space is known to be dense, you can switch to using a vector- or bitset-based implementation, using your numbers as indexes in the vector.
If your number-space is mostly sparse but has some natural clustering (this is a big if), you may switch to a map-of-vectors. Use higher-order bits for map indexing, and lower-order bits for vector indexing. This is functionally very similar to simply using a pooled allocator, but it is likely to give you better caching behavior. This makes sense, since you are providing more information to the machine (clustering is explicit and cache-friendly, rather than a random distribution you'd expect from pool allocation).

I would second the suggestion to sort them. There are already STL set algorithms that operate on sorted ranges (like set_intersection, set_union, etc):
set_intersection

I don't understand why you have to use a map to do intersection. Like people have said, you could put the sets in std::set's, and then use std::set_intersection().
Or you can put them into hash_set's. But then you would have to implement intersection manually: technically you only need to put one of the sets into a hash_set, and then loop through the other one, and test if each element is contained in the hash_set.

Intersection with maps are slow, try a hash_map. (however, this is not provided in all STL implementation.
Alternatively, sort both map and do it in a merge-sort-like way.

What is your intersection algorithm? Maybe there are some improvements to be made?
Here is an alternate method
I do not know it to be faster or slower, but it could be something to try. Before doing so, I also recommend using a profiler to ensure you really are working on the hotspot. Change the sets of numbers you are intersecting to use std::set<int> instead. Then iterate through the smallest one looking at each value you find. For each value in the smallest set, use the find method to see if the number is present in each of the other sets (for performance, search from smallest to largest).
This is optimised in the case that the number is not found in all of the sets, so if the intersection is relatively small, it may be fast.
Then, store the intersection in std::vector<int> instead - insertion using push_back is also very fast.
Here is another alternate method
Change the sets of numbers to std::vector<int> and use std::sort to sort from smallest to largest. Then use std::binary_search to find the values, using roughly the same method as above. This may be faster than searching a std::set since the array is more tightly packed in memory. Actually, never mind that, you can then just iterate through the values in lock-step, looking at the ones with the same value. Increment only the iterators which are less than the minimum value you saw at the previous step (if the values were different).

Might be your algorithm. As I understand it, you are spinning over each set (which I'm hoping is a standard set), and throwing them into yet another map. This is doing a lot of work you don't need to do, since the keys of a standard set are in sorted order already. Instead, take a "merge-sort" like approach. Spin over each iter, dereferencing to find the min. Count the number that have that min, and increment those. If the count was N, add it to the intersection. Repeat until the first map hits it's end (If you compare the sizes before starting, you won't have to check every map's end each time).
Responding to update: There do exist faculties to speed up memory allocation by pre-reserving space, like boost::pool_alloc. Something like:
std::map<int, int, std::less<int>, boost::pool_allocator< std::pair<int const, int> > > m;
But honestly, malloc is pretty good at what it does; I'd profile before doing anything too extreme.

Look at your algorithms, then choose the proper data type. If you're going to have set-like behaviour, and want to do intersections and the like, std::set is the container to use.
Since it's elements are stored in a sorted way, insertion may cost you O(log N), but intersection with another (sorted!) std::set can be done in linear time.

I figured something out: if I attach the debugger to either RELEASE or DEBUG builds (e.g. hit F5 in the IDE), then I get horrible times.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js