STL containers and large amounts of data - c++

I have a large collection of data that is read into memory - temporarily, but necessary for the system.
I have been checking the performance of std::vector as well as std::unordered_map.
For std::vector I used a struct of type:
struct information{
std::string name;
unsigned int offset;
}
For std::unordered_map I used the std::string for the key and the unsigned int offset for the value.
If, let's say, 2 000 000 of these are loaded into memory, I tried the following and got these results:
std::vector:
On random string, never really larger than 32 characters if a reserve was called on the vector.
std::vector<information> vec;
vec.reserve(2500000);
The insertion
vec.push_back({dataName, offset});
is quite fast. Trying to find data is very slow though. The find was implemented like this:
auto it = std::find_if(vec.begin(), vec.end(), [&name](information &info) -> bool {return info.name == name); });
Which makes sense seeing that it is a large vector and the correct struct is found on a name compare. But it was extremely poor performance. The memory used was fine - I assume a part of the memory growth was due to the std::string size resizing.
My question on the vector implementation is: Is there a way to increase the look up time? I know that a vector can be sorted to increase your look-up time, but then you lose time in sorting the vector. Especially on a vector of this size.
std::unordered_map:
The insertion
std::unordered_map<std::string, unsigned int> unordMap;
unordMap.reserve(2500000);
unordMap.emplace(name, offset);
takes a very long time. When reserving space beforehand in an attempt to shorten the insertion time the following happens:
The memory at the end of insertion is a lot more when not calling reserve, without reserve the memory is still a lot more than the vector implementation. The reserve doesn't really improve insertion time.
Of course the look up is very fast. My question about the std::unordered_map is can the insertion time and memory usage be improved?
If neither of these can be done, then my next question will probably be quite obvious. Is there a way to get a result in between these two data structures? What is the best for large amounts of data?

struct information{
std::string name;
unsigned int offset;
information(information const&)=default;
information(information&&)=default;
information(std::string n, unsigned o):name(std::move(n)),offset(o),hash(std::hash<std::string>()(name)) {};
information():information("",0) {};
bool operator<( information const& o ) const {
return tie() < o.tie();
}
std::tuple<std::size_t, std::string const&> tie() const { return std::tie(hash, name); }
private:
std::size_t hash;
};
Use the above structure for your std::vector.
After adding all the data, std::sort it.
To find something matching name do this:
struct information_searcher {
struct helper {
std::tuple<std::size_t, std::string const&> data;
helper( std::string const& o ):data(std::hash<std::string>()(o), o) {};
helper( helper const& o ) = default;
helper( information const& o ):data(o.tie()) {}
bool operator<( helper const& o ) const { return data < o.data; }
};
bool operator()( helper lhs, helper rhs ) const { return lhs < rhs; }
};
information* get_info_by_name( std::string const& name ) {
auto range = std::equal_range( vec.begin(), vec.end(), information_searcher::helper(name), information_searcher{} );
if (range.first == range.second) {
return nullptr;
} else {
return &*range.first;
}
}
which is a nearly zero-overhead lookup.
What we do here is hash the strings (for fast comparison), falling back on std::string comparison if we have a collision.
information_searcher is a class that lets us search the data without having to create an information (which would require a wasteful allocation).
get_info_by_name returns a pointer -- nullptr if not found, and a pointer to the first element with that name otherwise.
Changing information.name is impolite, and makes the hash field incorrect.
This may use moderately more memory than the naive std::vector version.
In general, if your work consists of 'add a bunch of stuff to a table' then 'do a bunch of lookups', your best bet is to build a std::vector, sort it in a fast way, then use equal_range to do the lookups. map and unordered_map are optimized for lots of mixed inserts/deletes/etc.

vector is usually implemented as 'dynamic array' and should be the most memory-efficient.
with good reservation strategy it can have insert of O(1) = fast. Searching is O(n) = very bad.
You can help vector by sorting it (and if you first load then search than I think it would be best - std::sort + std::binary_search).
You can as well implement something like insert-sort using std::lower_bound. insert = O(log n) = good, search = O(log n) = good
map (ordered) can actually do the same work, but may as well be implemented using tree = less memory efficient, access as good as sorted vector (but maybe less reallocation, but in your case, sorted vector is still best)
unordered_map is usually imlemented using hash tables = some memory overhead but fast operations (insertion cannot be that fast as in unsorted vector, but should still be pretty fast). The problen with hashing is that it can be fast and even fastest, but may as well be the worst solution (in extreme conditions). The above structures (sorted vector and map/tree are stable, always behave the same - logaritmic complexity).

The problem with a large vector is the lookup time when you don't know the index of objects you want. One way to improve it is, as you stated, to keep an ordered vector and do binary search on it. That way, lookup time will not be of linear complexity but rather of logarithmic complexity, which saves up quite a lot of time with very large containers. This is the lookup used in std::map (the ordered one). You can do a similar binary search using std::lower_bound or std::equal_range on your std::vector.
The problem with a large unordered map is completely different : this kind of container use a hash function and modulus calculation in order to place the elements according to their keys in a standard array. So when you have a n elements in a std::unordered_map, it is very unlikely that you only need an n-elements-long array, because some indices will not be filled. You will use at least the greatest index produced by the hash-and-modulo. One way to improve memory usage as well as insertion time is to write your own hash function. But it might be hard depending on what kind of strings you are using.

Well, the optimal solution here would be to create std::map, which is logarithmic in complexity both in insertion and in lookup. Although, I don't see any reason why you wouldn't use std::vector. It is pretty fast when using quick sort to sort even 2M elements, especially if you do it once. std::binary_search is then really fast. Think it over if you need to make a lot of lookups between insertions.

Related

Which container is most efficient for multiple insertions / deletions in C++?

I was set a homework challenge as part of an application process (I was rejected, by the way; I wouldn't be writing this otherwise) in which I was to implement the following functions:
// Store a collection of integers
class IntegerCollection {
public:
// Insert one entry with value x
void Insert(int x);
// Erase one entry with value x, if one exists
void Erase(int x);
// Erase all entries, x, from <= x < to
void Erase(int from, int to);
// Return the count of all entries, x, from <= x < to
size_t Count(int from, int to) const;
The functions were then put through a bunch of tests, most of which were trivial. The final test was the real challenge as it performed 500,000 single insertions, 500,000 calls to count and 500,000 single deletions.
The member variables of IntegerCollection were not specified and so I had to choose how to store the integers. Naturally, an STL container seemed like a good idea and keeping it sorted seemed an easy way to keep things efficient.
Here is my code for the four functions using a vector:
// Previous bit of code shown goes here
private:
std::vector<int> integerCollection;
};
void IntegerCollection::Insert(int x) {
/* using lower_bound to find the right place for x to be inserted
keeps the vector sorted and makes life much easier */
auto it = std::lower_bound(integerCollection.begin(), integerCollection.end(), x);
integerCollection.insert(it, x);
}
void IntegerCollection::Erase(int x) {
// find the location of the first element containing x and delete if it exists
auto it = std::find(integerCollection.begin(), integerCollection.end(), x);
if (it != integerCollection.end()) {
integerCollection.erase(it);
}
}
void IntegerCollection::Erase(int from, int to) {
if (integerCollection.empty()) return;
// lower_bound points to the first element of integerCollection >= from/to
auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
/* std::vector::erase deletes entries between the two pointers
fromBound (included) and toBound (not indcluded) */
integerCollection.erase(fromBound, toBound);
}
size_t IntegerCollection::Count(int from, int to) const {
if (integerCollection.empty()) return 0;
int count = 0;
// lower_bound points to the first element of integerCollection >= from/to
auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
// increment pointer until fromBound == toBound (we don't count elements of value = to)
while (fromBound != toBound) {
++count; ++fromBound;
}
return count;
}
The company got back to me saying that they wouldn't be moving forward because my choice of container meant the runtime complexity was too high. I also tried using list and deque and compared the runtime. As I expected, I found that list was dreadful and that vector took the edge over deque. So as far as I was concerned I had made the best of a bad situation, but apparently not!
I would like to know what the correct container to use in this situation is? deque only makes sense if I can guarantee insertion or deletion to the ends of the container and list hogs memory. Is there something else that I'm completely overlooking?
We cannot know what would make the company happy. If they reject std::vector without concise reasoning I wouldn't want to work for them anyway. Moreover, we dont really know the precise requirements. Were you asked to provide one reasonably well performing implementation? Did they expect you to squeeze out the last percent of the provided benchmark by profiling a bunch of different implementations?
The latter is probably too much for a homework challenge as part of an application process. If it is the first you can either
roll your own. It is unlikely that the interface you were given can be implemented more efficiently than one of the std containers does... unless your requirements are so specific that you can write something that performs well under that specific benchmark.
std::vector for data locality. See eg here for Bjarne himself advocating std::vector rather than linked lists.
std::set for ease of implementation. It seems like you want the container sorted and the interface you have to implement fits that of std::set quite well.
Let's compare only isertion and erasure assuming the container needs to stay sorted:
operation std::set std::vector
insert log(N) N
erase log(N) N
Note that the log(N) for the binary_search to find the position to insert/erase in the vector can be neglected compared to the N.
Now you have to consider that the asymptotic complexity listed above completely neglects the non-linearity of memory access. In reality data can be far away in memory (std::set) leading to many cache misses or it can be local as with std::vector. The log(N) only wins for huge N. To get an idea of the difference 500000/log(500000) is roughly 26410 while 1000/log(1000) is only ~100.
I would expect std::vector to outperform std::set for considerably small container sizes, but at some point the log(N) wins over cache. The exact location of this turning point depends on many factors and can only reliably determined by profiling and measuring.
Nobody knows which container is MOST efficient for multiple insertions / deletions. That is like asking what is the most fuel-efficient design for a car engine possible. People are always innovating on the car engines. They make more efficient ones all the time. However, I would recommend a splay tree. The time required for a insertion or deletion is a splay tree is not constant. Some insertions take a long time and some take only a very a short time. However, the average time per insertion/deletion is always guaranteed to be be O(log n), where n is the number of items being stored in the splay tree. logarithmic time is extremely efficient. It should be good enough for your purposes.
The first thing that comes to mind is to hash the integer value so single look ups can be done in constant time.
The integer value can be hashed to compute an index in to an array of bools or bits, used to tell if the integer value is in the container or not.
Counting and and deleting large ranges could be sped up from there, by using multiple hash tables for specific integer ranges.
If you had 0x10000 hash tables, that each stored ints from 0 to 0xFFFF and were using 32 bit integers you could then mask and shift the upper half of the int value and use that as an index to find the correct hash table to insert / delete values from.
IntHashTable containers[0x10000];
u_int32 hashIndex = (u_int32)value / 0x10000;
u_int32int valueInTable = (u_int32)value - (hashIndex * 0x10000);
containers[hashIndex].insert(valueInTable);
Count for example could be implemented as so, if each hash table kept count of the number of elements it contained:
indexStart = startRange / 0x10000;
indexEnd = endRange / 0x10000;
int countTotal = 0;
for (int i = indexStart; i<=indexEnd; ++i) {
countTotal += containers[i].count();
}
Not sure if using sorting really is a requirement for removing the range. It might be based on position. Anyway, here is a link with some hints which STL container to use.
In which scenario do I use a particular STL container?
Just FYI.
Vector maybe a good choice, but it does a lot of re allocation, as you know. I prefer deque instead, as it doesn't require big chunk of memory to allocate all items. For such requirement as you had, list probably fit better.
Basic solution for this problem might be std::map<int, int>
where key is the integer you are storing and value is the number of occurences.
Problem with this is that you can not quickly remove/count ranges. In other words complexity is linear.
For quick count you would need to implement your own complete binary tree where you can know the number of nodes between 2 nodes(upper and lower bound node) because you know the size of tree, and you know how many left and right turns you took to upper and lower bound nodes. Note that we are talking about complete binary tree, in general binary tree you can not make this calculation fast.
For quick range remove I do not know how to make it faster than linear.

Stable sorting a vector using std::sort

So I have some code like this, I want to sort the vector based on id and put the last overridden element first:
struct Data {
int64_t id;
double value;
};
std::vector<Data> v;
// add some Datas to v
// add some 'override' Datas with duplicated `id`s
std::sort(v.begin(), v.end(),
[](const Data& a, const Data& b) {
if (a.id < b.id) {
return true;
} else if (b.id < a.id) {
return false;
}
return &a > &b;
});
Since vectors are contiguous, &a > &b should work to put the appended overrides first in the sorted vector, which should be equivalent to using std::stable_sort, but I am not sure if there is a state in the std::sort implementation where the equal values would be swapped such that the address of an element that appeared later in the original vector is earlier now. I don't want to use stable_sort because it is significantly slower for my use case. I have also considered adding a field to the struct that keeps track of the original index, but I will need to copy the vector for that.
It seems to work here: https://onlinegdb.com/Hk8z1giqX
std::sort gives no guarantees whatsoever on when elements are compared, and in practice, I strongly suspect most implementations will misbehave for your comparator.
The common std::sort implementation is either plain quicksort or a hybrid sort (quicksort switching to a different sort for small ranges), implemented in-place to avoid using extra memory. As such, the comparator will be invoked with the same element at different memory addresses as the sort progresses; you can't use memory addresses to implement a stable sort.
Either add the necessary info to make the sort innately stable (e.g. the suggested initial index value) or use std::stable_sort. Using memory addresses to stabilize the sort won't work.
For the record, having experimented a bit, I suspect your test case is too small to trigger the issue. At a guess, the hybrid sorting strategy works coincidentally for smallish vectors, but breaks down when the vector gets large enough for an actual quicksort to occur. Once I increase your vector size with some more filler, the stability disappears, Try it online!

How to optimize a std::set intersection algorithm (C++)

I'm struggling with a part of my college assignment. I have two subsets of std::set containers containing pointers to quite complex objects, but ordered by different criteria (which is why I can't use std::set_intersection()). I need to find elements that are contained in both subsets as fast as possible. There is a time/complexity requirement on the assignment.
I can do that in n*log(m) time where n is the size of the first subset and m is the size of the second subset by doing the following:
for(auto it = subset1.begin(), it != subset1.end(), it++){
if(find(subset2.begin(), subset2.end(), *it))
result.insert(*it);
}
This fails the time requirement, which says worst case linear, but average better than linear.
I found the following question here and I find the hashtable approach interesting. However, I fear that the creation of the hashtable might incur too much overhead. The class contained in the sets looks something like this:
class containedInSets {
//methods
private:
vector<string> member1;
SomeObject member2;
int member3;
}
I have no control over the SomeObject class, and therefore cannot specify a hash function for it. I'd have to hash the pointer. Furthermore, the vector may grow quite (in the thousands of entries).
What is the quickest way of doing this?
Your code is not O(n log(m)) but O(n * m).
std::find(subset2.begin(), subset2.end(), *it) is linear, but std::set has methods find and count which are in O(log(n)) (they do a binary search).
So you can simply do:
for (const auto& e : subset1) {
if (subset2.count(e) != 0) {
result.insert(e);
}
}
Which has complexity of n*log(m) instead of your n * m.

C++ std::map creation taking too long?

UPDATED:
I am working on a program whose performance is very critical. I have a vector of structs that are NOT sorted. I need to perform many search operations in this vector. So I decided to cache the vector data into a map like this:
std::map<long, int> myMap;
for (int i = 0; i < myVector.size(); ++i)
{
const Type& theType = myVector[i];
myMap[theType.key] = i;
}
When I search the map, the results of the rest of the program are much faster. However, the remaining bottleneck is the creation of the map itself (it is taking about 0.8 milliseconds on average to insert about 1,500 elements in it). I need to figure out a way to trim this time down. I am simply inserting a long as the key and an int as the value. I don't understand why it is taking this long.
Another idea I had was to create a copy of the vector (can't touch the original one) and somehow perform a faster sort than the std::sort (it takes way too long to sort it).
Edit:
Sorry everyone. I meant to say that I am creating a std::map where the key is a long and the value is an int. The long value is the struct's key value and the int is the index of the corresponding element in the vector.
Also, I did some more debugging and realized that the vector is not sorted at all. It's completely random. So doing something like a stable_sort isn't going to work out.
ANOTHER UPDATE:
Thanks everyone for the responses. I ended up creating a vector of pairs (std::vector of std::pair(long, int)). Then I sorted the vector by the long value. I created a custom comparator that only looked at the first part of the pair. Then I used lower_bound to search for the pair. Here's how I did it all:
typedef std::pair<long,int> Key2VectorIndexPairT;
typedef std::vector<Key2VectorIndexPairT> Key2VectorIndexPairVectorT;
bool Key2VectorIndexPairComparator(const Key2VectorIndexPairT& pair1, const Key2VectorIndexPairT& pair2)
{
return pair1.first < pair2.first;
}
...
Key2VectorIndexPairVectorT sortedVector;
sortedVector.reserve(originalVector.capacity());
// Assume "original" vector contains unsorted elements.
for (int i = 0; i < originalVector.size(); ++i)
{
const TheStruct& theStruct = originalVector[i];
sortedVector.insert(Key2VectorIndexPairT(theStruct.key, i));
}
std::sort(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairComparator);
...
const long keyToSearchFor = 20;
const Key2VectorIndexPairVectorT::const_iterator cItorKey2VectorIndexPairVector = std::lower_bound(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairT(keyToSearchFor, 0 /* Provide dummy index value for search */), Key2VectorIndexPairComparator);
if (cItorKey2VectorIndexPairVector->first == keyToSearchFor)
{
const int vectorIndex = cItorKey2VectorIndexPairVector->second;
const TheStruct& theStruct = originalVector[vectorIndex];
// Now do whatever you want...
}
else
{
// Could not find element...
}
This yielded a modest performance gain for me. Before the total time for my calculations were 3.75 milliseconds and now it is down to 2.5 milliseconds.
Both std::map and std::set are built on a binary tree and so adding items does dynamic memory allocation. If your map is largely static (i.e. initialized once at the start and then rarely or never has new items added or removed) you'd probably be better to use a sorted vector and a std::lower_bound to look up items using a binary search.
Maps take a lot of time for two reasons
You need to do a lot of memory allocation for your data storage
You need to perform O(n lg n) comparisons for the sort.
If you are just creating this as one batch, then throwing the whole map out, using a custom pool allocator may be a good idea here - eg, boost's pool_alloc. Custom allocators can also apply optimizations such as not actually deallocating any memory until the map's completely destroyed, etc.
Since your keys are integers, you may want to consider writing your own container based on a radix tree (on the bits of the key) as well. This may give you significantly improved performance, but since there is no STL implementation, you may need to write your own.
If you don't need to sort the data, use a hash table, such as std::unordered_map; these avoid the significant overhead needed for sorting data, and also can reduce the amount of memory allocation needed.
Finally, depending on the overall design of the program, it may be helpful to simply reuse the same map instead of recreating it over and over. Just delete and add keys as needed, rather than building a new vector, then building a new map. Again, this may not be possible in the context of your program, but if it is, it would definitely help you.
I suspect it's the memory management and tree rebalancing that's costing you here.
Obviously profiling may be able to help you pinpoint the issue.
I would suggest as a general idea to just copy the long/int data you need into another vector and since you said it's almost sorted, use stable_sort on it to finish the ordering. Then use lower_bound to locate the items in the sorted vector.
std::find is a linear scan(it has to be since it works on unsorted data). If you can sort(std::sort guaranties n log(n) behavior) the data then you can use std::binary_search to get log(n) searches. But as pointed out by others it may be copy time is the problem.
If keys are solid and short, perhaps try std::hash_map instead. From MSDN's page on hash_map Class:
The main advantage of hashing over sorting is greater efficiency; a
successful hashing performs insertions, deletions, and finds in
constant average time as compared with a time proportional to the
logarithm of the number of elements in the container for sorting
techniques.
Map creation can be a performance bottleneck (in the sense that it takes a measurable amount of time) if you're creating a large map and you're copying large chunks of data into it. You're also using the obvious (but suboptimal) way of inserting elements into a std::map - if you use something like:
myMap.insert(std::make_pair(theType.key, theType));
this should improve the insertion speed, but it will result in a slight change in behaviour if you encounter duplicate keys - using insert will result in values for duplicate keys being dropped, whereas using your method, the last element with the duplicate key will be inserted into the map.
I would also look into avoiding a making a copy of the data (for example by storing a pointer to it instead) if your profiling results determine that it's the copying of the element that is expensive. But for that you'll have to profile the code, IME guesstimates tend to be wrong...
Also, as a side note, you might want to look into storing the data in a std::set using custom comparator as your contains the key already. That however will not really result in a big speed up as constructing a set in this case is likely to be as expensive as inserting it into a map.
I'm not a C++ expert, but it seems that your problem stems from copying the Type instances, instead of a reference/pointer to the Type instances.
std::map<Type> myMap; // <-- this is wrong, since std::map requires two template parameters, not one
If you add elements to the map and they're not pointers, then I believe the copy constructor is invoked and that will certainly cause delays with a large data structure. Use the pointer instead:
std::map<KeyType, ObjectType*> myMap;
Furthermore, your example is a little confusing since you "insert" a value of type int in the map when you're expecting a value of type Type. I think you should assign the reference to the item, not the index.
myMap[theType.key] = &myVector[i];
Update:
The more I look at your example, the more confused I get. If you're using the std::map, then it should take two template types:
map<T1,T2> aMap;
So what are you REALLY mapping? map<Type, int> or something else?
It seems that you're using the Type.key member field as a key to the map (it's a valid idea), but unless key is of the same type as Type, then you can't use it as the key to the map. So is key an instance of Type??
Furthermore, you're mapping the current vector index to the key in the map, which indicates that you're just want the index to the vector so you can later access that index location fast. Is that what you want to do?
Update 2.0:
After reading your answer it seems that you're using std::map<long,int> and in that case there is no copying of the structure involved. Furthermore, you don't need to make a local reference to the object in the vector. If you just need to access the key, then access it by calling myVector[i].key.
Your building a copy of the table from the broken example you give, and not just a reference.
Why Can't I store references in an STL map in C++?
Whatever you store in the map it relies on you not changing the vector.
Try a lookup map only.
typedef vector<Type> Stuff;
Stuff myVector;
typedef std::map<long, *Type> LookupMap;
LookupMap myMap;
LookupMap::iterator hint = myMap.begin();
for (Stuff::iterator it = myVector.begin(); myVector.end() != it; ++it)
{
hint = myMap.insert(hint, std::make_pair(it->key, &*it));
}
Or perhaps drop the vector and just store it in the map??
Since your vector is already partially ordered, you may want to instead create an auxiliary array referencing (indices of) the elements in your original vector. Then you can sort the auxiliary array using Timsort which has good performance for partially sorted data (such as yours).
I think you've got some other problem. Creating a vector of 1500 <long, int> pairs, and sorting it based on the longs should take considerably less than 0.8 milliseconds (at least assuming we're talking about a reasonably modern, desktop/server type processor).
To try to get an idea of what we should see here, I did a quick bit of test code:
#include <vector>
#include <algorithm>
#include <time.h>
#include <iostream>
int main() {
const int size = 1500;
const int reps = 100;
std::vector<std::pair<long, int> > init;
std::vector<std::pair<long, int> > data;
long total = 0;
// Generate "original" array
for (int i=0; i<size; i++)
init.push_back(std::make_pair(rand(), i));
clock_t start = clock();
for (int i=0; i<reps; i++) {
// copy the original array
std::vector<std::pair<long, int> > data(init.begin(), init.end());
// sort the copy
std::sort(data.begin(), data.end());
// use data that depends on sort to prevent it being optimized away
total += data[10].first;
total += data[size-10].first;
}
clock_t stop = clock();
std::cout << "Ignore: " << total << "\n";
clock_t ticks = stop - start;
double seconds = ticks / (double)CLOCKS_PER_SEC;
double ms = seconds * 1000.0;
double ms_p_iter = ms / reps;
std::cout << ms_p_iter << " ms/iteration.";
return 0;
}
Running this on my somewhat "trailing edge" (~5 year-old) machine, I'm getting times around 0.1 ms/iteration. I'd expect searching in this (using std::lower_bound or std::upper_bound) to be somewhat faster than searching in an std::map as well (since the data in the vector is allocated contiguously, we can expect better locality of reference, leading to better cache usage).
Thanks everyone for the responses. I ended up creating a vector of pairs (std::vector of std::pair(long, int)). Then I sorted the vector by the long value. I created a custom comparator that only looked at the first part of the pair. Then I used lower_bound to search for the pair. Here's how I did it all:
typedef std::pair<long,int> Key2VectorIndexPairT;
typedef std::vector<Key2VectorIndexPairT> Key2VectorIndexPairVectorT;
bool Key2VectorIndexPairComparator(const Key2VectorIndexPairT& pair1, const Key2VectorIndexPairT& pair2)
{
return pair1.first < pair2.first;
}
...
Key2VectorIndexPairVectorT sortedVector;
sortedVector.reserve(originalVector.capacity());
// Assume "original" vector contains unsorted elements.
for (int i = 0; i < originalVector.size(); ++i)
{
const TheStruct& theStruct = originalVector[i];
sortedVector.insert(Key2VectorIndexPairT(theStruct.key, i));
}
std::sort(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairComparator);
...
const long keyToSearchFor = 20;
const Key2VectorIndexPairVectorT::const_iterator cItorKey2VectorIndexPairVector = std::lower_bound(sortedVector.begin(), sortedVector.end(), Key2VectorIndexPairT(keyToSearchFor, 0 /* Provide dummy index value for search */), Key2VectorIndexPairComparator);
if (cItorKey2VectorIndexPairVector->first == keyToSearchFor)
{
const int vectorIndex = cItorKey2VectorIndexPairVector->second;
const TheStruct& theStruct = originalVector[vectorIndex];
// Now do whatever you want...
}
else
{
// Could not find element...
}
This yielded a modest performance gain for me. Before the total time for my calculations were 3.75 milliseconds and now it is down to 2.5 milliseconds.

Sorting 1000-2000 elements with many cache misses

I have an array of 1000-2000 elements which are pointers to objects. I want to keep my array sorted and obviously I want to do this as quick as possible. They are sorted by a member and not allocated contiguously so assume a cache miss whenever I access the sort-by member.
Currently I'm sorting on-demand rather than on-add, but because of the cache misses and [presumably] non-inlining of the member access the inner loop of my quick sort is slow.
I'm doing tests and trying things now, (and see what the actual bottleneck is) but can anyone recommend a good alternative to speeding this up?
Should I do an insert-sort instead of quicksorting on-demand, or should I try and change my model to make the elements contigious and reduce cache misses?
OR, is there a sort algorithm I've not come accross which is good for data that is going to cache miss?
Edit: Maybe I worded this wrong :), I don't actually need my array sorted all the time (I'm not iterating through them sequentially for anything) I just need it sorted when I'm doing a binary chop to find a matching object, and doing that quicksort at that time (when I want to search) is currently my bottleneck, because of the cache misses and jumps (I'm using a < operator on my object, but I'm hoping that inlines in release)
Simple approach: insertion sort on every insert. Since your elements are not aligned in memory I'm guessing linked list. If so, then you could transform it into a linked list with jumps to the 10th element, the 100th and so on. This is kind of similar to the next suggestion.
Or you reorganize your container structure into a binary tree (or what every tree you like, B, B*, red-black, ...) and insert elements like you would insert them into a search tree.
Running a quicksort on each insertion is enormously inefficient. Doing a binary search and insert operation would likely be orders of magnitude faster. Using a binary search tree instead of a linear array would reduce the insert cost.
Edit: I missed that you were doing sort on extraction, not insert. Regardless, keeping things sorted amortizes sorting time over each insert, which almost has to be a win, unless you have a lot of inserts for each extraction.
If you want to keep the sort on-extract methodology, then maybe switch to merge sort, or another sort that has good performance for mostly-sorted data.
I think the best approach in your case would be changing your data structure to something logarithmic and rethinking your architecture. Because the bottleneck of your application is not that sorting thing, but the question why do you have to sort everything on each insert and try to compensate that by adding on-demand sort?.
Another thing you could try (that is based on your current implementation) is implementing an external pointer - something mapping table / function and sort those second keys, but I actually doubt it would benefit in this case.
Instead of the array of the pointers you may consider an array of structs which consist of both a pointer to your object and the sort criteria. That is:
Instead of
struct MyType {
// ...
int m_SomeField; // this is the sort criteria
};
std::vector<MyType*> arr;
You may do this:
strcut ArrayElement {
MyType* m_pObj; // the actual object
int m_SortCriteria; // should be always equal to the m_pObj->m_SomeField
};
std::vector<ArrayElement> arr;
You may also remove the m_SomeField field from your struct, if you only access your object via this array.
By such in order to sort your array you won't need to dereference m_pObj every iteration. Hence you'll utilize the cache.
Of course you must keep the m_SortCriteria always synchronized with m_SomeField of the object (in case you're editing it).
As you mention, you're going to have to do some profiling to determine if this is a bottleneck and if other approaches provide any relief.
Alternatives to using an array are std::set or std::multiset which are normally implemented as R-B binary trees, and so have good performance for most applications. You're going to have to weigh using them against the frequency of the sort-when-searched pattern you implemented.
In either case, I wouldn't recommend rolling-your-own sort or search unless you're interested in learning more about how it's done.
I would think that sorting on insertion would be better. We are talking O(log N) comparisons here, so say ceil( O(log N) ) + 1 retrieval of the data to sort with.
For 2000, it amounts to: 8
What's great about this is that you can buffer the data of the element to be inserted, that's how you only have 8 function calls to actually insert.
You may wish to look at some inlining, but do profile before you're sure THIS is the tight spot.
Nowadays you could use a set, either a std::set, if you have unique values in your structure member, or, std::multiset if you have duplicate values in you structure member.
One side note: The concept using pointers, is in general not advisable.
STL containers (if used correctly) give you nearly always an optimized performance.
Anyway. Please see some example code:
#include <iostream>
#include <array>
#include <algorithm>
#include <set>
#include <iterator>
// Demo data structure, whatever
struct Data {
int i{};
};
// -----------------------------------------------------------------------------------------
// All in the below section is executed during compile time. Not during runtime
// It will create an array to some thousands pointer
constexpr std::size_t DemoSize = 4000u;
using DemoPtrData = std::array<const Data*, DemoSize>;
using DemoData = std::array<Data, DemoSize>;
consteval DemoData createDemoData() {
DemoData dd{};
int k{};
for (Data& d : dd)
d.i = k++*2;
return dd;
}
constexpr DemoData demoData = createDemoData();
consteval DemoPtrData createDemoPtrData(const DemoData& dd) {
DemoPtrData dpd{};
for (std::size_t k{}; k < dpd.size(); ++k)
dpd[k] = &dd[k];
return dpd;
}
constexpr DemoPtrData dpd = createDemoPtrData(demoData);
// -----------------------------------------------------------------------------------------
struct Comp {bool operator () (const Data* d1, const Data* d2) const { return d1->i < d2->i; }};
using MySet = std::multiset<const Data*, Comp>;
int main() {
// Add some thousand pointers. Will be sorted according to struct member
MySet mySet{ dpd.begin(), dpd.end() };
// Extract a range of data. integer values between 42 and 52
const Data* p42 = dpd[21];
const Data* p52 = dpd[26];
// Show result
for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
std::cout << (*iptr)->i << '\n';
// Insert a new element
Data d1{ 47 };
mySet.insert(&d1);
// Show again
std::cout << "\n\n";
for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
std::cout << (*iptr)->i << '\n';
}