Do you remember my prior question: What is causing data race in std::async here?
Even though I successfully parallelized this program, it still ran too slowly to be practical.
So I tried to improve the data structure representing a Conway's Game of Life pattern.
Brief explanation of the new structure:
#include <unordered_map>
#include <unordered_set>
#include <utility>

class pattern {
    // NDos::Lifecell represents a cell by x and y coordinates.
    // NDos::Lifecell is equality comparable and has a std::hash specialization.
private:
    std::unordered_map<NDos::Lifecell, std::pair<int, bool>> cells_coor;
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_neigh[9];
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_onoff[2];
public:
    void insert(int x, int y) {
        // If coordinate (x, y) isn't already ON,
        // turns it ON and increases each neighbor's neighbor count by 1.
    }
    void erase(int x, int y) {
        // If coordinate (x, y) isn't already OFF,
        // turns it OFF and decreases each neighbor's neighbor count by 1.
    }
    pattern generate(NDos::Liferule rule) {
        // Advances the generation by 1, according to the rule.
        // (For example here, B3/S23.)
        pattern result;
        // Inserts every ON cell with 2 or 3 ON neighbors into result.
        // Inserts every OFF cell with exactly 3 ON neighbors into result.
        return result;
    }
    // etc...
};
In brief, pattern contains the cells: every ON cell, and every OFF cell that has 1 or more ON neighbor cells. It can also contain spare OFF cells.
cells_coor directly stores the cells, by using their coordinates as keys, and maps them to their number of ON neighbor cells (stored as int) and whether they are ON (stored as bool).
cells_neigh and cells_onoff store the cells indirectly, using iterators to them as keys.
The number of ON neighbor of a cell is always 0 or greater and 8 or less, so cells_neigh is a size 9 array.
cells_neigh[0] stores the cells with 0 ON neighbor cells, cells_neigh[1] stores the cells with 1 ON neighbor cell, and so on.
Likewise, a cell is always either OFF or ON, so cells_onoff is a size 2 array.
cells_onoff[false] stores the OFF cells, and cells_onoff[true] stores the ON cells.
Cells must be inserted into or erased from all of cells_coor, cells_neigh and cells_onoff together: if a cell is inserted into or erased from one of them, it must be for the others as well. Because of this, the elements of cells_neigh and cells_onoff are std::unordered_sets storing iterators to the actual cells, enabling fast access to the cells by neighbor count or OFF/ON state.
If this structure works, the insertion function will have an average time complexity of O(1), the erasure also O(1), and the generation O(cells_coor.size()), which is a great improvement in time complexity over the prior structure.
But as you see, there is a problem: How can I hash a std::unordered_map::const_iterator?
std::hash cannot legally be specialized for them (they are not program-defined types), so I have to provide a custom hasher.
Taking their address won't work, as they are usually acquired as rvalues or temporaries.
Dereferencing them also won't work, as there are multiple cells that have 0 ON neighbor cells, multiple cells that are OFF, and so on.
So what can I do? If I can't do anything, cells_neigh and cells_onoff will have to become std::vector or something similar, sharply degrading the time complexity.
Short story: this won't work (really well)(*1). Most of the operations you're likely to perform on the map cells_coor will invalidate any iterators (but not pointers, as I learned) to its elements.
If you want to keep what I'd call different "views" on some collection, then the underlying container storing the actual data needs to be either not modified or must not invalidate its iterators (a linked list for example).
Perhaps I'm missing something, but why not keep 9 sets of cells for the neighbor counts and 2 sets of cells for on/off? (*2) Put differently: for what do you really need that map? (*3)
(*1): The map invalidates its iterators only when rehashing occurs (pointers and references to the elements stay valid even then). You can check for that:
// Before inserting: true if the insertion will not trigger a rehash
(map.max_load_factor() * map.bucket_count()) > (map.size() + 1)
(*2): 9 sets can be reduced to 8: if a cell (x, y) is in none of the 8 sets, then it would be in the 9th set, so storing that information is unnecessary. Same for on/off: it's enough to store the cells that are on; all others are off.
(*3): Accessing the number of neighbours without using the map, with only the sets of cells; kind of pseudo code (note a cell has at most 8 ON neighbors):
unsigned number_of_neighbours(Cell const & cell) {
    for (unsigned neighbours = 8; neighbours > 0; --neighbours) {
        if (set_of_cells_with_neighbours(neighbours).count(cell) == 1) {
            return neighbours;
        }
    }
    return 0;
}
The repeated lookups in the sets could of course destroy actual performance, you'd need to profile that. (Asymptotic runtime is unaffected)
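If you do want those indirect "views" anyway, one workaround consistent with (*1) is to key the sets on the addresses of the map's elements rather than on iterators: std::unordered_map never invalidates pointers or references to its elements (only iterators, on rehash), and std::hash supports raw pointers out of the box. A minimal sketch of the idea, with Lifecell as a hypothetical stand-in for NDos::Lifecell:
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>

struct Lifecell {
    int x, y;
    bool operator==(const Lifecell& o) const { return x == o.x && y == o.y; }
};
struct LifecellHash {
    std::size_t operator()(const Lifecell& c) const {
        return std::hash<long long>{}(((long long)c.x << 32) ^ (unsigned)c.y);
    }
};

using Cells = std::unordered_map<Lifecell, std::pair<int, bool>, LifecellHash>;
using CellRef = Cells::value_type*;  // stable even across rehashes

// std::hash has a standard specialization for raw pointers,
// so no custom hasher is needed for the sets.
Cells cells_coor;
std::unordered_set<CellRef> cells_neigh[9];
std::unordered_set<CellRef> cells_onoff[2];

// Obtaining a stable key from an iterator:
// cells_onoff[true].insert(&*cells_coor.find(cell));
The only discipline required is to remove a cell's address from the sets before erasing it from the map, since the address dies with the element.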
I was set a homework challenge as part of an application process (I was rejected, by the way; I wouldn't be writing this otherwise) in which I was to implement the following functions:
// Store a collection of integers
class IntegerCollection {
public:
    // Insert one entry with value x
    void Insert(int x);
    // Erase one entry with value x, if one exists
    void Erase(int x);
    // Erase all entries, x, from <= x < to
    void Erase(int from, int to);
    // Return the count of all entries, x, from <= x < to
    size_t Count(int from, int to) const;
The functions were then put through a bunch of tests, most of which were trivial. The final test was the real challenge as it performed 500,000 single insertions, 500,000 calls to count and 500,000 single deletions.
The member variables of IntegerCollection were not specified and so I had to choose how to store the integers. Naturally, an STL container seemed like a good idea and keeping it sorted seemed an easy way to keep things efficient.
Here is my code for the four functions using a vector:
// Previous bit of code shown goes here
private:
    std::vector<int> integerCollection;
};

void IntegerCollection::Insert(int x) {
    /* Using lower_bound to find the right place for x to be inserted
       keeps the vector sorted and makes life much easier */
    auto it = std::lower_bound(integerCollection.begin(), integerCollection.end(), x);
    integerCollection.insert(it, x);
}

void IntegerCollection::Erase(int x) {
    // Find the location of the first element containing x and delete it if it exists
    auto it = std::find(integerCollection.begin(), integerCollection.end(), x);
    if (it != integerCollection.end()) {
        integerCollection.erase(it);
    }
}

void IntegerCollection::Erase(int from, int to) {
    if (integerCollection.empty()) return;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    /* std::vector::erase deletes entries between the two iterators
       fromBound (included) and toBound (not included) */
    integerCollection.erase(fromBound, toBound);
}

size_t IntegerCollection::Count(int from, int to) const {
    if (integerCollection.empty()) return 0;
    int count = 0;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    // Increment the iterator until fromBound == toBound (we don't count elements of value = to)
    while (fromBound != toBound) {
        ++count;
        ++fromBound;
    }
    return count;
}
The company got back to me saying that they wouldn't be moving forward because my choice of container meant the runtime complexity was too high. I also tried using list and deque and compared the runtime. As I expected, I found that list was dreadful and that vector took the edge over deque. So as far as I was concerned I had made the best of a bad situation, but apparently not!
I would like to know: what is the correct container to use in this situation? deque only makes sense if I can guarantee insertion or deletion at the ends of the container, and list hogs memory. Is there something else that I'm completely overlooking?
We cannot know what would make the company happy. If they reject std::vector without giving a concrete reason, I wouldn't want to work for them anyway. Moreover, we don't really know the precise requirements. Were you asked to provide one reasonably well performing implementation? Did they expect you to squeeze out the last percent of the provided benchmark by profiling a bunch of different implementations?
The latter is probably too much for a homework challenge as part of an application process. If it is the former, you can:
- roll your own. It is unlikely that the interface you were given can be implemented more efficiently than one of the std containers does... unless your requirements are so specific that you can write something that performs well under that specific benchmark.
- use std::vector for data locality. See e.g. here for Bjarne himself advocating std::vector over linked lists.
- use std::set for ease of implementation. It seems like you want the container sorted, and the interface you have to implement fits that of std::set quite well.
Let's compare only insertion and erasure, assuming the container needs to stay sorted:

operation    std::set    std::vector
insert       log(N)      N
erase        log(N)      N

Note that the log(N) for the binary search to find the position to insert/erase in the vector can be neglected compared to the N.
Now you have to consider that the asymptotic complexity listed above completely neglects the non-linearity of memory access. In reality data can be far away in memory (std::set) leading to many cache misses or it can be local as with std::vector. The log(N) only wins for huge N. To get an idea of the difference 500000/log(500000) is roughly 26410 while 1000/log(1000) is only ~100.
I would expect std::vector to outperform std::set for reasonably small container sizes, but at some point the log(N) wins over cache. The exact location of this turning point depends on many factors and can only be reliably determined by profiling and measuring.
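To make the std::set route concrete: since "Erase one entry with value x, if one exists" implies duplicates are allowed, a std::multiset is the closer fit. Here is a sketch under that assumption, not a tuned solution; note that Count still walks the range, because set iterators are not random access:
#include <cstddef>
#include <iterator>
#include <set>

class IntegerCollection {
public:
    void Insert(int x) { values.insert(x); }          // O(log N)

    void Erase(int x) {                               // erase one entry only
        auto it = values.find(x);
        if (it != values.end()) values.erase(it);     // O(log N)
    }

    void Erase(int from, int to) {                    // erase all x, from <= x < to
        values.erase(values.lower_bound(from), values.lower_bound(to));
    }

    size_t Count(int from, int to) const {            // count all x, from <= x < to
        // O(distance) walk; this is the trade-off versus the sorted vector.
        return std::distance(values.lower_bound(from), values.lower_bound(to));
    }

private:
    std::multiset<int> values;
};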
Nobody knows which container is MOST efficient for multiple insertions / deletions. That is like asking what is the most fuel-efficient car engine design possible: people are always innovating on car engines and make more efficient ones all the time. However, I would recommend a splay tree. The time required for an insertion or deletion in a splay tree is not constant: some insertions take a long time and some take only a very short time. However, the average time per insertion/deletion is always guaranteed to be O(log n), where n is the number of items stored in the splay tree. Logarithmic time is extremely efficient; it should be good enough for your purposes.
The first thing that comes to mind is to hash the integer value so single lookups can be done in constant time.
The integer value can be hashed to compute an index into an array of bools or bits, used to tell if the integer value is in the container or not.
Counting and deleting large ranges could be sped up from there, by using multiple hash tables for specific integer ranges.
If you had 0x10000 hash tables, each storing ints from 0 to 0xFFFF, and were using 32-bit integers, you could then mask and shift the upper half of the int value and use that as an index to find the correct hash table to insert/delete values from.
IntHashTable containers[0x10000];
u_int32 hashIndex = (u_int32)value / 0x10000;
u_int32 valueInTable = (u_int32)value - (hashIndex * 0x10000);
containers[hashIndex].insert(valueInTable);
Count, for example, could be implemented like so, if each hash table kept count of the number of elements it contained:
indexStart = startRange / 0x10000;
indexEnd = endRange / 0x10000;
int countTotal = 0;
// NOTE: the first and last tables need a partial count
// if the range does not align to 0x10000 boundaries.
for (int i = indexStart; i <= indexEnd; ++i) {
    countTotal += containers[i].count();
}
Not sure if sorted storage really is a requirement for removing the range; it might be based on position. Anyway, here is a link with some hints on which STL container to use:
In which scenario do I use a particular STL container?
Just FYI.
Vector may be a good choice, but it does a lot of reallocation, as you know. I prefer deque instead, as it doesn't require a big chunk of memory to allocate all items. For such requirements as you had, list probably fits better.
A basic solution for this problem might be std::map<int, int>, where the key is the integer you are storing and the value is the number of occurrences.
The problem with this is that you cannot quickly remove/count ranges; in other words, the complexity is linear.
For a quick count you would need to implement your own complete binary tree where you can compute the number of nodes between two nodes (the upper and lower bound nodes), because you know the size of the tree and how many left and right turns you took to reach the upper and lower bound nodes. Note that we are talking about a complete binary tree; in a general binary tree you cannot make this calculation fast.
For quick range removal I do not know how to make it faster than linear.
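If the values are known to fit a bounded range [0, M), the "count nodes between two bounds" idea can be had without a hand-rolled tree: a Fenwick (binary indexed) tree of counts gives O(log M) insert, erase, and range count. A sketch under that range assumption:
#include <vector>

// Fenwick tree of counts over a bounded value range [0, M).
// insert/erase/count(from, to) all run in O(log M).
class FenwickCounts {
public:
    explicit FenwickCounts(int M) : tree(M + 1, 0) {}
    void insert(int x) { add(x, +1); }
    void erase(int x)  { add(x, -1); }             // caller ensures x is present
    long long count(int from, int to) const {      // entries x with from <= x < to
        return prefix(to) - prefix(from);
    }
private:
    std::vector<long long> tree;                   // 1-based internally
    void add(int x, int d) {
        for (int i = x + 1; i < (int)tree.size(); i += i & -i) tree[i] += d;
    }
    long long prefix(int x) const {                // count of entries < x
        long long s = 0;
        for (int i = x; i > 0; i -= i & -i) s += tree[i];
        return s;
    }
};
Range removal would still loop over the values present in the range, so it stays linear in the number of erased entries, matching the observation above.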
What is the best way to create and store cycles in C/C++?
I have the structs:
struct Arc {
    int i, j;
    Arc() {}
    Arc(const Arc& obj) : i(obj.i), j(obj.j) {}
    Arc(int _i, int _j) : i(_i), j(_j) {}
};

struct CYCLE {
    vector<Arc> route;
    float COST;
};
To store the cycles that have already been created, I thought about using:
vector<CYCLE> ConjCycles;
For each cycle created, I need to verify that it has not already been added to ConjCycles.
The cycle 1-2-2-1 is the same as the cycle 2-2-1-2.
How can I detect that cycles like those are the same?
I thought about using a map to control this.
However, I don't know how to define a key for a cycle such that the two cycles described above get the same key.
You have quite a lot of redundancy in your cycle representation, e.g. for a cycle 1-3-2-4-1:
{ (1, 3), (3, 2), (2, 4), (4, 1) }
If we consider a cycle as a cyclic graph, then you are storing its edges in your data structure. It would be more efficient to store the vertices instead:
struct Cycle
{
    std::vector<int> vertices;
};
The edges you get implicitly from vertices[n] and vertices[n + 1]; the last vertex is always the same as the first one, so do not store it explicitly; the last edge then is (vertices[vertices.size() - 1], vertices[0]).
Be aware that this is only the internal representation; you can still construct the cycle from a sequence of edges (Arcs). You'd most likely check the sequence in the constructor and possibly throw an exception if it is invalid (there are alternatives, though, if you dislike exceptions...).
Then you need some kind of equivalence test. My proposition would be:
1. If the number of vertices is not equal, the cycles cannot be equal.
2. It might shorten the rest of the algorithm (but that would yet have to be evaluated!) if you count the number of occurrences of each vertex id; these must match.
3. Search for the minimum vertex id in each cycle; from there on, compare each subsequent value, wrapping around in the vector when the end is reached.
4. If the sequences match, you're done. This does not yet cover the case of multiple minimum values, though; if that happens, you might just repeat the step trying the next minimum value in one cycle while staying with the same one in the other. You might try the same in parallel with the maxima, or, if you have counted them anyway (see above), use whichever of minima/maxima occurs less often.
Edit: Further improvement (idea inspired by [Scheff]'s comment to the question):
Instead of re-trying each minimum found, we should preferably select some kind of absolute minimum among the relative minima found so far: a relative minimum x is smaller than a relative minimum y if the successor of x is smaller than the successor of y; if both successors are equal, look at the next successors, and so on. If you discover more than one absolute minimum (some indirect successor becomes equal to the initial minimum again), then you have a sequence in which some sub-cycle repeats itself multiple times (1-2-3-1-2-3-1-2-3), and it does not matter which "absolute" minimum you select.
You'd definitely skip step 2 above then, though.
Find the minimum already in the constructor and store it. Then comparison gets easy: you just start in both cycles at their respective minimum...
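As a sketch of that idea: normalize each cycle to its lexicographically smallest rotation once; two cycles are then equal exactly when their normal forms are equal, and the normal form can also serve as a map key. The naive version below is O(n²); Booth's algorithm does it in O(n) if needed:
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the lexicographically smallest rotation of the vertex sequence.
std::vector<int> canonical(std::vector<int> v) {
    if (v.empty()) return v;
    std::vector<int> best = v;
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::rotate(v.begin(), v.begin() + 1, v.end());  // rotate left by one
        if (v < best) best = v;
    }
    return best;
}

// Usage: the cycles 1-2-2-1 and 2-2-1-2 from the question, stored as
// vertex sequences {1, 2, 2} and {2, 2, 1}, both normalize to {1, 2, 2}.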
I want to sort an array with a huge number (millions or even billions) of elements, while the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two items have the same integer member used for comparing, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) {   // note: <=, so that counts[i] is valid
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized, but you can resize once up front if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially, because you need to store the vectors of pointers: the space complexity went from O(m) to O(n). Time complexity is the same, and note that the algorithm is stable. The code above assumes that to_sort stays in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in; if you need count_sorted to outlive to_sort, you will need to do so or make copies.
If you have a range of the form [-l, m), the substance does not change much, but your index is now shifted by l (store value v at index v + l), and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
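For the plain-int version above, that simulated iteration is just two nested loops over the counts; writing the values back yields the sorted array:
// Rebuild to_sort in sorted order from the counts.
int out = 0;
for (int value = 0; value < (int)counts.size(); ++value)
    for (int k = 0; k < counts[value]; ++k)
        to_sort[out++] = value;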
Note: in a previous version of this answer I mentioned multiset as a way to use a data structure for counting sort. That would be efficient in some Java implementations (I believe the Guava implementation would be) but not in C++, where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Gionvanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N + X), where X is the maximum key value you allow.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different, though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the previous element from its place.
Second: either
- keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum) as well as a "move count" array; or
- keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm>  // std::iter_swap
#include <iterator>   // std::distance

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i-1];
        destination[i] = final_destination[i-1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // While the item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // Swap it into the right place
            std::iter_swap(it, begin + destination[key]);
            // Tidy up for the next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const & p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I would personally rather not resize the vector progressively but do it this way (assuming your range is [0, 1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but there is no need to constantly manage the size of the counts vector. Depending on your memory constraints, one solution or the other may suit you better.
My main question is how I can easily swap objects from one vector to another in C++. So adding an object to one vector and removing it from another.
To be more precise: I'm trying to iterate over a grid of cells in the following manner:
1. Add all cells to the unknownset (and one start cell to the knownset)
2. Add the neighbors of the known cell to the candidateset (and remove them from the unknownset)
3. Pick the cell in the candidateset with the lowest value and add it to the knownset (and remove it from the candidateset)
4. Pick this lowest cell and return to step 2
5. If the unknownset is empty, quit.
Sloppy pseudocode:
vector<Cell> knownset = vector<Cell>();
vector<Cell> unknownset = vector<Cell>();
vector<Cell> candidateset = vector<Cell>();
Cell currentCell = some_cell;

// Iterate until all cells are known
while (unknownset.size() > 0) {
    for each (direction in directions) {
        c = currentCell + direction;
        // Add cell to candidate set
        candidateset.push_back(c);
        // Remove cell from unknown set
        unknownset.remove(c);
    }
    // Search for the cell with the lowest value
    for each (candidate in candidateset) {
        if (candidate.value < lowestValue) {
            lowestValue = candidate.value;
            lowestCell = candidate;
        }
    }
    // Move the cell with the lowest value to the known set
    knownset.push_back(lowestCell);
    candidateset.remove(lowestCell);
    currentCell = lowestCell;
}
Does anyone have any suggestions for how to easily swap cells in this way? (The grid is quite large, so any performance tips are also welcome.)
The code you have is it. std::vector stores all its content contiguously, so your "swap" of necessity requires copying data around.
You might be eligible for a move optimization, except that you are keeping currentCell. Still, you could move from the location you are about to remove, to a local, and from there to the new location. It is unclear whether your Cell object has a move operation that is cheaper than a copy, so this may not matter to you.
If you want efficient element removal, you should use a data structure that is optimized for element removal. If you also need to remember insertion order, then std::list might be good; if ordering is not needed, then std::unordered_set.
From your algorithm description, I suspect std::unordered_set is the best choice.
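As a sketch of that suggestion: if each cell can be identified by a hashable handle (an int index into the grid, say; CellId below is a hypothetical name), moving a cell between sets becomes one erase plus one insert, both O(1) on average:
#include <unordered_set>

// Hypothetical: CellId is whatever uniquely identifies a cell, e.g. y * width + x.
using CellId = int;

std::unordered_set<CellId> unknownset, candidateset, knownset;

// Move a cell from one set to another: O(1) average.
void moveCell(std::unordered_set<CellId>& from,
              std::unordered_set<CellId>& to, CellId c) {
    if (from.erase(c) > 0) {
        to.insert(c);
    }
}
Finding the candidate with the lowest value still needs a scan (or a separate priority queue keyed on value), but the membership moves stop being linear-time vector erases.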
If you want to swap the entire contents of two vectors v1 and v2, you can use
v1.swap(v2)
which is a member function of std::vector (std::swap(v1, v2) does the same). Note that this exchanges the whole containers, not individual elements.
I need to store W items. Each item has a 'string' attribute and a 'double' attribute (the item's score) associated with it. In each iteration, an additional C items are added to the set. After the iteration is complete, the scores of some of the items are updated by a small amount. Then, out of the W+C items, only the W highest-scoring items are taken forward to the next iteration.
In every iteration a different set of 'C' items are added.
W is of the order of 10,000. C is of the order of 600.
What is the best data structure to use for this in terms of time complexity: hash table, heap, binary search tree?
I am using C++. Some Boost references would be appreciated.
I would store these values in two parallel structures. First, have an array of the double values, each stored alongside a pointer. Next, store all the strings in a hash table (or trie) along with an auxiliary integer. The idea is that each pointer in the array points to the node holding the string associated with that double, while the integer stored with each string holds the index of the double paired with it.
To insert a string/double pair into this structure, you add the string to the hash table, append the double to the array, then store a pointer to the new string in the array and the index of the double in the hash table. This has complexity O(k), where k is the length of the string.
To change a priority, look up the string in the hash table, then get the index of the double in the array. You can then modify that element to change the associated priority. This also has complexity O(k).
To discard all but the top W key/value pairs, run a selection algorithm on the array to put the top W elements in one part of the array and the remaining C elements in the other. Whenever you perform a swap, follow the pointers out of the array and into the hash table and update the indices of the elements you just swapped. Finally, iterate across the last C elements of the array, follow their pointers back into the hash table, and remove the elements they point at from the table. This takes expected O(n) time for the selection step (or worst-case O(n) using the median-of-medians algorithm), followed by O(n) time to remove the elements from the hash table, for an expected runtime of O(n), where n is the number of elements in the structure.
To summarize, this gives you O(k) insertion and lookup for any string, where k is the string length, and O(n) retention of the best elements, where n is the total number of elements.
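For illustration, a structural sketch of those two parallel containers (all names here are illustrative, not from the original answer): the hash table owns the strings, the array owns the scores, and each side reaches the other in O(1):
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct ScoreEntry {
    double score;
    const std::string* name;  // points at the key stored inside the hash table
};

struct Scoreboard {
    std::vector<ScoreEntry> scores;
    std::unordered_map<std::string, std::size_t> index;  // string -> slot in scores

    void insert(const std::string& s, double score) {    // expected O(k)
        auto result = index.emplace(s, scores.size());
        if (result.second)  // key pointers stay valid across rehashes
            scores.push_back({score, &result.first->first});
    }

    void bump(const std::string& s, double delta) {      // expected O(k)
        auto it = index.find(s);
        if (it != index.end())
            scores[it->second].score += delta;
    }
    // The selection step is a hand-rolled partition, per the text above:
    // after each swap, fix up index[*entry.name], then erase the trailing
    // C entries from the hash table.
};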
Well, I think you will be fine just using a std::vector<Item> and calling std::nth_element (on the score) once at the end of each iteration. E.g. if you want to keep 10,000 items, do it like this:
struct Item {
    double score;
    std::string name;
};

bool comparator(const Item& a, const Item& b) {
    return a.score > b.score;
}

if (items.size() > 10000) {
    // Make sure the 10,000 first elements contain the highest scores.
    std::nth_element(items.begin(), items.begin() + 10000, items.end(),
                     comparator);
    // Only keep the first 10,000 elements.
    items.resize(10000);
}
Actually, if you do it like this, updating values (by linear search and string comparison) will probably be slower than the sorting. You can speed up the comparisons by putting a string hash into your Item instead of comparing the raw strings.
If you want even faster updating: before updating, sort the items by string hash. Then you can do a binary search instead of a linear search to find the item you want to update.
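A sketch of that last idea, assuming Item gains a precomputed name_hash member as suggested above (hash collisions still require a final string comparison):
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Assumes Item now has a std::size_t name_hash member,
// precomputed as std::hash<std::string>{}(name) on construction.
bool byHash(const Item& a, const Item& b) { return a.name_hash < b.name_hash; }

void update(std::vector<Item>& items, const std::string& name, double newScore) {
    Item probe{};
    probe.name_hash = std::hash<std::string>{}(name);
    auto range = std::equal_range(items.begin(), items.end(), probe, byHash);
    for (auto it = range.first; it != range.second; ++it)
        if (it->name == name) it->score = newScore;  // verify name on collision
}

// Sort once by hash before a batch of updates:
// std::sort(items.begin(), items.end(), byHash);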