I have written code to solve the following problem: we have a map<double,double> with a (relatively) huge number of items. We want to merge adjacent items in order to reduce the size of the map while keeping a certain "loss factor" as low as possible.
To do so, I first populate a list whose elements hold a pair of adjacent iterators and the associated loss factor. Each list element has the following type:
struct myPair {
    map<double,double>::iterator curr, next;
    double loss;
    myPair(map<double,double>::iterator c, map<double,double>::iterator n,
           double l) : curr(c), next(n), loss(l) {}
};
The list is populated as follows:
for (map<double,double>::iterator it1 = myMap.begin(); it1 != --(myMap.end()); it1++) {
    map<double,double>::iterator it2 = it1;
    it2++;                            // it2 is the element adjacent to it1
    double l = computeLoss(it1, it2); // loss incurred by merging it1 and it2
    List.push(myPair(it1, it2, l));
}
Then I find the list element with the lowest loss factor, erase the corresponding two elements from the map, and insert a new element (the result of merging curr and next) into the map. Since this also affects the list elements that involve the element after next or the element before curr, I update those entries and their associated loss factors.
(I won't go into the details of how to implement this efficiently, but basically I am combining a doubly linked list and a heap.)
While the erase operations should not invalidate the remaining iterators, for some specific input instances of the program I get a double free or corruption error exactly at the point where I attempt to erase the elements from the map.
I tried to track this down, and it seems to happen when both the first and second entries of the two map elements are very close (more precisely, when the keys of curr and next are very close).
A strange thing: I put an assert while populating the list to ensure that in all entries curr and next are different, and the same assert in the element-removal loop. The second one fails!
I would appreciate it if anyone could help me.
P.S. I am sorry for not being very precise, but I wanted to keep the details to a minimum.
UPDATE: This is (a very simplified version of) how I erase the elements from the map:
while (myMap.size() > MAX_SIZE) {
    t = list.getMin();
    /* compute the merged version ... let's call the result (a,b) */
    myMap.erase(t.curr);
    myMap.erase(t.next);
    myMap.insert(pair<double,double>(a,b));
    /* update the adjacent entries */
}
The iterators stored in myPair are left dangling as soon as the elements they refer to are erased from the container; reusing them is undefined behavior. You should avoid this kind of technique. You may also find ready-made building blocks for this task in the standard library.
As others have already mentioned, it turns out that using double as the key of the map is problematic, in particular when the key values are computed.
Hence, my solution was to use std::multimap instead of map (and then merge the elements with the same key just after populating the map). With a multimap, even if a is very close (or equal) to the keys of t.curr, t.next, or any other element, the insert operation is guaranteed to create a new element, so no existing iterator in the list can point to it.
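To illustrate the difference, here is a minimal standalone sketch (the values are made up, and the merge logic itself is omitted):
#include <map>

int main() {
    // With std::map, inserting a key that compares equal to an existing
    // key is a no-op: no new node is created, so a "merged" element can
    // silently alias a node that other bookkeeping still points to, and
    // that node can end up erased twice.
    std::map<double,double> m;
    m.insert(std::pair<double,double>(1.0, 2.0));
    bool inserted = m.insert(std::pair<double,double>(1.0, 3.0)).second;
    // inserted == false: the second insert did nothing

    // With std::multimap, insert always creates a fresh node, so no
    // iterator stored elsewhere can already refer to it.
    std::multimap<double,double> mm;
    mm.insert(std::pair<double,double>(1.0, 2.0));
    std::multimap<double,double>::iterator it =
        mm.insert(std::pair<double,double>(1.0, 3.0)); // new, distinct node
    (void)inserted; (void)it;
    return 0;
}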
Say I have a std::unordered_map<std::string, int> that maps a word to the number of times that word appeared in a book, and I want to be able to sort it by value.
The problem is, I want the sorting to be stable: if two items have equal value, the one that was inserted into the map first should come first.
It is simple to implement this by adding an additional field that records the insertion time, then creating a comparator that uses both the time and the value; sorting with plain std::sort then takes O(N log N) time. For reference, a sketch of that baseline is shown below.
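(Struct and field names in this sketch are illustrative, not from my actual code.)
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: the word, its count, and when it was first seen.
struct WordCount {
    std::string word;
    int count;
    std::size_t first_seen; // insertion timestamp, assigned on first insert
};

// Sort by count descending; break ties by earlier insertion => stable result.
void sortBaseline(std::vector<WordCount>& v) {
    std::sort(v.begin(), v.end(),
              [](const WordCount& a, const WordCount& b) {
                  if (a.count != b.count) return a.count > b.count;
                  return a.first_seen < b.first_seen;
              });
}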
In my case, space is not an issue, whereas time can be improved. I want to take advantage of that and do a bucket sort, which should give O(N) time complexity. But bucket sort uses no comparator, and when iterating the items in the map the insertion order is not preserved.
How can I both make it stable and still keep the O(N) time complexity via bucket sorting or something else?
I guess that if I had some kind of hash map that preserves the order of insertion while iterating it, it would solve my issue.
Any other solutions with the same time complexity are acceptable.
Note - I already saw this and that, but since they are both from 2009 and my case is more specific, I opened this question.
Here is a possible solution I came up with, using a std::unordered_map and tracking the insertion order with a std::vector.
Create a hash map with the string as key and count as value.
In addition, create a vector with iterators to that map type.
When counting elements, if the object is not yet in the map, add it to both the map and the vector; otherwise, just increment its counter. The vector preserves the order in which elements were inserted into the map, and insertion/update remains O(1).
Apply bucket sort by iterating over the vector (instead of the map), this ensures the order is preserved and we'll get a stable sort. O(N)
Extract from the buckets to make a sorted array. O(N)
Implementation:
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<std::string, int> map;
std::vector<std::unordered_map<std::string, int>::iterator> order;
int max_count = 0;
// Let's assume this is my string stream
std::vector<std::string> words = {"a", "b", "a" /* ... */};
// Insert elements into map and the corresponding iterators into order
for (auto& word : words) {
    auto it = map.emplace(word, 1);  // pair<iterator, bool>
    if (!it.second) {
        it.first->second++;          // already present: bump the count
    } else {
        order.push_back(it.first);   // first occurrence: remember its position
    }
    max_count = std::max(max_count, it.first->second);
}
// Bucket Sorting
/* We iterate over the vector rather than the map;
   this ensures we visit elements in the order they were inserted */
std::vector<std::vector<std::string>> buckets(max_count);
for (auto o : order) {
    int count = o->second;
    buckets[count - 1].push_back(o->first);
}
std::vector<std::string> res;
for (auto it = buckets.rbegin(); it != buckets.rend(); ++it)
    for (auto& str : *it)
        res.push_back(str);
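For example, with words = {"a","b","a","c","b","a"}, the counts are a:3, b:2, c:1, so the buckets come out as {["c"], ["b"], ["a"]} and res is {"a","b","c"}; words with equal counts would appear in their insertion order.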
Context: I am implementing the push-relabel algorithm for max-flow in a network and want to keep track of the labels of all nodes. For each possible label (2*V-1 of them) I want to have a doubly-linked list containing the nodes with that label.
So I have a vector where each entry is a list. Now I need to delete an element from one list and move it into the list at another vector entry.
In order to do so, I use a vector (whose size equals the number of elements) where each entry is an iterator, so I always know the position of each element.
Before implementing this on a bigger scale, I wanted to try whether it works at all. So I create the two vectors, add one element to a list, store the iterator in the other vector, and try to delete that element again.
But the erase() call always gives me segfaults. Did I miss something?
int V=50;
int i=0, v=42;
vector<list<int> > B(2*V-1);
vector<list<int>::iterator> itstorage(V) ;
B[i].push_back(v);
itstorage[v]=B[i].end();
B[i].erase(itstorage[v]);
B[i].end() does not refer to the last item you pushed; it is one past the item you pushed.
What you want is:
std::list<int>::iterator p = B[i].end();
--p;
Alternatively, instead of using push_back, you could use the insert member function, which returns an iterator to the newly inserted item.
itstorage[v] = B[i].insert(B[i].end(), v);
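Putting it together, a minimal corrected version of the experiment might look like this (same variable names as in the question):
#include <list>
#include <vector>

int main() {
    int V = 50;
    int i = 0, v = 42;
    std::vector<std::list<int> > B(2 * V - 1);
    std::vector<std::list<int>::iterator> itstorage(V);
    // insert() hands back an iterator to the element itself,
    // unlike end(), which is one past it.
    itstorage[v] = B[i].insert(B[i].end(), v);
    B[i].erase(itstorage[v]); // erases the stored element; no segfault
    return 0;
}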
I have two lists, L1 and L2, of data containing multiple elements, each unique, of an abstract data type (i.e., structs). Each of the two lists:
May contain between zero and one hundred (inclusive) elements.
Contains no duplicate elements (each element is unique).
May or may not contain elements in the other list (i.e., L1 and L2 might be identical, or contain completely different elements).
Is not sorted.
At the lowest level, is stored within a std::vector<myStruct> container.
What I am typically expecting is that periodically, a new element is added to L2, or an element is removed from it. I am trying to detect the differences between the two lists as efficiently (i.e., with minimal comparisons) as possible:
If an entry is not present in L2 and is present in L1, carry out one operation: Handle_Missing_Element().
If an entry is present in L2 and not present in L1, carry out another operation: Handle_New_Element().
Once the above checks are carried out, L1 is set to be equal to L2, and at some time in the future, L2 is checked again.
How could I go about finding out the differences between the two lists? There are two approaches I can think of:
Compare both lists via every possible combination of elements. Probably O(n²) execution complexity (horrible).
bool found;
for i in 1 .. L2->length()
    found = false;
    for j in 1 .. L1->length()
        if (L1[j] == L2[i])
            // Found duplicate entry
            found = true;
        fi
    endfor
endfor
Sort the lists, and compare the two lists element-wise until I find a difference. This seems like it would run in near-linear time. The problem is that I would need the lists to be sorted. It would be impractical to manually sort the underlying vector after each addition/removal from the list. It would only be reasonable if it were somehow possible to have vector::push_back() insert elements such that insertions preserve the ordering of the list.
Is there a straightforward way to accomplish this efficiently in C++? I've found similar problems, but I need to do more than just find the intersection of two sets, or run such a test on a set of integers (where sum-related tricks can be used), as I need to carry out different operations for "new" versus "missing" elements.
Thank you.
It would be impractical to manually sort the underlying vector after each addition/removal from the list. It would only be reasonable if it were somehow possible to have vector::push_back() insert elements such that insertions preserve the ordering of the list.
What you're talking about here is an ordered insert. There are functions in <algorithm> that allow you to do this. Rather than using std::vector::push_back you would use std::vector::insert, and call std::lower_bound, which does a binary search for the first element not less than a given value.
auto insert_pos = std::lower_bound( L2.begin(), L2.end(), value );
if( insert_pos == L2.end() || *insert_pos != value )
{
L2.insert( insert_pos, value );
}
This makes every insertion cost O(log N) comparisons, and if you are doing fewer than N insertions between your periodic checks, it ought to be an improvement.
The zipping operation might look something like this:
auto it1 = L1.begin();
auto it2 = L2.begin();
while( it1 != L1.end() && it2 != L2.end() )
{
if( *it1 < *it2 ) {
Handle_Missing( *it1++ );
} else if( *it2 < *it1 ) {
Handle_New( *it2++ );
} else {
it1++;
it2++;
}
}
while( it1 != L1.end() ) Handle_Missing( *it1++ );
while( it2 != L2.end() ) Handle_New( *it2++ );
Can you create a hash value for your list items? If so, just compute the hash and check the hash table for the other list. This is quick, does not require sorting, and avoids your "every possible combination" problem. If you're using C++ and the STL, you could use a map container to hold each list.
Create a hash for each item in L1, and use a map to associate each hash with its list item.
Create a similar map for L2, and as each L2 hash is created, check whether it is already in the L1 map.
When a new element is added to L2, calculate its hash value and check whether it is in the L1 hash map (using map.find() with STL maps). If it is not, carry out your Handle_New_Element() function.
When an element is removed from the L2 list and its hash is in the L1 hash map, carry out your Handle_Missing_Element() function.
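A minimal sketch of this idea, using std::unordered_set as the hash table (myStruct, its hash functor, and the handler signatures are placeholders for the question's actual types):
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Placeholder element type and hash; substitute the real myStruct.
struct myStruct { int id; };
bool operator==(const myStruct& a, const myStruct& b) { return a.id == b.id; }
struct MyStructHash {
    std::size_t operator()(const myStruct& s) const { return std::hash<int>()(s.id); }
};

void Handle_Missing_Element(const myStruct&); // as named in the question
void Handle_New_Element(const myStruct&);

void diffByHash(const std::vector<myStruct>& L1, const std::vector<myStruct>& L2) {
    std::unordered_set<myStruct, MyStructHash> set1(L1.begin(), L1.end());
    std::unordered_set<myStruct, MyStructHash> set2(L2.begin(), L2.end());
    for (const myStruct& e : L1)              // in L1 but not in L2
        if (set2.find(e) == set2.end()) Handle_Missing_Element(e);
    for (const myStruct& e : L2)              // in L2 but not in L1
        if (set1.find(e) == set1.end()) Handle_New_Element(e);
}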
A container that automatically sorts itself on insert is std::set. Insertions are O(log n), and comparing the two sets is O(n). Since all your elements are unique, you don't need std::multiset.
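A sketch of that approach, using std::set_difference to split the result into "missing" and "new" elements (myStruct as in the sketch above, plus an assumed ordering):
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Assumed ordering for std::set; replace with your real comparison.
bool operator<(const myStruct& a, const myStruct& b) { return a.id < b.id; }

void diffSorted(const std::vector<myStruct>& L1, const std::vector<myStruct>& L2) {
    std::set<myStruct> S1(L1.begin(), L1.end());
    std::set<myStruct> S2(L2.begin(), L2.end());
    std::vector<myStruct> missing, added;
    std::set_difference(S1.begin(), S1.end(), S2.begin(), S2.end(),
                        std::back_inserter(missing)); // in L1 but not in L2
    std::set_difference(S2.begin(), S2.end(), S1.begin(), S1.end(),
                        std::back_inserter(added));   // in L2 but not in L1
}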
For each element of both arrays, maintain the number of times it occurs in the opposite array. You can store these numbers in separate arrays with the same indexing, or in the structs you use.
When an element x is inserted into L2, you have to check it for equality against all the elements of L1. On each match with some y, increment the counters of both x and y.
When an element x is removed from L2, you again compare it with all the elements of L1. On each match with some y from L1, decrement y's counter. The counter of x does not matter, since x is removed.
When you want to find the non-duplicate elements, simply iterate over both arrays: the elements with zero counters are the ones you need.
In total, you need O(|L1|) additional operations per insert or remove, and O(|L1| + |L2|) operations per difference search. The latter can be reduced to the number of sought-for non-duplicate elements if you additionally maintain lists of all elements with a zero counter.
EDIT: Oops, it seems that each counter is always either 0 or 1 because of the uniqueness within each list.
EDIT 2: As Thane Plummer has written, you can additionally use a hash table. If you create a hash table for L1, then you can do all the comparisons on insert and remove in O(1). By the way, since your L1 is constant, you can even create a perfect hash table for it to make things faster.
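A small sketch of that last refinement (the counter field and id-based hashing are hypothetical simplifications):
#include <unordered_set>
#include <vector>

// Elements reduced to an integer id for brevity; matches_in_L1 is the
// hypothetical counter field. L1 is assumed constant, so its ids can be
// indexed once in a hash table.
struct Elem { int id; int matches_in_L1; };

// On inserting x into L2, one O(1) lookup replaces the O(|L1|) scan.
void onInsertIntoL2(Elem& x, const std::unordered_set<int>& idsOfL1) {
    x.matches_in_L1 = (int)idsOfL1.count(x.id); // 0 or 1, by uniqueness
}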
I want to sort an array with a huge number of elements (millions or even billions), where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choices for me?
Actually, I want to sort a vector of my own class, with an integer member representing a processor index.
There are other members inside the class, so even if two objects have the same integer member used for comparison, they are not necessarily regarded as the same data.
Counting sort would be the right choice if you know that your range is this limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) {
        counts.resize(i + 1, 0);  // grow lazily to fit the largest value seen
    }
    counts[i]++;
}
Note that the count at i is lazily initialized, but you can resize once up front if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as follows:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially, because you need to store the vectors of pointers: the space complexity goes from O(m) to O(n + m). Time complexity is the same, and note that the algorithm is stable. The code above assumes that to_sort stays in scope for the lifetime of count_sorted. If your Ts implement move semantics, you can store the objects themselves and move them in; if you need count_sorted to outlive to_sort, you will need to do that or make copies.
If you have a range of the form [-l, m), the substance does not change much, but the value v is now stored at index v + l, and you need to know l beforehand.
Finally, it should be trivial to simulate iteration through the sorted array by iterating through the counts array while taking the value of each count into account. If you want STL-like iterators, you would need a custom data structure that encapsulates that behavior.
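For instance, a minimal sketch of that simulated iteration (printing is just a stand-in for whatever the visit should do):
#include <cstddef>
#include <cstdio>
#include <vector>

// Walk the counts array and emit each value counts[v] times; this visits
// the values exactly in sorted order without materializing a second array.
void printSorted(const std::vector<int>& counts) {
    for (std::size_t v = 0; v < counts.size(); ++v)
        for (int k = 0; k < counts[v]; ++k)
            std::printf("%zu ", v);
}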
Note: in a previous version of this answer I mentioned multiset as a way to use a standard data structure for counting sort. That would be efficient in some Java implementations (I believe the Guava implementation would be), but not in C++, where the keys in the red-black tree are simply repeated many times.
You say "in-place", so I assume you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements you have already swapped into place. There might be a cleverer way to deal with this, but I don't know it. A rough sketch is given below.
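Assuming integer keys in [0, m), the idea could look like this; like the description above, treat it as illustrative rather than battle-tested:
#include <utility>
#include <vector>

// In-place placement by key. start[k] marks where block k begins;
// next[k] is the next slot in block k still awaiting a correct element,
// which is also how we "jump ahead" past already-placed elements.
template <class T, class Key>
void placeInPlace(std::vector<T>& v, int m, Key key) {
    std::vector<int> start(m + 1, 0);
    for (const T& x : v) start[key(x) + 1]++;
    for (int k = 0; k < m; ++k) start[k + 1] += start[k]; // cumulative sums
    std::vector<int> next(start.begin(), start.begin() + m);
    for (int k = 0; k < m; ++k) {
        while (next[k] < start[k + 1]) {          // block k not complete yet
            int i = next[k];
            int kk = key(v[i]);
            if (kk == k) {
                ++next[k];                        // already in its block
            } else {
                std::swap(v[i], v[next[kk]]);     // send it to its own block
                ++next[kk];                       // that slot is now settled
            }
        }
    }
}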
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort, but not the one you may be thinking of. Its main selling point is that its time complexity is O(N + X), where X is the maximum key value you allow.
Regular counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different, though, and it is also known as American flag sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix-sum array of the counts. This is so that we can know how many elements should be placed before a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of each item, we can just swap items into place. Doing just that would work if there were no repetitions but, since it is almost certain that there will be repetitions, we have to be more careful.
First: when we put an element into its place, we have to increment the value in the prefix-sum array so that the next element with the same key doesn't displace the previous one.
Second, do one of the following:
keep track of how many elements of each key we have already put into place, so that we don't keep moving elements whose key has already reached its final block; this requires a second copy of the counts array (from before the prefix sums were computed) as well as a "move count" array; or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position for a key reaches the first position of the next key.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm> // std::iter_swap
#include <iterator>  // std::distance

template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD radix sort; here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and counting sort is definitely the way to go. However, I personally prefer not to resize the vector progressively; I'd rather do it this way (assuming your range is [0, 1000]):
vector<int> to_sort;
vector<int> counts(1001);      // one slot for every possible value up front
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);   // shrink to the largest value actually seen
It is essentially the same, but there is no need to constantly manage the size of the counts vector. Depending on your memory constraints, one solution or the other may suit you better.
I need a list of elements that is always sorted. The operations involved are quite simple; for example, if the list is sorted from high to low, I only need three operations in some loop task:
while true do {
    list.sort()              // sort the list that has hundreds of elements
    val = list[0]            // get the first/maximum value in the list
    list.pop_front()         // remove the first/maximum element
    ...                      // do some work here
    list.push_back(new_elem) // insert a new element
    list.sort()
}
However, since I only add one element at a time, and I am concerned about speed, I don't want the sort to go through all the elements (e.g., bubble sort). So I wonder whether there is a function to insert an element in sorted order, or whether list::sort() is smart enough to use some kind of quick pass when only one element has been added or modified.
Or should I use a deque instead, for better performance, if the above are all the operations needed?
Thanks a lot!
As mentioned in the comments, if you aren't locked into std::list then you should try std::set or std::multiset.
The std::list::insert method takes an iterator that specifies where to add the new item. You can use std::lower_bound to find the correct insertion point; it's not optimal without random-access iterators, but it still does only O(log n) comparisons.
P.S. Don't use variable names that collide with standard library templates like list.
lst.sort(std::greater<T>());     // sort the list (hundreds of elements) high to low
while (true) {
    val = lst.front();           // get the first/maximum value in the list
    lst.pop_front();             // remove the first/maximum element
    ...                          // do some work here
    std::list<T>::iterator it =
        std::lower_bound(lst.begin(), lst.end(), new_elem, std::greater<T>());
    lst.insert(it, new_elem);    // insert the new element at its sorted position
    // lst is already sorted
}