I have a std::set and I need to erase similar adjacent elements:
DnaSet::const_iterator next = dna_list.begin();
DnaSet::const_iterator actual = next;
++next;
while(next != dna_list.end()) // cycle over pairs, dna_list is the set
{
if (similar(*actual, *next))
{
Dna dna_temp(*actual); // copy constructor
dna_list.erase(actual); // erase the old one
do
{
dna_temp.mutate(); // change dna_temp
} while(!dna_list.insert(dna_temp).second); // insert dna_temp
}
++actual;
++next;
}
Sometimes the program can't exit from the main loop. I think the problem happens when I erase the last element in the dna_list. What's the correct way to do this task?
Use actual = next rather than ++actual.
Once you erase actual, it is an invalid iterator, so ++actual will behave strangely. next should remain intact, so assigning actual to next should work.
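A minimal sketch of the loop with that fix applied. It assumes plain int elements in place of Dna, a hypothetical similar() that treats values closer than 2 as similar, and "add 100" as a stand-in for mutate(). None of these come from the question. It also guards against an empty set, since incrementing begin() of an empty set is undefined.

```cpp
#include <cstdlib>
#include <set>

// Hypothetical similarity test: values closer than 2 count as similar.
bool similar(int a, int b) { return std::abs(a - b) < 2; }

// Replaces the first element of every similar adjacent pair with a mutated copy.
void separate_similar(std::set<int>& values) {
    if (values.empty()) return;          // ++begin() on an empty set is undefined
    auto actual = values.begin();
    auto next = actual;
    ++next;
    while (next != values.end()) {
        if (similar(*actual, *next)) {
            int temp = *actual;          // copy before erasing
            values.erase(actual);        // invalidates 'actual' only, 'next' stays valid
            do {
                temp += 100;             // stand-in for dna_temp.mutate()
            } while (!values.insert(temp).second);
        }
        actual = next;                   // never increment the erased iterator
        ++next;
    }
}
```

With the set {1, 2, 50}, this leaves {2, 50, 101}: 1 is similar to 2, so it is erased and re-inserted as 101.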
Your best option is to create a comparison functor that uses the similar() predicate. Then all you need to do is construct the set with that comparison functor and you're done. The set itself will see two similar elements as identical and will only let the first one in.
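struct lt_different {
// operator() must be const: the set may call it on a const comparator
bool operator()(int a, int b) const {
return a < b && !similar(a, b);
}
private:
static bool similar(int a, int b)
{
// TODO:when are two elements similar?
const int EPSILON = 2;
return abs(a - b) < EPSILON; // abs(int) is declared in <cstdlib>
}
};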
// ...
set<int> o; // fill this set with your data
// copy your data to a new set that rejects similar elements
set<int,lt_different> s(o.begin(), o.end(), lt_different());
You can work with set s: insert elements, remove elements, modify elements -- and the set itself will make sure no two similar elements exist in the set.
That said, you can also write an algorithm yourself, if only for an alternative choice. Take a look at std::adjacent_find() from <algorithm>. Its predicate overload finds the first pair of consecutive elements for which the predicate (here, similar()) returns true; hold on to that position. From that point, find the first element that is not similar to these elements. You end up with two iterators that denote a range of consecutive, similar elements. You can use the set's erase() method to remove them, as it has an overload that takes two iterators.
Lather, rinse, repeat for the entire set.
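The adjacent_find approach could be sketched as follows. This is only an illustration: it assumes int elements, a hypothetical similar() with a threshold of 2, and that the first element of each run of similar elements should be kept (the answer does not say which elements to keep).

```cpp
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <set>

// Hypothetical similarity test: values closer than 2 count as similar.
bool similar(int a, int b) { return std::abs(a - b) < 2; }

// Keeps the first element of each run of similar elements and erases the rest.
void erase_similar_runs(std::set<int>& s) {
    auto first = std::adjacent_find(s.begin(), s.end(), similar);
    while (first != s.end()) {
        // The run to erase starts right after the element we keep.
        auto run_begin = std::next(first);
        // It ends at the first element that is no longer similar to the kept one.
        auto run_end = std::find_if(run_begin, s.end(),
                                    [&](int x) { return !similar(*first, x); });
        // Range erase returns the iterator following the last removed element.
        auto rest = s.erase(run_begin, run_end);
        // Lather, rinse, repeat from where the erased run ended.
        first = std::adjacent_find(rest, s.end(), similar);
    }
}
```

For {1, 2, 3, 10, 11, 20} this leaves {1, 3, 10, 20}. Note that 3 survives because it is compared against the kept element 1, not against the erased 2.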
I have a multi-set containing pointers to custom types. I have provided a custom sorter to the multi-set that compares on a particular attribute of the custom type.
If I change the value of the attribute on any given item (in a way that would influence the sorting order), do I have to remove the item from the set and re-insert it to guarantee ordering? Or will I still get the items in order any time I create an iterator (or use a foreach loop)?
I can make a quick test for myself, but I wanted to know if the behavior would be consistent on any platform and compiler or if it is standard.
Edit: Here is an example I tried. I noticed two things.
In a multi-set if I change the value that is used to compare before removing the key, I can no longer remove it. Otherwise, my original thought of removing and reinserting seems the best way for this to work.
#include <stdio.h>
#include <set>
struct NodePointerCompare;
struct Node {
int priority;
};
struct NodePointerCompare {
bool operator()(const Node* lhs, const Node* rhs) const {
return lhs->priority < rhs->priority;
}
};
int main()
{
Node n1{1};
Node n2{2};
Node n3{3};
std::multiset<Node*, NodePointerCompare> nodes;
nodes.insert(&n1);
nodes.insert(&n2);
nodes.insert(&n3);
printf("First round\n");
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
n1.priority = 10;
printf("Second round\n");
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
n1.priority = 1;
printf("Third round\n");
nodes.erase(&n1);
n1.priority = 10;
nodes.insert(&n1);
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
return 0;
}
This is the output I get
First round
1
2
3
Second round
10
2
3
Third round
2
3
10
http://eel.is/c++draft/associative.reqmts#general-3
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
It is simply illegal to change the object in a way that affects how it compares to other objects within the associative container.
If you want to do that, you have to get the object out of the container, apply the change to it, and put it back in. Have a look at https://en.cppreference.com/w/cpp/container/multiset/extract if that's what you want to do.
When is a multiset sorted? Insertion, iteration, both?
The standard doesn't specify explicitly, but practically speaking the ordering must be established on insertion.
If I change the value of the attribute on any given item (in a way that would influence the sorting order), do I have to remove the item from the set and re-insert it to guarantee ordering?
You may not change the ordering of an element while it is in the set.
However, instead of erasing the element and inserting one with a different value, you can extract, modify, and re-insert it, which should be slightly more efficient (or significantly, depending on the element type).
Here is an example I tried.
The behaviour of the example is undefined.
The container must remain sorted at all times because begin has constant complexity. Changing the comparison order of elements in the container is undefined behavior per [associative.reqmts.general]/3 (and [res.on.functions]/2.3):
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
You can use node handles to efficiently modify elements by temporarily removing them from the container, although for elements that are just pointers the only efficiency is avoiding a memory (de)allocation.
I have a C++11 list of complex elements that are defined by a structure node_info. A node_info element, in particular, contains a field time and is inserted into the list in an ordered fashion according to its time field value. That is, the list contains various node_info elements that are time ordered. I want to remove from this list all the nodes that verify some specific condition specified by coincidence_detect, which I am currently implementing as a predicate for a remove_if operation.
Since my list can be very large (order of 100k -- 10M elements), and for the way I am building my list this coincidence_detect condition is only verified by few (thousands) elements closer to the "lower" end of the list -- that is the one that contains elements whose time value is less than some t_xv, I thought that to improve speed of my code I don't need to run remove_if through the whole list, but just restrict it to all those elements in the list whose time < t_xv.
However, remove_if() does not seem to let the user control how far it iterates through the list.
My current code.
The list elements:
struct node_info {
const char *type = "x"; // string literals are const; plain char* is ill-formed in C++11
int ID = -1;
double time = 0.0;
bool spk = true;
};
The predicate/condition for remove_if:
// Remove all events occurring at t_event
class coincident_events {
double t_event; // Event time
bool spk; // Spike condition
public:
coincident_events(double time,bool spk_) : t_event(time), spk(spk_){}
bool operator()(const node_info& node_event) const { // take by const reference to avoid copying each node
return ((node_event.time==t_event)&&(node_event.spk==spk)&&(strcmp(node_event.type,"x")!=0));
}
};
The actual removing from the list:
void remove_from_list(double t_event, bool spk_){
// Remove all events occurring at t_event
coincident_events coincidence(t_event,spk_);
event_heap.remove_if(coincidence);
}
Pseudo main:
int main(){
// My list
std::list<node_info> event_heap;
...
// Populate list with elements with random time values, yet ordered in ascending order
...
remove_from_list(0.5, true);
return 1;
}
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for cycle as suggested for example in this post?
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for loop?
Yes and yes. Don't fight to use code that is preventing you from reaching your goals. Keep it simple. Loops are nothing to be ashamed of in C++.
First, comparing doubles for exact equality is not a good idea, as you are subject to floating-point rounding errors.
You could use lower_bound to find the point up to which you want to search (I assume your list is properly sorted).
Then you could use the free-function algorithm std::remove_if on that sub-range, followed by the list's erase member, to remove the items between the iterator returned by remove_if and the one returned by lower_bound.
However, that approach makes multiple passes over the data and moves element values around, so it would hurt performance.
See also: https://en.cppreference.com/w/cpp/algorithm/remove
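The lower_bound, remove_if, and erase combination described above might look like the following sketch. Event is a simplified stand-in for node_info, and the function name and parameters are assumptions made for the example. It keeps the question's exact comparison on time, so the caveat about floating-point equality still applies.

```cpp
#include <algorithm>
#include <list>

// Simplified stand-in for node_info.
struct Event {
    int id;
    double time;
    bool spk;
};

// Removes events with time == t_event and the given spk flag, looking only
// at events whose time is below t_limit. Assumes 'events' is sorted by time.
void remove_before(std::list<Event>& events, double t_limit,
                   double t_event, bool spk) {
    // First element with time >= t_limit. On a list this still walks the
    // nodes linearly, although it needs only O(log n) comparisons.
    auto limit = std::lower_bound(
        events.begin(), events.end(), t_limit,
        [](const Event& e, double t) { return e.time < t; });

    // Shift the kept elements to the front of [begin, limit).
    auto new_end = std::remove_if(
        events.begin(), limit,
        [=](const Event& e) { return e.time == t_event && e.spk == spk; });

    // Erase the leftover tail of the sub-range.
    events.erase(new_end, limit);
}
```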
So in the end, it is probably preferable to write your own loop over the container and check, for each element, whether it needs to be removed. If not, check whether you should break out of the loop.
for (auto it = event_heap.begin(); it != event_heap.end(); )
{
if (coincidence(*it))
{
auto itErase = it;
++it;
event_heap.erase(itErase);
}
else if (it->time < t_xv)
{
++it;
}
else
{
break;
}
}
As you can see, code can easily become quite long for something that should be simple. Thus, if you need that kind of algorithm often, consider writing your own generic algorithm.
Also, in practice you might not need to do a complete search for the end using the first solution if you process your data in increasing time order.
Finally, you might consider using an std::set instead. It could lead to simpler and more optimized code.
Thanks. I used your comments and came up with this solution, which seemingly increases speed by a factor of 5-to-10.
void remove_from_list(double t_event,bool spk_){
coincident_events coincidence(t_event,spk_);
for(auto it=event_heap.begin();it!=event_heap.end();){
if(t_event>=it->time){
if(coincidence(*it)) {
it = event_heap.erase(it);
}
else
++it;
}
else
break;
}
}
The idea of assigning erase's return value to it (which already points to the next element, so no ++it is needed) was suggested by this other post. Note that in this implementation I am actually scanning all list elements up to the t_event value (meaning, I pass whatever I want for t_xv).
My apologies for the lengthy explanation.
I am working on a C++ application that loads two files into two 2D string vectors, rearranges those vectors, builds another 2D string vector, and outputs it all in a report. The first element of the two vectors is a code that identifies the owner of the item and the item in the vector. I pass the owner's identification to the program on start and loop through the two vectors in a nested while loop to find those that have matching first elements. When I do, I build a third vector with components of the first two, and I then need to capture any that don't match.
I was using the syntax "vector.erase(vector.begin() + i)" to remove elements from the two original arrays when they matched. When the loop completed, I had my new third vector, and I was left with two vectors that held only the elements that didn't match, which is what I needed. This was working fine as I tried the various owners in the files (the program accepts one owner at a time). Then I tried one that generated an out of range error.
I could not figure out how to do the erase inside of the loop without throwing the error (it didn't seem that swap and pop or erase-remove were feasible solutions). I solved my problem for the program with two extra nested while loops after building my third vector in this one.
I'd like to know how to make the erase method work here (as it seems a simpler solution) or at least how to check for my out of range error (and avoid it). There were a lot of "rows" for this particular owner; so debugging was tedious. Before giving up and going on to the nested while solution, I determined that the second erase was throwing the error. How can I make this work, or are my nested whiles after the fact, the best I can do? Here is the code:
i = 0;
while (i < AIvector.size())
{
CHECK:
j = 0;
while (j < TRvector.size())
{
if (AIvector[i][0] == TRvector[j][0])
{
linevector.clear();
// Add the necessary data from both vectors to Combo_outputvector
for (x = 0; x < AIvector[i].size(); x++)
{
linevector.push_back(AIvector[i][x]); // add AI info
}
for (x = 3; x < TRvector[j].size(); x++) // Don't need the first three elements; so start with x=3.
{
linevector.push_back(TRvector[j][x]); // add TR info
}
Combo_outputvector.push_back(linevector); // build the combo vector
// then erase these two current rows/elements from their respective vectors, this revises the AI and TR vectors
AIvector.erase(AIvector.begin() + i);
TRvector.erase(TRvector.begin() + j);
goto CHECK; // jump from here because the erase will have changed the two increments
}
j++;
}
i++;
}
As already discussed, your goto jumps to the wrong position: the CHECK label sits inside the outer while loop, so after an erase the code reads AIvector[i] again without re-checking i < AIvector.size(). Simply moving the label out of the first while loop should solve your problems. But can we do better?
Erasing from a vector can be done cleanly with the erase-remove idiom (std::remove followed by the vector's erase member) for cheap-to-move objects, which vector and string both are. After some thought, however, I believe this isn't the best solution for you because you need a function that does more than just check if a certain row exists in both containers and that is not easily expressed with the erase-remove idiom.
Retaining the current structure, then, we can use iterators for the loop condition. We have a lot to gain from this, because std::vector::erase returns an iterator to the next valid element after the erased one. Not to mention that it takes an iterator anyway. Conditionally erasing elements in a vector becomes as simple as
auto it = vec.begin();
while (it != vec.end()) {
if (...)
it = vec.erase(it);
else
++it;
}
Because we assign erase's return value to it we don't have to worry about iterator invalidation. If we erase the last element, it returns vec.end() so that doesn't need special handling.
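A concrete instance of that pattern, as a small sketch: the "odd value" condition is arbitrary and only there to fill in the elided if (...).

```cpp
#include <vector>

// Removes every odd value from 'vec' in place.
void erase_odd(std::vector<int>& vec) {
    auto it = vec.begin();
    while (it != vec.end()) {
        if (*it % 2 != 0)
            it = vec.erase(it);   // returns the element after the erased one
        else
            ++it;
    }
}
```

Calling it on {1, 2, 3, 4, 5} leaves {2, 4}. The trailing 5 is the last element, and erasing it makes erase return vec.end(), which ends the loop.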
Your second loop can be removed altogether. The C++ standard library defines functions for searching inside STL containers. std::find_if searches a container for an element that satisfies a condition and returns an iterator to it, or end() if no such element exists. You haven't declared your types anywhere, so I'm just going to assume the rows are std::vector<std::string>.
using row_t = std::vector<std::string>;
auto AI_it = AIVector.begin();
while (AI_it != AIVector.end()) {
// Find a row in TRVector with the same first element as *AI_it
auto TR_it = std::find_if (TRVector.begin(), TRVector.end(), [&AI_it](const row_t& row) {
return row[0] == (*AI_it)[0];
});
// If a matching row was found
if (TR_it != TRVector.end()) {
// Copy the line from AIVector
auto linevector = *AI_it;
// Do NOT do this unless you can guarantee size >= 3
assert(TR_it->size() >= 3);
std::copy(TR_it->begin() + 3, TR_it->end(),
std::back_inserter(linevector));
Combo_outputvector.emplace_back(std::move(linevector));
AI_it = AIVector.erase(AI_it);
TRVector.erase(TR_it);
}
else
++AI_it;
}
As you can see, switching to iterators completely sidesteps your initial problem of figuring out how not to access invalid indices. If you don't understand the syntax of the arguments for find_if, search for the term lambda. It is beyond the scope of this answer to explain what they are.
A few notable changes:
linevector is now encapsulated properly. There is no reason for it to be declared outside this scope and reused.
linevector simply copies the desired row from AIVector rather than calling push_back for every element in it. This works as long as Combo_outputvector (and therefore linevector) contains the same type as AIVector and TRVector.
std::copy is used instead of a for loop. Apart from being slightly shorter, it is also more generic, meaning you could change your container type to anything that supports random access iterators and inserting at the back, and the copy would still work.
linevector is moved into Combo_outputvector. This can be a huge performance optimization if your vectors are large!
It is possible that you used a non-encapsulated linevector because you wanted to keep a copy of the last inserted row outside of the loop. That would prohibit moving it, however. For this reason it is faster and more descriptive to do it as I showed above and then simply do the following after the loop.
auto linevector = Combo_outputvector.back();
Suppose Foo is any class.
Foo f[5];
std::vector<Foo*> v;
I can insert the elements into vector of pointers using a for loop statement:
for (size_t i = 0; i < 5; i++)
v.push_back(&f[i]);
Is it possible to insert them using the std::vector::insert() function, and if not, why not? I have tried several times, and it failed with something like this:
v.insert(v.end(), &f[0], &f[5]); // error
If you mean, with a single call to insert, then no - that can copy a range, performing type conversions if needed, but can't apply arbitrary transformations like taking the address of each element.
You could use std::transform:
std::transform(std::begin(f), std::end(f),
std::back_inserter(v),
[](Foo & f) {return &f;});
although that's probably less clear than a simple loop, especially if you use new-style syntax
for (Foo & foo : f) {
v.push_back(&foo);
}
Yes, you can use insert as well, but there are a few differences between the two operations:
push_back puts a new element at the end of the vector, whereas insert lets you choose the position. This affects performance: when the position is not the end, insert has to shift every element after it to make room for the new one. This is why insert is often less efficient than push_back.
Say I have something like this
vector<foo*> f;
Now suppose I have a method like this
void RemoveFromFoo(foo* fptr)
{
//search vector and remove if present
}
will something like this work ?
f.erase(std::remove(f.begin(), f.end(), fptr ), f.end());
Yes. That's the erase-remove idiom for removing selected elements from a container.
remove will move all the elements you want to keep (those which don't equal fptr) to the start of the sequence, and return an iterator to the first element after them (the first element you want to erase).
Then erase will erase the elements from there until the end from the container, leaving just the ones at the start which you want to keep.
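Put together as a compilable sketch: the struct foo body is a placeholder, and the vector is passed as a parameter here only so the example is self-contained (in the question it is presumably a member or global).

```cpp
#include <algorithm>
#include <vector>

struct foo {
    int value;   // placeholder member, not from the question
};

// Removes every occurrence of fptr from f. The pointed-to object is not
// deleted; only the pointer stored in the vector goes away.
void RemoveFromFoo(std::vector<foo*>& f, foo* fptr) {
    // remove shifts the kept pointers forward and returns the new logical
    // end; erase then trims the leftover tail.
    f.erase(std::remove(f.begin(), f.end(), fptr), f.end());
}
```

If fptr is not present, remove returns f.end() and the erase call is a no-op, so no separate search is needed first.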