BOOST_FOREACH variant that enables removing the currently processed element from the container - c++

I am looking for a BOOST_FOREACH variant that tolerates removal of the currently processed element, for containers where removing an element invalidates no iterators other than the one pointing at the removed element (which is exactly the one the foreach is holding).
Linked lists are a typical example of such containers, and since we use Boost intrusive lists a lot, for loops like the one in the following example are becoming too frequent.
//NumberList is boost intrusive list of structures, containing number property
void removeEvenNumbers(NumberList& numbers)
{
NumberList::iterator next = numbers.begin();
for (NumberList::iterator i = numbers.begin(); i != numbers.end(); i = next)
{
++next;
if (i->number % 2 == 0)
i->unlink();
}
}
Edit: please note that the example is solvable with remove_if, but in real scenarios that is often not usable or practical.
I'm looking for a foreach variant that would allow me to write much more elegant source code.
void removeEvenNumbers(NumberList& numbers)
{
BOOST_FOREACH_RESISTANT(NumberList::value_type& item, numbers)
if (item.number % 2 == 0)
item.unlink();
}
What is the simplest way to create this kind of macro from the existing components used to create the original BOOST_FOREACH in Boost?
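For comparison, the "advance first, then process" pattern from the hand-written loop can also be packaged without a macro. The following is only a sketch, assuming C++11 lambdas are available; for_each_erase_safe is a hypothetical name, not a Boost component, and std::list stands in for the intrusive list:

```cpp
#include <list>

// Hypothetical helper (not part of Boost): visits every element while
// letting the callback erase or unlink the element it was handed.
// Only valid for containers where erasing one element leaves all other
// iterators, including end(), intact (std::list, intrusive lists).
template <class Container, class Fn>
void for_each_erase_safe(Container& c, Fn fn)
{
    for (auto it = c.begin(); it != c.end(); ) {
        auto current = it++;   // step past the element before it can be removed
        fn(current);           // the callback may remove *current, nothing else
    }
}

// Usage mirroring removeEvenNumbers from the question.
inline void removeEvenNumbers(std::list<int>& numbers)
{
    for_each_erase_safe(numbers, [&numbers](std::list<int>::iterator i) {
        if (*i % 2 == 0)
            numbers.erase(i);
    });
}
```

With an intrusive list whose elements can unlink themselves, the callback would presumably call i->unlink() instead of erase, as in the original loop.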

Related

Constraining remove_if on only part of a C++ list

I have a C++11 list of complex elements that are defined by a structure node_info. A node_info element, in particular, contains a field time and is inserted into the list in an ordered fashion according to its time field value. That is, the list contains various node_info elements that are time ordered. I want to remove from this list all the nodes that verify some specific condition specified by coincidence_detect, which I am currently implementing as a predicate for a remove_if operation.
Since my list can be very large (order of 100k -- 10M elements), and for the way I am building my list this coincidence_detect condition is only verified by few (thousands) elements closer to the "lower" end of the list -- that is the one that contains elements whose time value is less than some t_xv, I thought that to improve speed of my code I don't need to run remove_if through the whole list, but just restrict it to all those elements in the list whose time < t_xv.
However, remove_if() does not seem to let the user control how far it iterates through the list.
My current code.
The list elements:
struct node_info {
const char *type = "x"; // string literals are const in C++11
int ID = -1;
double time = 0.0;
bool spk = true;
};
The predicate/condition for remove_if:
// Remove all events occurring at t_event
class coincident_events {
double t_event; // Event time
bool spk; // Spike condition
public:
coincident_events(double time,bool spk_) : t_event(time), spk(spk_){}
bool operator()(node_info node_event){
return ((node_event.time==t_event)&&(node_event.spk==spk)&&(strcmp(node_event.type,"x")!=0));
}
};
The actual removing from the list:
void remove_from_list(double t_event, bool spk_){
// Remove all events occurring at t_event
coincident_events coincidence(t_event,spk_);
event_heap.remove_if(coincidence);
}
Pseudo main:
int main(){
// My list
std::list<node_info> event_heap;
...
// Populate list with elements with random time values, yet ordered in ascending order
...
remove_from_list(0.5, true);
return 1;
}
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for cycle as suggested for example in this post?
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for loop?
Yes and yes. Don't fight to use code that is preventing you from reaching your goals. Keep it simple. Loops are nothing to be ashamed of in C++.
First thing, comparing double exactly is not a good idea as you are subject to floating point errors.
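A tolerance-based comparison is one common way around that. This is a minimal sketch; nearly_equal is a hypothetical helper and the default tolerance is an arbitrary placeholder that would have to be chosen to fit the time scale of the events:

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by at most eps.
// The default tolerance is a placeholder, not a recommendation.
inline bool nearly_equal(double a, double b, double eps = 1e-9)
{
    return std::fabs(a - b) <= eps;
}
```

In the predicate, node_event.time==t_event would then become nearly_equal(node_event.time, t_event).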
You could always find the point up to which you want to search using lower_bound (I assume your list is properly sorted).
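Then you could use the free function algorithm std::remove_if on the range from the beginning of the list up to the iterator returned by lower_bound, followed by the list's erase member function to remove the items between the iterator returned by remove_if and the one returned by lower_bound.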
However, doing that you would do multiple passes in the data and you would move nodes so it would affect performance.
See also: https://en.cppreference.com/w/cpp/algorithm/remove
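A rough sketch of that lower_bound / remove_if / erase combination, under the assumption that the list is sorted by ascending time. Event is a simplified stand-in for node_info and remove_before is a hypothetical name:

```cpp
#include <algorithm>
#include <list>

struct Event {          // simplified stand-in for node_info
    double time;
    bool spk;
};

// Removes the events matching pred, looking only at the sorted prefix
// with time < t_xv.
template <class Pred>
void remove_before(std::list<Event>& events, double t_xv, Pred pred)
{
    // First element with time >= t_xv. On a list this still walks the
    // nodes one by one; only the number of comparisons is logarithmic.
    auto bound = std::lower_bound(events.begin(), events.end(), t_xv,
        [](const Event& e, double t) { return e.time < t; });
    // Shift the kept elements forward, then drop the leftover tail of
    // the prefix.
    auto new_end = std::remove_if(events.begin(), bound, pred);
    events.erase(new_end, bound);
}
```

This shows the extra passes mentioned above: one to find the bound, one for remove_if, one for erase.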
So in the end, it is probably preferable to write your own loop over the container and, for each element, check whether it needs to be removed. If not, check whether you should break out of the loop.
for (auto it = event_heap.begin(); it != event_heap.end(); )
{
if (coincidence(*it))
{
auto itErase = it;
++it;
event_heap.erase(itErase);
}
else if (it->time < t_xv)
{
++it;
}
else
{
break;
}
}
As you can see, code can easily become quite long for something that should be simple. Thus, if you need that kind of algorithm often, consider writing your own generic algorithm.
Also, in practice you might not need to do a complete search for the end using the first solution if you process your data in increasing time order.
Finally, you might consider using an std::set instead. It could lead to simpler and more optimized code.
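To illustrate the ordered-container idea, here is a sketch that uses std::multiset rather than std::set, so that several events can share the same time. Event, ByTime and remove_coincident are hypothetical names, and the lookup still compares times exactly, so the floating point caveat above applies here too:

```cpp
#include <set>

struct Event {            // simplified stand-in for node_info
    double time;
    bool spk;
};

struct ByTime {
    bool operator()(const Event& a, const Event& b) const
    {
        return a.time < b.time;
    }
};

using EventSet = std::multiset<Event, ByTime>;

// Erases every event at exactly t_event whose spk flag matches.
// equal_range locates the candidates in O(log n) instead of scanning.
inline void remove_coincident(EventSet& events, double t_event, bool spk)
{
    auto range = events.equal_range(Event{t_event, spk});
    for (auto it = range.first; it != range.second; ) {
        if (it->spk == spk)
            it = events.erase(it);   // erase returns the next iterator
        else
            ++it;
    }
}
```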
Thanks. I used your comments and came up with this solution, which seemingly increases speed by a factor of 5-to-10.
void remove_from_list(double t_event,bool spk_){
coincident_events coincidence(t_event,spk_);
for(auto it=event_heap.begin();it!=event_heap.end();){
if(t_event>=it->time){
if(coincidence(*it)) {
it = event_heap.erase(it);
}
else
++it;
}
else
break;
}
}
The idea of assigning the return value of erase back to it (erase already returns the iterator to the next element, so no extra ++it is needed) was suggested by this other post. Note that in this implementation I am actually checking all list elements up to the t_event value, so t_event itself plays the role of t_xv.

List erase iterator out of range

I have the following code:
static std::map<int,int> myFunction(std::list<int>& symbols){
std::map<int,int> currCounts;
std::map<int,int> payHits;
for (std::list<int>::iterator l_itr = symbols.begin(); l_itr != symbols.end(); ++l_itr){
myFunction_helper(*l_itr, l_itr, symbols, currCounts, payHits, 0);
}
return payHits;
}
static inline void myFunction_helper(int next, std::list<int>::iterator& pos, std::list<int> remainingSymbols, std::map<int,int> currCounts, std::map<int,int>& payHits, int i){
currCounts[next] = currCounts.count(next) > 0 ? currCounts[next] + 1 : 1;
remainingSymbols.erase(pos);
if (i < numTiles && remainingSymbols.size() > 0){
if (currCounts[next] == hitsNeeded[next]){
int pay = symbolPays[next];
payHits[pay] = payHits.count(pay) > 0 ? payHits[next] + 1 : 1;
}
else{
for (std::list<int>::iterator l_itr = remainingSymbols.begin(); l_itr != remainingSymbols.end(); ++l_itr){
myFunction_helper(*l_itr, l_itr, remainingSymbols, currCounts, payHits, i+1);
}
}
}
else{
payHits[0] = payHits.count(0) > 0 ? payHits[0] + 1 : 1;
}
}
It is supposed to take a set of values and some requirements (numTiles (int), hitsNeeded (a map of symbols and the number of times they need to be chosen to win)). My code builds in Visual Studio (most recent version), but when I try executing it I get the error "list erase iterator out of range" the first time myFunction_helper is called. How do I avoid this? I purposely passed remainingSymbols by value so that I can modify it without affecting other recursive stack frames. How do I fix this, and why is this raising an exception?
Solution
Remove the iterator from the arguments. Then as you iterate you use the following snippet of code:
int next = *l_itr;
l_itr = symbols.erase(l_itr);
myFunction_helper(next, remainingSymbols, currCounts, payHits, i+1);
symbols.push_front(next);
And similarly for the outer function. Pushing the element to the front doesn't disrupt the iteration over the list and allows for what I want (pushing to the front is super cheap on lists too).
Agree with the comments below. This is a crap answer because we don't know enough about the business case to suggest a good solution. I'm leaving an edited version of it here because I just reverted the vandalized question and it does explain why the attempt failed.
Why This is raising an exception
std::list<int> remainingSymbols is passed by value, so pos is no longer relevant. It refers to the source list, not to the copy of the source list in remainingSymbols. Using an iterator for one list in another, even a copy, is fatal.
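A small sketch of the difference: the position has to be translated into the copy before it can be used there. copy_without is a hypothetical helper, shown only to illustrate the point:

```cpp
#include <iterator>
#include <list>

// Erases, from a private copy, the element at the same position that
// `pos` occupies in `source`. `source` itself is left untouched.
inline std::list<int> copy_without(const std::list<int>& source,
                                   std::list<int>::const_iterator pos)
{
    std::list<int> copy(source);
    // Translate the position: count the steps in the source and replay
    // them in the copy. Calling copy.erase(pos) directly would be
    // undefined behaviour, because pos belongs to `source`.
    auto offset = std::distance(source.begin(), pos);
    copy.erase(std::next(copy.begin(), offset));
    return copy;
}
```

Both std::distance and std::next are linear on list iterators, which is the cost of re-iterating a list.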
Solution
The common solution is to pass remainingSymbols by reference: std::list<int> & remainingSymbols, but since this will break backtracking, you can't do that.
Instead you will have to pass a different identifier for the position, perhaps the index. Unfortunately, iterating and re-iterating a list is an expensive task that almost always outweighs the quick insert and delete benefits of a list.
You cannot use an iterator from one container with another one. You could use an offset instead, but that would be very inefficient with std::list. Additionally, using std::list with int is not a good idea in general: your data is small, and most probably you use at least twice as much memory for maintaining the list nodes as for the data itself, plus you incur cache misses. You had better use std::vector<int> and pass an offset, not an iterator. Additionally, with vector<> you can use the move erase idiom, but even deleting an int in the middle of a vector is relatively cheap, most probably less expensive than jumping between std::list nodes.
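As a sketch of the vector-plus-offset variant (without_index is a hypothetical name; the recursion from the question is left out):

```cpp
#include <cstddef>
#include <vector>

// Returns a copy of `symbols` with the element at `index` removed.
// The by-value parameter is the private copy each recursion level needs,
// and an index stays meaningful in that copy, which an iterator would not.
inline std::vector<int> without_index(std::vector<int> symbols, std::size_t index)
{
    symbols.erase(symbols.begin() + static_cast<std::ptrdiff_t>(index));
    return symbols;
}
```

The recursive helper would then loop over indices 0 to size() - 1 and pass without_index(remainingSymbols, i) down to the next level.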

Efficient intersection of two sets

I have two sets (or maps) and need to efficiently handle their intersection.
I know that there are two ways of doing this:
iterate over both maps as in std::set_intersection: O(n1+n2)
iterating over one map and finding elements in the other: O(n1*log(n2))
Depending on the sizes, either of these two solutions is significantly better (I have timed it), and I thus need to either switch between these algorithms based on the sizes (which is a bit messy) - or find a solution outperforming both, e.g. using some variant of map.find() taking the previous iterator as a hint (similarly to map.emplace_hint(...)) - but I could not find such a function.
Question: Is it possible to combine the performance characteristics of the two solutions directly using STL - or some compatible library?
Note that the performance requirement makes this different from earlier questions such as
Efficient intersection of sets?
In almost every case std::set_intersection will be the best choice.
The other solution may be better only if the sets contain a very small number of elements, because of how the base-two logarithm scales:
n = 2, log(n)= 1
n = 4, log(n)= 2
n = 8, log(n)= 3
.....
n = 1024, log(n) = 10
O(n1*log(n2)) is significantly more expensive than O(n1 + n2) if the length of the sets is more than 5-10 elements.
There is a reason such a function was added to the STL and is implemented like that. It will also make the code more readable.
Selection sort is faster than merge or quick sort for collections with length less than 20 but is rarely used.
For sets that are implemented as binary trees, there actually is an algorithm that combines the benefits of both the procedures you mention. Essentially, you do a merge like std::set_intersection, but while iterating in one tree, you skip any branches that are all less than the current value in the other.
The resulting intersection takes O(min(n1 log n2, n2 log n1, n1 + n2)), which is just what you want.
Unfortunately, I'm pretty sure std::set doesn't provide interfaces that could support this operation.
I've done it a few times in the past though, when working on joining inverted indexes and similar things. Usually I make iterators with a skipTo(x) operation that will advance to the next element >= x. To meet my promised complexity it has to be able to skip N elements in log(N) amortized time. Then an intersection looks like this:
void get_intersection(vector<T> *dest, const set<T> &set1, const set<T> &set2)
{
auto end1 = set1.end();
auto end2 = set2.end();
auto it1 = set1.begin();
if (it1 == end1)
return;
auto it2 = set2.begin();
if (it2 == end2)
return;
for (;;)
{
it1.skipTo(*it2);
if (it1 == end1)
break;
if (*it1 == *it2)
{
dest->push_back(*it1);
++it1;
}
it2.skipTo(*it1);
if (it2 == end2)
break;
if (*it2 == *it1)
{
dest->push_back(*it2);
++it2;
}
}
}
It easily extends to an arbitrary number of sets using a vector of iterators, and pretty much any ordered collection can be extended to provide the iterators required -- sorted arrays, binary trees, b-trees, skip lists, etc.
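For sorted arrays, a skipTo-style operation can be approximated with a galloping (exponential) search followed by std::lower_bound. The sketch below assumes sorted std::vector<int> inputs without duplicates; skip_to and intersect_sorted are hypothetical names, not standard library functions:

```cpp
#include <algorithm>
#include <vector>

// Advances to the first element >= x in the sorted range [first, last).
// The step doubles on every probe, so skipping N elements costs O(log N).
template <class It, class T>
It skip_to(It first, It last, const T& x)
{
    auto n = last - first;
    decltype(n) lo = 0, step = 1;
    while (lo + step < n && first[lo + step] < x) {
        lo += step;      // everything up to first[lo] is known to be < x
        step *= 2;
    }
    auto hi = std::min(lo + step, n);
    return std::lower_bound(first + lo, first + hi, x);
}

inline std::vector<int> intersect_sorted(const std::vector<int>& a,
                                         const std::vector<int>& b)
{
    std::vector<int> out;
    auto i = a.begin();
    auto j = b.begin();
    while (i != a.end() && j != b.end()) {
        i = skip_to(i, a.end(), *j);     // now *i >= *j
        if (i == a.end()) break;
        j = skip_to(j, b.end(), *i);     // now *j >= *i
        if (j == b.end()) break;
        if (*i == *j) {
            out.push_back(*i);
            ++i;
            ++j;
        }
    }
    return out;
}
```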
I don't know how to do this using the standard library, but if you wrote your own balanced binary search tree, here is how to implement a limited "find with hint". (Depending on your other requirements, a BST reimplementation could also leave out the parent pointers, which could be a performance win over the STL.)
Assume that the hint value is less than the value to be found and that we know the stack of ancestors of the hint node to whose left sub-tree the hint node belongs. First search normally in the right sub-tree of the hint node, pushing nodes onto the stack as warranted (to prepare the hint for next time). If this doesn't work, then while the stack's top node has a value that is less than the query value, pop the stack. Search from the last node popped (if any), pushing as warranted.
I claim that, when using this mechanism to search successively for values in ascending order, (1) each tree edge is traversed at most once, and (2) each find traverses the edges of at most two descending paths. Given 2*n1 descending paths in a binary tree with n2 nodes, the cost of the edges is O(n1 log n2). It's also O(n2), because each edge is traversed once.
With regard to the performance requirement, O(n1 + n2) is in most circumstances a very good complexity, so an alternative is only worth considering if you're doing this calculation in a tight loop.
If you really do need it, the combination approach isn't too bad, perhaps something like this:
Pseudocode:
x' = set_with_min_length([x, y])
y' = set_with_max_length([x, y])
if (x'.length * log(y'.length)) <= (x'.length + y'.length):
return iterate_over_map_find_elements_in_other(y', x')
return std::set_intersection(x, y)
I don't think you'll find an algorithm that will beat either of these complexities but happy to be proven wrong.
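The pseudocode above might translate to C++ roughly as follows. This is a sketch for std::set<int> only; the function names are made up for illustration, and the cost estimate is a heuristic whose real crossover point should be measured:

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <set>
#include <vector>

// Strategy 1: linear merge of both sets, O(n1 + n2).
inline std::vector<int> intersect_merge(const std::set<int>& a,
                                        const std::set<int>& b)
{
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Strategy 2: look up each element of the smaller set in the larger one,
// O(n_smaller * log(n_larger)).
inline std::vector<int> intersect_lookup(const std::set<int>& smaller,
                                         const std::set<int>& larger)
{
    std::vector<int> out;
    for (int v : smaller)
        if (larger.count(v) != 0)
            out.push_back(v);
    return out;
}

// Picks a strategy from the estimated costs of the two approaches.
inline std::vector<int> intersect(const std::set<int>& x, const std::set<int>& y)
{
    const std::set<int>& smaller = x.size() <= y.size() ? x : y;
    const std::set<int>& larger  = x.size() <= y.size() ? y : x;
    const double n1 = static_cast<double>(smaller.size());
    const double n2 = static_cast<double>(larger.size());
    if (n1 * std::log2(n2 + 1.0) <= n1 + n2)
        return intersect_lookup(smaller, larger);
    return intersect_merge(x, y);
}
```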

C++ STL algorithms to add element in list

I want to know if anyone has a quick way for adding an element to a std::list<T*> if the element is not already in it.
It's a generic function and I can not use loops so something like this
template <class T>
bool Class<T>::addElement(const T* element)
{
for (list<T*>::iterator it = list_.begin(); it != list_.end(); it++)
{
if (element == *it)
return false;
}
list_.push_back(element);
return true;
}
Is not ok because of the loop. Does anyone have ideas?
Why is what you have "not ok"? Looks perfectly fine and readable to me (modulo missing typename).
If you really don't want to use a loop, you can accomplish the same by using the algorithm that does precisely that loop: std::find:
template <class T>
bool Class<T>::addElement(const T* element)
{
if (std::find(list_.begin(), list_.end(), element) != list_.end()) {
return false;
}
list_.push_back(element);
return true;
}
If you can add other members to your class, you could add an index such as a std::unordered_set. That container stores a list of unique values, and can be searched for specific values in O(1) complexity, which implies that no full-loop search is done by the implementation for checking if the value already exists. It will stay fast even if you have a lot of values already stored.
With std::list, using library functions such as std::find will avoid explicitly writing a loop, but the implementation will still perform the loop, and this will be slow when a lot of values are already stored (O(n) complexity).
You can use intrusive list instead of the std::list. In this case each element in the list keeps its node data, so you can just query that data to find out if the element is already in the list. The disadvantage is that all elements in this list must be able to provide such data, and you can't put in such lists, for example, integer or boolean elements.
If you still need the std::list and/or the elements can be of any type, then the only way of quickly querying whether the element already exists in the list is to use an index. The indexes can be stored in a separate std::unordered_set for fast lookups. You can use for indexes either the list's values "as is" or calculate the indexes using any custom function.
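A minimal sketch of the index idea, storing the pointers "as is" in the hash set. UniqueList is a hypothetical class name standing in for the Class<T> of the question:

```cpp
#include <list>
#include <unordered_set>

// Keeps insertion order in a list and uses a hash set of the same
// pointers to answer "already present?" in O(1) on average.
template <class T>
class UniqueList {
public:
    bool addElement(T* element)
    {
        // insert() reports through .second whether the pointer was new.
        if (!index_.insert(element).second)
            return false;
        list_.push_back(element);
        return true;
    }

    const std::list<T*>& items() const { return list_; }

private:
    std::list<T*> list_;
    std::unordered_set<T*> index_;
};
```

The price is roughly one extra pointer-sized entry per element, and any removal has to update both containers.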

Z-sorted list of 3d objects

In a C++/OpenGL app, I have a bunch of translucent objects arranged in 3d space. Because of the translucency, the objects must be drawn in order from furthest to nearest. (For the reasons described in "Transparency Sorting.")
Luckily, the camera is fixed. So I plan to maintain a collection of pointers to the 3d objects, sorted by camera Z. Each frame, I'll iterate over the collection, drawing each object.
Fast insertion and deletion are important, because the objects in existence change frequently.
I'm considering using a std::list as the container. To insert, I'll use std::lower_bound to determine where the new object goes. Then I'll insert at the iterator returned by lower_bound.
Does this sound like a sane approach? Given the details I've provided, do you foresee any major performance issues I've overlooked?
I don't think a std::list would ever be a good choice for this use case. While the insertion itself is very efficient, you need to iterate through the list to find the right place for the insertion, which makes it O(n) complexity.
If you want to keep it simple, a std::set would already be much better, and even simpler to apply than std::list. It's implemented as a balanced tree, so insertion is O(log n) complexity, and done by simply calling the insert() method on the container. The iterator gives you the elements in sorted order. It does have the downside of non-local memory access patterns during iteration, which makes it not cache friendly.
Another approach comes to mind that intuitively should be very efficient. Its basic idea is similar to what @ratchet_freak already proposed, but it does not copy the entire vector on each iteration:
The container that contains the main part of the data is a std::vector, which is always kept sorted.
New elements are added to an "overflow" container, which could be a std::set, or another std::vector that is kept sorted. This is only allowed to reach a certain size.
While iterating, traverse the main and overflow containers simultaneously, using similar logic to a merge sort.
When the overflow container reaches the size limit, merge it with the main container, resulting in a new main container.
A rough sketch of the code for this:
const size_t OVERFLOW_SIZE = 32;
// Ping pong between two vectors when merging.
std::vector<Entry> mainVecs[2];
unsigned activeIdx = 0;
std::vector<Entry> overflowVec;
overflowVec.reserve(OVERFLOW_SIZE);
void insert(const Entry& entry) {
std::vector<Entry>::iterator pos =
std::upper_bound(overflowVec.begin(), overflowVec.end(), entry);
overflowVec.insert(pos, 1, entry);
if (overflowVec.size() == OVERFLOW_SIZE) {
std::merge(mainVecs[activeIdx].begin(), mainVecs[activeIdx].end(),
overflowVec.begin(), overflowVec.end(),
std::back_inserter(mainVecs[1 - activeIdx])); // destination is empty, so append to it
mainVecs[activeIdx].clear();
overflowVec.clear();
activeIdx = 1 - activeIdx;
}
}
void draw() {
std::vector<Entry>::const_iterator mainIt = mainVecs[activeIdx].begin();
std::vector<Entry>::const_iterator mainEndIt = mainVecs[activeIdx].end();
std::vector<Entry>::const_iterator overflowIt = overflowVec.begin();
std::vector<Entry>::const_iterator overflowEndIt = overflowVec.end();
for (;;) {
if (overflowIt == overflowEndIt) {
if (mainIt == mainEndIt) {
break;
}
draw(*mainIt);
++mainIt;
} else if (mainIt == mainEndIt) {
if (overflowIt == overflowEndIt) {
break;
}
draw(*overflowIt);
++overflowIt;
} else if (*mainIt < *overflowIt) {
draw(*mainIt);
++mainIt;
} else {
draw(*overflowIt);
++overflowIt;
}
}
}
std::list is a non-random-access container,
Complexity of lower_bound.
On average, logarithmic in the distance between first and last: Performs approximately log2(N)+1 element comparisons (where N is this distance).
On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average
So it seems not a good idea.
Using std::vector, you will have correct complexity for lower_bound.
And you may get better performance for inserting/removing elements too (despite the worse complexity).
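A sketch of the sorted-vector insertion, with the same made-up Object3D stand-in and "larger Z is farther" assumption as before; insertSorted is a hypothetical name:

```cpp
#include <algorithm>
#include <vector>

struct Object3D {      // hypothetical stand-in for the app's object type
    float cameraZ;     // assumed: larger value means farther from the camera
};

// Inserts obj so that the vector stays ordered from farthest to nearest.
// The binary search is O(log n); the insert shifts the tail, which is
// O(n) but works on contiguous memory.
inline void insertSorted(std::vector<const Object3D*>& objects, const Object3D* obj)
{
    auto pos = std::lower_bound(objects.begin(), objects.end(), obj,
        [](const Object3D* a, const Object3D* b) {
            return a->cameraZ > b->cameraZ;
        });
    objects.insert(pos, obj);
}
```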
Depending on how big the list is you can keep a smaller "mutation set" for the objects that got added/changed the last frame and a big existing sorted set.
Then each frame you do a merge while drawing:
vector<GameObject*> newList;
newList.reserve(mutationSet.size() + ExistingSet.size());
sort(mutationSet.begin(), mutationSet.end(), byZCoord); // small list -> faster sort
auto mutationIt = mutationSet.begin();
for(auto it = ExistingSet.begin(); it != ExistingSet.end(); ++it){
if((*it)->isRemoved()){
//release to pool and
continue;
}
// draw the new objects that sort before the current existing one
while(mutationIt != mutationSet.end() && (*mutationIt)->getZ() < (*it)->getZ()){
(*mutationIt)->render();
newList.push_back(*mutationIt);
++mutationIt;
}
(*it)->render();
newList.push_back(*it);
}
// draw the new objects that sort after every existing one
while(mutationIt != mutationSet.end()){
(*mutationIt)->render();
newList.push_back(*mutationIt);
++mutationIt;
}
mutationSet.clear();
ExistingSet.clear();
swap(ExistingSet, newList);
You will be doing the iteration anyway, and sorting a small list is faster than appending the new list and sorting everything: O(n + k + k log k) vs. O((n+k) log(n+k)).