I have a map<uint, Order*> orders where Order is a defined class with applicable fields such as id, time, price and volume. I also have a thread which listens for incoming order additions and deletions and updates the map, as shown below.
void System::OrderDeleted_Thread(unsigned int order_id)
{
    if (orders.find(order_id) != orders.end())
    {
        Order* order = orders[order_id];
        orders.erase(order_id);
        delete order;
    }
}
My problem is very similar to this one:
Segmentation fault in std function std::_Rb_tree_rebalance_for_erase ()
My question is, how can I iterate through my orders map without the program giving me an error when it comes time to rebalance the tree? Just like the solution in the link says, I have taken out the .erase(uint) call and gotten it to work. Unfortunately, I cannot keep a map of several tens of thousands of keys around.
Thanks in advance!
I also have a thread which listens for incoming order additions and deletions for the map defined below.
You need to synchronize access to the map. STL containers are not thread-safe with multiple writers (and erasing elements is writing to the container) without some sort of external synchronization.
Queue up your additions and deletions in a separate data structure, and then process them at a safe time, that is, when you are guaranteed to not be iterating through the map. That safe time can be after you have acquired a mutex which protects the map, or some other way, depending on your program.
Apart from synchronisation issues, that's a costly way to write the loop. Instead, try this:
std::map<uint, Order*>::iterator it;
Order* p = NULL;

{ // enter critical section
    if ((it = orders.find(id)) != orders.end())
    {
        p = it->second;
        orders.erase(it);
    }
} // leave critical section

delete p;
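Concretely, the critical section can be realised with a std::mutex and std::lock_guard. The sketch below is illustrative only: the trimmed-down Order fields and the orders_mutex name are invented, and in the real program the same mutex must also guard the code path that inserts into the map.

```cpp
#include <map>
#include <mutex>

// Hypothetical, trimmed-down stand-in for the asker's Order class.
struct Order { unsigned int id; double price; int volume; };

std::map<unsigned int, Order*> orders;
std::mutex orders_mutex; // invented name: guards every access to `orders`

void delete_order(unsigned int id)
{
    Order* p = nullptr;
    {
        std::lock_guard<std::mutex> lock(orders_mutex); // enter critical section
        std::map<unsigned int, Order*>::iterator it = orders.find(id);
        if (it != orders.end())
        {
            p = it->second;
            orders.erase(it);
        }
    } // leave critical section
    delete p; // the delete itself can safely happen outside the lock
}
```

Deleting outside the lock keeps the critical section short; since Order's destructor never touches the map, nothing else there needs protecting.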
I have a dilemma regarding my removing database function code.
Whenever I remove a database from the vector by its unique ID, I can't work out how to write the code that closes the gap left by the removed number (e.g. if I remove the database with ID 3, I want the IDs of the later databases to shift down into a stable sequence, so the database with ID 4 becomes ID 3).
I also don't know how to decrement my static int counter.
File:
void Database::Rem()
{
    int dddddet;
    cin >> dddddet;
    if (iter != DbMain.end())
    {
        DbMain.erase(iter);
    }
}
std::istream &operator>>(std::istream &re, Base &product)
{
}
std::ostream &printnames(std::ostream &pr, Base &pro)
{
    pr << "\nID:" << pro.ID << "\nName:" << pro.name;
    return pr;
}
Header file:
"
The thing you're doing here is called a design anti-pattern: some structural idea that you could easily come up with in a lot of situations, but that's going to be a lot of trouble (i.e., bad). It's called a singleton: you're assuming there's only ever going to be one DbMain, so you store its length in a "global-alike" static member. Makes no sense! Simply use the length of DbMain. You should never need a global or a static member to count objects that you store centrally anyway.
You never actually need the ID to be part of the object you store – its whole purpose is being the index within the DbMain storage. A vector is already ordered! So, instead of printing your .ID when you iterate through the vector, simply print the position in the vector. Instead of "find the base with ID == N and erase" you could do "erase the (DbMain.begin() + N) element" and be done with it.
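A minimal sketch of that position-as-ID idea, using a hypothetical one-field stand-in for the asker's Base class; renumbering after a removal then happens by itself, because the "ID" is just the vector index.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the asker's Base class: note it stores no ID.
struct Base { std::string name; };

std::vector<Base> DbMain;

// The "ID" is simply the position in DbMain, so removing an entry is a
// single erase; every later entry slides down and is renumbered for free.
void Rem(std::size_t id)
{
    if (id < DbMain.size())
        DbMain.erase(DbMain.begin() + id);
}
```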
The problem in your design is that you seem somehow to associate the unique ID with the index in the vector, as you initialize the IDs with a static counter. This is not a good idea, since:
you would need to renumber a lot of IDs at each delete and this would make it very difficult to maintain cross-references.
you could have several databases each with its own set of ids.
there is a risk that you'd not get the counter to a proper value, if you add bases without reading them.
Moreover, iterating sequentially looking for an ID is not very efficient. A more efficient approach would be to store your bases in a std::map: this allows you to find IDs very efficiently, indexing by ID instead of by sequential position.
Your only concern would then be to ensure uniqueness of IDs. Of course, your counter approach could work, if you make sure that it is updated whenever a new base is created, and that its state is persisted with the database in the text file in which you'll save all this. You just have to make clear that the IDs are not guaranteed to be sequential. A pragmatic way to reinforce this understanding is to issue an error message if there is an attempt to delete a record that was not found.
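A rough sketch of that map-based design; Base, bases and next_id are illustrative names rather than the asker's declarations, and next_id would have to be persisted to the text file along with the records.

```cpp
#include <map>
#include <string>

struct Base { std::string name; }; // hypothetical minimal record

std::map<int, Base> bases; // keyed by a unique, possibly gappy ID
int next_id = 0;           // would be persisted with the database file

int add_base(const Base& b)
{
    bases[next_id] = b;
    return next_id++; // IDs stay unique but develop gaps after deletes
}

bool remove_base(int id)
{
    // Returning false lets the caller report "record not found".
    return bases.erase(id) == 1;
}
```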
I have class:
class A {
    // fields, methods
};
I need an efficient data structure that lets me repeatedly extract both the minimum and the maximum from a collection of pointers to objects of class A (it has to work online, that is, extraction requests alternate with requests to add new pointers). This can be done by using two priority queues:
priority_queue<A*, vector<A*>, ComparatorForFindingLightestObjects>* qL;
priority_queue<A*, vector<A*>, ComparatorForFindingHardestObjects>* qH;
The problem is that after an object's pointer is extracted from the first queue, the object is eventually destroyed; but since a pointer to it is still present in the other queue, reads through that stale pointer touch freed memory.
How can I solve this problem using the standard STL containers, without writing my own data structures?
I believe you're looking for boost::multi_index, which is a single container accessible through multiple different "views": http://www.boost.org/doc/libs/1_59_0/libs/multi_index/doc/index.html
I think you can use std::set and delete the entry from the second set as soon as you extract it from the first. Performance-wise, both give O(log n) lookup and insertion. I'm not sure if this is what you want, but I'll try:
// Use std::set as your priority queue instead
set<A*, ComparatorForFindingLightestObjects> qL;
set<A*, ComparatorForFindingHardestObjects> qH;

auto it = qL.begin(); // the first (lightest) element
if (it != qL.end())
{
    A* curr = *it;
    qL.erase(curr); // delete it from this queue
    qH.erase(curr); // delete it from the other queue as well
}
Also, I think you can merge your two queues and just maintain one container. You can access the minimum and maximum elements with *containerName.begin() and *containerName.rbegin() respectively.
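A sketch of that single-container variant. A std::multiset is used here so that two objects of equal weight don't reject each other; A and ByWeight are hypothetical stand-ins for the asker's class and comparator.

```cpp
#include <set>

struct A { int weight; }; // hypothetical minimal version of class A

// Order the pointers by the weight of the objects they point to.
struct ByWeight {
    bool operator()(const A* l, const A* r) const { return l->weight < r->weight; }
};

// One container serves as both "queues": minimum at begin(), maximum at rbegin().
std::multiset<A*, ByWeight> items;

A* lightest() { return items.empty() ? nullptr : *items.begin();  }
A* heaviest() { return items.empty() ? nullptr : *items.rbegin(); }
```

Because there is only one container, erasing a pointer once removes it from both "views", which is exactly what prevents the dangling-pointer reads described in the question.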
I am using a map<int, queue<string>>, where int refers to the source of a message, and the queue holds the message. One thread pushes messages into the queue, another thread pushes them out of the queue.
This is a client-server program - when the client sends a message, the message gets pushed into the queue.
I am currently using (pseudo code)
/* receive message in thread 1 */
map<int, queue<string>> test_map;
int client_id = 2;
string msg = received_from_client(client_id);
test_map[client_id].push(msg);

/* process message in thread 2 */
string msg_to_process = test_map[client_id].front();
test_map[client_id].pop();
if (test_map[client_id].empty())
{
    test_map.erase(client_id);
}
I know from this question that the difference is that insert will not overwrite an existing key - does this apply when I am pushing things into queues? Is it safer to use insert, or is what I'm doing with [] sufficient?
Also - while the system should only have one message in the queue at any one time, I am making expansion allowances by using map<int, queue> instead of using map<int,string>.
edit: I have a question about multiple threading as well - what happens when thread 1 attempts to insert into the map while thread 2 deletes the key because the queue is empty (after it has processed the message)? Is there a definitive answer to this, and does using [] or insert() make it any more thread-safe?
Queues don't have keys or [] operators, so your first question can't really be answered. You insert into a queue by pushing onto the back; if there are elements there, the new one goes after them. You read off a queue by popping things off the front, if there are any. You don't read or write anywhere other than that.
As for maps, like you said, insert will add a new key-value pair if it does not exist already. It will not overwrite an existing key. Find will find a value if it exists already, but will not insert it if it doesn't. And then the [] operator does both, and also allows you to change existing elements. The documentation here is very good.
One thing to be aware of is that using the map's [] operator to read from the map will also insert a default-constructed value-type element into the map under that key, which is probably not what you would expect when first looking at it.
std::map<int, int> myMap;

if (myMap[1] == 0) // [] creates a key-value pair <1,0>
    cout << "This will output";
if (myMap.size() == 1)
    cout << "This too";
As for the thread safety aspect, no STL containers are thread safe based on the standard. You need to add proper locking in your code to prevent exactly what you asked about. If 2 threads tried to read and write from a queue at the same time, it will almost definitely cause an error. I would google around about writing thread safe programs for general help on how to do that.
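As an illustration of that locking, here is a minimal sketch with one std::mutex guarding both the map and the queues inside it; the function names and the single-lock granularity are choices made for the example, not the only correct design.

```cpp
#include <map>
#include <mutex>
#include <queue>
#include <string>

std::map<int, std::queue<std::string>> test_map;
std::mutex map_mutex; // one lock serialises every reader and writer

// Called from thread 1 when a client message arrives.
void receive(int client_id, const std::string& msg)
{
    std::lock_guard<std::mutex> lock(map_mutex);
    test_map[client_id].push(msg); // [] creates the queue if it is missing
}

// Called from thread 2; returns false when there is nothing to process.
bool process(int client_id, std::string& out)
{
    std::lock_guard<std::mutex> lock(map_mutex);
    std::map<int, std::queue<std::string>>::iterator it = test_map.find(client_id);
    if (it == test_map.end() || it->second.empty())
        return false;
    out = it->second.front();
    it->second.pop();
    if (it->second.empty())
        test_map.erase(it); // safe: the inserter holds the same lock
    return true;
}
```

Because both threads take map_mutex, the insert-while-erasing race from the edit above cannot happen: one thread always sees the other's completed operation.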
I'm trying to figure out the fastest way to count the number of child elements of a Xerces C++ DOMNode object, as I'm trying to optimise the performance of a Windows application which uses the Xerces 2.6 DOMParser.
It seems most of the time is spent counting and accessing children. Our application needs to iterate every single node in the document to attach data to it using DOMNode::setUserData() and we were initially using DOMNode::getChildNodes(), DOMNodeList::getLength() and DOMNodeList::item(int index) to count and access children, but these are comparatively expensive operations.
A large performance improvement was observed when we used a different idiom: calling DOMNode::getFirstChild() to get the first child node and then invoking DOMNode::getNextSibling() to either access a child at a specific index or count the number of siblings of the first child element to get a total child node count.
However, getNextSibling() remains a bottleneck in our parsing step, so I'm wondering is there an even faster way to traverse and access child elements using Xerces.
Yes, soon after I posted, I added code to store and manage the child count for each node, and this has made a big difference. The same nodes were being visited repeatedly and the child count was being recalculated every time. This is quite an expensive operation, as Xerces essentially rebuilds the DOM structure for that node to guarantee its liveness. We have our own object which encapsulates a Xerces DOMNode along with extra info that we need, and we use DOMNode::setUserData to associate our object with the relevant DOMNode; that now seems to be the last remaining bottleneck.
The problem with DOMNodeList is that it is really quite a simple list, so operations like getLength() and item(i) cost O(n), as can be seen in the code, for example here for getLength():
XMLSize_t DOMNodeListImpl::getLength() const {
    XMLSize_t count = 0;
    if (fNode) {
        DOMNode *node = fNode->fFirstChild;
        while (node != 0) {
            ++count;
            node = castToChildImpl(node)->nextSibling;
        }
    }
    return count;
}
Thus, DOMNodeList should not be used unless you actually need its liveness (i.e. you expect the DOM tree to change while iterating): accessing an item is O(n), which makes iteration an O(n^2) operation - a disaster waiting to happen (i.e. an XML file big enough).
Using DOMNode::getFirstChild() and DOMNode::getNextSibling() is a good enough solution for an iteration:
DOMNode *child = docNode->getFirstChild();
while (child != nullptr) {
    // do something with the node
    ...
    child = child->getNextSibling();
}
This runs, as expected, in O(n).
One could also use DOMNodeIterator, but in order to create it the right DOMDocument is needed, which is not always at hand when an iteration is needed.
I have a C++ STL set with a custom ordering defined.
The idea was that when items get added to the set, they're naturally ordered as I want them.
However, what I've just realised is that the ordering predicate can change as time goes by.
Presumably, the items in the set will then no longer be in order.
So two questions really:
Is it harmful that the items would then be out of order? Am I right in saying that the worst that can happen is that new entries may get put into the wrong place (which I can actually live with)? Or could this cause crashes, lost entries etc.?
Is there a way to "refresh" the ordering of the set? You can't seem to use std::sort() on a set. The best I can come up with is dumping out the contents to a temp container and re-add them.
Any ideas?
Thanks,
John
set uses the ordering to look up items. If you insert N items according to ordering1 and then insert an item according to ordering2, the set cannot find out whether the item is already in it.
It will violate the class invariant that every item is in there only once.
So it does harm.
The only safe way to do this with the STL is to create a new set with the changed predicate. For example, assuming the new predicate is a type NewPred, you could rebuild the set like this:
std::set<int, NewPred> newset(oldset.begin(), oldset.end());
This is actually implementation-dependent.
An STL implementation can, and usually will, assume that the predicate used for sorting is stable (otherwise "sorted" would not be well defined). It is at least possible to construct a valid STL implementation that formats your hard drive when you change the behaviour of the predicate instance.
So, yes, you need to re-insert the items into a new set.
Alternatively, you could construct your own container, e.g. a vector + sort + lower_bound for binary search. Then you could re-sort when the predicates behavior changes.
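A minimal sketch of that vector + sort + lower_bound container; SortedSet and the std::function comparator are illustrative choices for the example, not a standard component.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Keeps a unique, sorted vector of ints; unlike std::set, the predicate
// is ordinary state, so it can be replaced and the data re-sorted.
struct SortedSet {
    std::vector<int> items;
    std::function<bool(int, int)> cmp = std::less<int>();

    void insert(int v) {
        std::vector<int>::iterator it =
            std::lower_bound(items.begin(), items.end(), v, cmp);
        if (it == items.end() || cmp(v, *it)) // skip equivalent duplicates
            items.insert(it, v);
    }
    bool contains(int v) const {
        std::vector<int>::const_iterator it =
            std::lower_bound(items.begin(), items.end(), v, cmp);
        return it != items.end() && !cmp(v, *it);
    }
    void change_predicate(std::function<bool(int, int)> c) {
        cmp = std::move(c);
        std::sort(items.begin(), items.end(), cmp); // restore the invariant
    }
};
```

The re-sort on a predicate change is O(n log n), after which lookups via binary search are valid again - exactly the "refresh" the question asks for.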
I agree with the other answers, that this is going to break in some strange and hard to debug ways. If you go the refresh route, you only need to do the copy once. Create a tmp set with the new sorting strategy, add each element from the original set to the tmp set, then do
orig.swap(tmp);
This will swap the internals of the sets.
If this were me, I would wrap this up in a new class that handles all of the details, so that you can change implementations as needed. Depending on your access patterns and the number of times the sort order changes, the previously mentioned vector, sort, lower_bound solution may be preferable.
If you can live with an unordered set, then why are you adding them into a set in the first place?
The only case I can think of is where you just want to make sure the list is unique when you add them. If that's the case then you could use a temporary set to protect additions:
if (ts.insert(value).second) {
    // insertion took place
    realContainer.push_back(value);
}
An alternative, depending on how frequently you modify the entries in the set, is to test whether an entry would end up in a different position (by using the set's compare function); where the position would move, remove the old entry and re-add the updated one.
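The remove-and-re-add step can be sketched like this; Item, ByRank and update_rank are invented names for the illustration.

```cpp
#include <set>

struct Item { int rank; }; // hypothetical element type

struct ByRank {
    bool operator()(const Item* l, const Item* r) const { return l->rank < r->rank; }
};

// Mutating a field the comparator reads must happen while the element is
// outside the set: erase under the old ordering, change, then re-insert.
void update_rank(std::set<Item*, ByRank>& s, Item* it, int new_rank)
{
    s.erase(it);         // still findable: the old ordering holds
    it->rank = new_rank;
    s.insert(it);        // re-filed at its correct new position
}
```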
As everyone else has pointed out, having the set unordered really smells bad - and I would also guess that it's possibly undefined behaviour according to the standard.
While this doesn't give you exactly what you want, boost::multi_index gives you similar functionality. Due to the way templates work, you will never be able to "change" the ordering predicate for a container, it is set in stone at compile time, unless you are using a sorted vector or something similar, to where you are the one maintaining the invariant, and you can sort it however you want at any given time.
Multi_index however gives you a way to order a set of elements based on multiple ordering predicates at the same time. You can then select views of the container that behave like an std::set ordered by the predicate that you care about at the time.
This can cause lost entries. When searching for an element in a set, the ordering operator is used; this means that if an element was placed to the left of the root, and the ordering operator now says it's to the right, that element will no longer be found.
Here's a simple test for you:
#include <functional>
#include <iostream>
#include <set>

struct comparer : public std::binary_function<int, int, bool>
{
    static enum CompareType { CT_LESS, CT_GREATER } CompareMode;

    bool operator()(int lhs, int rhs) const
    {
        if (CompareMode == CT_LESS)
        {
            return lhs < rhs;
        }
        else
        {
            return lhs > rhs;
        }
    }
};

comparer::CompareType comparer::CompareMode = comparer::CT_LESS;

typedef std::set<int, comparer> is_compare_t;

void check(const is_compare_t &is, int v)
{
    is_compare_t::const_iterator it = is.find(v);
    if (it != is.end())
    {
        std::cout << "HAS " << v << std::endl;
    }
    else
    {
        std::cout << "ERROR NO " << v << std::endl;
    }
}

int main()
{
    is_compare_t is;
    is.insert(20);
    is.insert(5);
    check(is, 5);
    comparer::CompareMode = comparer::CT_GREATER;
    check(is, 5);
    is.insert(27);
    check(is, 27);
    comparer::CompareMode = comparer::CT_LESS;
    check(is, 5);
    check(is, 27);
    return 0;
}
So, basically if you intend to be able to find the elements you once inserted you should not change the predicate used for insertions and find.
Just a follow up:
While running this code, the Visual Studio C++ debug libraries started throwing exceptions complaining that the "<" operator was invalid.
So, it does seem that changing the sort ordering is a bad thing. Thanks everyone!
1) Harmful - no. Result in crashes - no. The worst is indeed a non-sorted set.
2) "Refreshing" would be the same as re-adding anyway!