Does it risk if use c++ std::unordered_map like this? - c++

Does it risk if use c++ std::unordered_map like this?
std::unordered_map<std::string, std::unordered_set<std::shared_ptr<SomeType>>> _some_map;
...
void init() {
auto item = std::make_shared<SomeType>();
_some_map["circle"].insert(item);
}
_some_map is a member variable. init() function can be called only once, so insert is thread-safe. After initialization, the map will be read only in multi-thread. I think it is thread-safe too.
I'm not sure if I can use insert like this. Because it will make a unordered_set and a pair when there is no val of key "circle". The code works normally. Just want to make sure it is without potential risk.
(I am not good at English writing. )
Thank you.

I'm not sure if I can use insert like this.
Yes, you can, because operator[]
Returns a reference to the value that is mapped to a key equivalent to key, performing an insertion if such key does not already exist.
Your value_type is std::unordered_set which is default constructible, so there is no problem.
After initialization, the map will be read only in multi-thread
Here also you are safe, because according to the containers documentation
All const member functions can be called concurrently by different threads on the same container.
so if you are only reading the values without modifying them, it is OK. You could even perform write operations if you could guarantee that different threads are accessing different elements in your container.

Related

Access to custom objects in unordered_set

Please help to figure out the logic of using unordered_set with custom structures.
Consider I have following class
struct MyClass {
int id;
// other members
};
used with shared_ptr
using CPtr = std::shared_ptr<MyClass>;
Because of fast access by key I supposed to use an unordered_set with a custom hash and the MyClass::id member as a key):
template <class T> struct CHash;
template<> struct CHash<CPtr>
{
std::size_t operator() (const CPtr& c) const
{
return std::hash<decltype(c->id)> {} (c->id);
}
};
using std::unordered_set<CPtr, CHash>;
Right now, unordered_set still seems to be an appropriate container. However standard find() functions for sets are assumed to be const to ensure keys won't be changed. I intend to change objects guaranteeing keeping keys unchanged. So, the questions are:
1) How to realize easy accessing to element of set by int key reserving possibility to change element, something like
auto element = my_set.find(5);
element->b = 3.3;
It is possible to add converting constructor and use something like
auto element = my_set.find(MyClass (5));
But it doesn't solve the problem with constness and what if the class is huge.
2) Am I actually going wrong way? Should I use another container? For example unordered_map, that will store one more int key for each entry consuming more memory.
A pointer doesn't project its constness to the object it points to. Meaning, if you have a constant reference to a std::shared_ptr (as in a set) you can still modify the object via this pointer. Whether or not that is something you should do a is a different question and it doesn't solve your lookup problem.
OF course, if you want to lookup a value by a key, then this is what std::unordered_map was designed for so I'd have a closer look there. The main problem I see with this approach is not so much the memory overhead (unordered_set and unordered_map as well as shared_ptr have noticeable memory overhead anyway), but that you have to maintain redundant information (id in the object and id as a key).
If you have not many insertions and you don't absolutely need the (on average) constant lookup time and memory overhead is really important to you, you could consider a third solution (besides using a third-party or self written data structure of courses): namely to write a thin wrapper around a sorted std::vector<std::shared_ptr<MyClass>> or - if appropriate - even better std::vector<std::unique_ptr<MyClass>> that uses std::upper_bound for lookups.
I think you are going a wrong way using unordered_set,because unordered_set's definition is very clear that:
Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - they can be inserted and removed, though.
You can see its definition in site:
http://www.cplusplus.com/reference/unordered_set/unordered_set/.
And hope it is helpful for you.Thanks.

Using an unordered_set with shared_ptr keys

I am trying to use the following data collection in my program:
boost::unordered_set<boost::shared_ptr<Entity> > _entities;
I am using unordered_set because I want fast insertion and removal (by key, not iterator) of Entities.
My doubt is, if I implement the following two functions:
void addEntity(boost::shared_ptr<Entity> entity) {
_entities.insert(entity);
}
void removeEntity(boost::shared_ptr<Entity> entity) {
_entities.remove(entity);
}
When I try to remove the entity, will the unordered_set find it? Because the shared_ptr that is stored inside the unordered_set is a copy of the shared_ptr that I am trying to use to remove the entity from the unordered_set if I call removeEntity().
What do I need to do for the unordered_set to find the Entity? Do I need to create a comparison function that checks the values of the shared_ptr's? But then won't the unordered_set slow down because its hash function uses the shared_ptr as hash? Will I need to create a hash function that uses the Entity as hash also?
Yes you can use boost::shared_ptr in boost::unordered_set (same applies to std version of these classes)
boost::unordered_set uses boost::hash template function to generate keys for boost::unordered_set. This function is specialised for boost::shared_ptr to take underlying pointer into account.
Let me try to explain this if I got it right :-
boost::shared_ptr is implemented using reference counting mechanism. That means whenever you are passing it to some other let's say function you are just increasing reference count whereas when you delete it you are decreasing its reference count. When reference count is 0 then only that object is deleted from memory.
Always be cautious when using it. It can save you from memory leaks only as far as your design is good to accomodate them.
For example I faced one issue.
I'd a class with map containing shared_ptrs. Later this class ( without my knowledge )was also made responsible for passing these shared-ptrs to some other classes which in-turn used some containers to store these. As a result this code blowed up luckily on tester's face in the form of memory leak.
I hope you can figure out why.

is `std::map<..> a; blah = a[abcd];` thread safe if abcd was not created before this call?

So we created a map. We want to get some_type blah = map_variable[some_not_inserted_yet_value] this would call add new item to map if one was not previosly created. So I wonder if read is really thread safe with std::map or it is only possible to thread safly try{ ...find(..)->second...?
The idea that calling find(...)->second is thread-safe is very dependent of your view of thread-safety. If you simply mean that it won't crash, then as long as no one is mutating the dictionary at the same time you're reading it, I suppose you're okay.
That said, indeed, no matter what your minimum thread safety requirements are, calling the operator[] method is inherently not thread-safe as it can mutate the collection.
If a method has no const overload, it means it can mutate the object, so unless the documentation indicates methods are thread-safe, the method is very unlikely to be.
Then again, a const method might not be thread-safe as well, because your object could depend on non-const global state or have mutable fields, so you'll want to be very, very careful if you use unsynchronized classes as if they were.
If you're 100% sure that the map contains the key, then it is technically thread-safe if all other threads are also only invoking read-only methods on the map. Note however, that there is no const version of map<k,v>::operator[](const k&).
The correct way to access the map in a thread-safe fashion is indeed:
map<k,v>::const_iterator match = mymap.find(key);
if ( match != mymap.end() ) {
// found item.
}
As stated before, this only applies if all concurrent access is read-only. One way this can be guaranteed is to use a readers-writers lock.
Note that in C++03, there is no mention of threads in the standard, so even that is not guaranteed to be thread-safe. Make sure to check your implementation's documentation.
Standard library containers have no notion of thread safety. You have to synchronize concurrent read/write access to the container yourself.
try has nothing to do with multithreading. It's used for exception handling.
find does not throw an exception if the key is not found. If the key is not found, find returns the map's end() iterator.
You are correct, operator[] is not "thread safe" because it can mutate the container. You should use the find method and compare the result to std::map::end to see if it found the item. (Also notice that find has a const version while operator[] does not).
Like others have said, the version of C++ before C++11 has no notion of threads or thread safety. However, you can feel safe using find without synchronization because it doesn't change the container, so it's only doing read operations (unless you have a weird implementation, so make sure to check the docs). As with most containers, reading from it from different threads won't cause any harm, however writing to it might.

C++ associative containers - why doesn't the standard defines methods to exchange and replaces keys?

I need to replace specific key values, while the rest of the value_type is left untouched. What I actually need to do, is copy the value, erase the entry and insert it with changed key value again. This is absolutely bad. I need to copy the whole value_type twice, and deallocate/allocate again.
Why the standard doesn't define methods like this:
// returns count of exchanged keys
size_type exchange_key(key_type const& x, key_type const& y);
// returns count of replaced keys
size_type replace_key(key_type const& old_key, key_type const& new_key);
Is there anything I'm missing?
I don't why it was not added in the first place, and i understand that it is too bad. I guess they just added what they felt was absolutely necessary.
I think i have read somewhere that Boost.MultiIndex provided this ability.
Associative containers are implemented in a way that does not allow to change the 'key' in an efficient manner. To make this explicit it does not provide convienence methods to replace a key. The associative container would also have to remove and insert again under the covers.
I think this is an abstraction problem. The standard doesn't say exactly how the containers are to be implemented, it only specifies the maximum complexity of some of the operations and leaves the rest to the implementation.
If the standard were to add a replace_key function, it would also have to specify that this should have a different complexity than the existing erase-insert combination. How can it do that without leaking implementation details? If the function isn't guaranteed to be faster on all implementations, it is pretty useless.
When you say that it would obviously be faster, you make assumptions about implementation details that the standard tries to avoid.
Now, you can, with .extract(key) (since C++17).
https://en.cppreference.com/w/cpp/container/map/extract
This is because changing a key could affect the structure of an associative containers. Notably, std::map, which is a typically Red-Black tree, the tree structure mostly will be changed once you modify a key (e.g., rotating sub trees). In some data structures, even such dynamic changes are disallowed. So, it is challenging to expose such operation as a standard in an associative container.
Regarding the overhead you concerned, once you have value_type as a pointer or reference, the overhead of deleting/inserting a pair isn't too bad.
Well, honestly behind the screens it would result into an insert and delete operation anyhow, with the sole difference that the value-part will not be copied. While this seems to be your biggest concern, unless your object is very heavy on copying, in a large container, the update operation to re-stabilize the ordered container will be heavier anyhow.
Now this would require some important changes however going further than the associative containers, the two most important I can see are:
The std::pair class needs an update, because you must be able to update the key without creating a new pair (as this would also copy the value object).
You need an update function that removes a pair from the tree, calls the new logic from 1., and reinserts it.
I think the main problem lies with the first one, as std::pair is at the moment actually a very simple wrapper, and your proposal would delete that principle and add some (unnecessary) complexity to it. Also note that call 2 does not actually add new functionality but wraps a system the developer could manage himself easily through references and the like. If they would add all this wrapping to std, it would become a really huge bloated piece of library.
If you want this principle, you should look for more complex libraries (probably boost has some). Or you should simply use a reference/shared_ptr as your value_type.

Returning an iterator in a multi threaded environment, a good idea?

Is it a good idea to return an iterator on a list in an object that is used and shared in a multi threaded environment?
class RequestList
{
public:
RequestList::RequestList();
RequestList::~RequestList();
std::list<boost::shared_ptr<Request> >::iterator GetIterator();
int ListSize();
void AddItem(boost::shared_ptr<Request> request);
void RemoveItem(boost::shared_ptr<Request> request);
std::list<boost::shared_ptr<Request> > GetRequestsList();
boost::shared_ptr<Request> GetRequest();
private:
std::list<boost::shared_ptr<Request> > requests;
std::list<boost::shared_ptr<Request> >::iterator iter; //Iterator
boost::mutex listmtx;
};
std::list<boost::shared_ptr<Request> >::iterator RequestList::GetIterator()
{
return this->iter;
}
USE:
RequestList* requests;
In some thread (may be used again in other threads)
std::list<boost::shared_ptr<Request> >::iterator iter = requests->GetIterator();
Or would it be smarter to just create an iterator for that list each time and use it locally within each thread?
No it is usually not a good idea to share an iterator across threads. There are a couple of ways to make sure you don't get in trouble.
First off, an iterator is a light-weight object which is fast to construct and takes up very little memory. So you should not be concerned about any performance-issues. Just create an instance whenever you need one.
That said you do have to make sure that your list is not altered when you are iterating. I see you have a boost::mutex on in your class. Locking that will be perfect for ensuring that you don't get any problems when iterating.
A different and equally valid way of handling these situations is to simply copy the internal list and iterate that. This is a good solution if you require that the list is continually updated and you don't want other threads waiting. Of course it takes up a bit more memory, but since you are storing smart pointers, it will hardly be anything at all.
Depends how the list is used, but from what you've shown it looks wrong. The iterator becomes invalid if the element it refers to is removed: the element in this case being the shared_ptr object in the list.
As soon as you release the mutex, I guess some other thread could come along and remove that element. You haven't shown code that does it, but if it can happen, then iterators shouldn't "escape" the mutex.
I assume this is a "self-synchronizing" container, since the mutex is private and there's nothing in the API to lock it. The fundamental difficulty with such things, is that it's not thread-safe to perform any kind of iteration on them from the outside. It's easy enough to provide a thread-safe queue, that supports:
adding an element,
removing an element by value,
removing the head and returning a copy of its value.
Beyond that, it's harder to provide useful basic operations, because almost anything that manipulates the list in any interesting way needs to be done entirely under the lock.
By the looks of things, you can copy the list with GetRequestsList, and iterate over the copy. Not sure whether it will do you any good, since the copy is instantly out of date.
Accessing the list via iterators in multiple threads where the main list itself is not locked is dangerous.
There's no guarantee what state the list will be as you do things in with the iterators in different threads (for example, one thread could happily iterate through and erase all the items, what will the other thread - who's also iterating, see?)
If you are going to work on the list in multiple threads, lock the whole list first, then do what you need to. Copying the list is an option, but not optimal (depending on the size of your list and how fast it's updated). If locking becomes a bottle neck, re-think your architecture (list per thread for example?)
Each thread that calls the GetIterator function will get its own copy of the stored iterator in the list.
As a std::list<>::iterator is a bi-directional iterator, any copies you make are completely independent of the source. If one of them changes, this will not be reflected in any of the other iterators.
As for using iterator in a multi-threaded environment, this is not that different from a single-threaded environment. The iterator remains valid as long as the element it refers to is part of the container. You just have to take care of proper synchronization when accessing/modifying the container or its elements.
If the list is modified by one of your threads you might get into trouble.
But of course, you can take care of that by setting locks and ro- and rw-locks during modification. But since mutexes are the scurge of any high performance program, maybe you can make a copy of the list (or references) and keep the original list mutex- and lock-free? That would be the best way.
If you have the mutexes in place you only have to battle with the issues of a modifying a list while holding iterators on it as you would normally do anyway -- i.e. adding elements should be ok, deletion should be done "careful" but doing it on a list is probably less likely to explode as on a vector:-)
I would reconsider this design and would use a task-based approach. This way you don't need any mutexes.
For example use Intel TBB, which initializes a task pool internally. So you can easily implement a one-writer/multiple-readers concept.
First add all requests to your request container (a simple std::vector might be better suited in terms of cache locality and performance) and then do a parallel_for() over you request-vector BUT DON'T remove a request in your parallel-for() functor!
After the processing you can actually clear your request vector without any need of locking a mutex. That's it!
I hope I could help a bit.