I have two threads. Thread "A" inserts a key X into the map and then modifies the value for that key frequently.
At a particular point thread "A" completes its modifications to key X, and then thread "B" reads the value for key "X" and erases key "X" from the map.
While thread "B" reads and erases, thread "A" concurrently inserts and writes other keys into the map (not the same key X).
In this case, does the map need to be synchronized? Thread "B" is sure that key "X" has been completely modified by thread "A" and that no further concurrent modifications will be made to key "X".
Yes, you need synchronization.
Insertion and erasure can change the internal state of the map, and those changes can overlap with other such operations even when they involve different keys.
While thread A updates the value for key X you don't need to lock the map: std::map guarantees that iterators and references to existing elements remain valid across insertions and erasures of other elements, so your value won't be touched.
Related
I remember someone telling me that if I had a map which already has
"key1" and "key2", then using thread1 to read "key1" and thread2 to
write "key2" (only changing "key2"'s value, not changing "key2" to
"key3") will not cause any threat or mistake.
But if the map only contains "key1", then using thread1 to read "key1"
while thread2 inserts "key2" will change the hash structure, so I need
to add a lock.
Is that correct?
By the way, what about unordered_map? Is it still the same?
Yes, it is correct, both for ordered and unordered standard maps.
Although there is no "hash structure" in an ordered map, and it is not possible to change the key of an element in either container.
I have an std::set holding a large number of unique objects as its elements.
In the main thread of program:
I take some objects from the set
Assign data to be processed to each of them
Remove those objects from set
And finally pass the objects to threads in a thread pool for processing
Once those threads finish processing the objects, they add them back to the set (so that in the next iteration, the main thread can again assign the next batch of data to those objects for processing).
This arrangement works perfectly. But if I encounter an error while adding an object back to the set (for example, std::set::insert() throws bad_alloc), it all falls apart.
If I ignore the error and proceed, there is no way for the object to get back into the processing set, and it remains outside the program flow forever, causing a memory leak.
To address this issue I tried not removing objects from the set, and instead using a member flag that marks an object as 'being processed'. But then the main thread encounters 'being processed' objects again and again while iterating over all elements of the set, which badly hampers performance (the number of objects in the set is quite large).
What are better alternatives here?
Can std::list be used instead of std::set? A list would not have the bad_alloc problem when adding an element back, as it just needs to assign pointers when linking the element in. But how can we make list elements unique? And if we achieve that, will it be as efficient as std::set?
Instead of removing and re-adding elements to the std::set, is there any way to move an element to the start or end of the set, so that unprocessed and processed objects accumulate together at opposite ends?
Any other solution please?
I am not very good at data structures, so this might be a very silly question. I am looking for a way to implement hybrid queue + map behavior.
I am currently using tbb::concurrent_bounded_queue (documented at Intel's developer zone) from www.threadingbuildingblocks.org in a multithreaded, single-producer single-consumer process. The queue holds market data quote objects, and the producer side of the process is highly time-sensitive, so what I need is a queue keyed on a market data identifier such as USDCAD or EURUSD, where the value points (through a unique_ptr) to the latest market data quote received for that key.
So, say my queue has 5 elements for 5 unique identifiers, and suddenly we get an updated market data quote for the identifier at the 3rd position in the queue; then I just store the latest value and discard the one I previously had. Essentially, I just move my unique_ptr to the new market data quote for that key.
It is similar to concurrent_bounded_queue<pair<string, unique_ptr<Quote>>>, but keyed on the first element of the pair.
I am not sure whether this is already available in a third-party library (maybe TBB itself), or what it is called if it is a standard data structure.
I would highly appreciate any help or guidance on this.
Thanks.
First, observe that we can easily write...
int idn_to_index(const std::string& idn); // map from identifier to a contiguous number sequence
...and it doesn't matter much whether that uses a std::map or std::unordered_map, a binary search in a sorted std::vector, or your own character-by-character hardcoded parser.
Then the producer could:
update (under a mutex) a std::vector<std::unique_ptr<Quote>> at [idn_to_index(idn)]
post the index to a concurrent_bounded_queue<int>
The consumer:
pops an index
compares the pointer in the std::vector<std::unique_ptr<Quote>> at [index] to its own array of last-seen pointers, and if they differ, processes the quote
The idea here is not to avoid duplicate identifier-specific indices in the queue, but to make sure that the stalest of those still triggers processing of the newest quote, and that less-stale queue entries are ignored harmlessly until the data has genuinely been updated again.
TBB provides:
concurrent_unordered_map: no concurrent erase, stable iterators, no element access protection;
concurrent_hash_map: has concurrent erase; concurrent operations invalidate iterators; per-element access management via 'accessors'.
So, if the question
"It's like it is similar to concurrent_bounded_queue<pair<string, unique_ptr<Quote>>> but is keyed on the first element of the pair" means "suggest a corresponding concurrent associative map container", these two are at your service. Basically, you have to choose between the ability to erase identifiers concurrently (concurrent_hash_map) and the ability to traverse all the elements concurrently (concurrent_unordered_map). concurrent_hash_map also simplifies synchronization of access to the elements, which looks useful for your case.
I was able to solve this problem as below:
I use a queue and a hashmap, both from the TBB library. I push my unique identifiers onto the queue, not the Quotes. My hashmap has the unique identifier as key and the Quote as value.
So, when I receive a Quote, I iterate through the queue to check whether it already contains that identifier. If it does, I store the corresponding Quote directly in the hashmap and do not add the identifier to the queue again. If it does not, I push the identifier onto the queue and the corresponding Quote into the hashmap. This ensures that the queue always holds a unique set of identifiers and the hashmap has the latest Quote available for each identifier.
On the consumer side, I pop the queue to get the next identifier and fetch the Quote for that identifier from the hashmap.
This works pretty fast. Please let me know in case I am missing any hidden issues with this.
I am using a map<int, queue<string>>, where the int identifies the source of a message and the queue holds the messages. One thread pushes messages onto the queue, another thread pops them off.
This is a client-server program - when the client sends a message, the message gets pushed into the queue.
I am currently using (pseudo code)
/* receive message in thread 1 */
map<int, queue<string>> test_map;
int client_id = 2;
string msg = received_from_client(client_id);
test_map[client_id].push(msg);
/* process message in thread 2 */
string msg_to_process = test_map[client_id].front();
test_map[client_id].pop();
if (test_map[client_id].empty())
{
    test_map.erase(client_id);
}
I know from this question that the difference is that insert will not overwrite an existing key. Does this apply when I am pushing things into the queues? Is it safer to use insert, or is what I'm doing with [] sufficient?
Also, while the system should only have one message in a queue at any one time, I am making allowances for expansion by using map<int, queue<string>> instead of map<int, string>.
edit: I have a question about multithreading as well. What happens when thread 1 attempts to insert into the map while thread 2 deletes the key because its queue is empty (after it has processed the message)? Is there a definitive answer to this, and does using [] or insert() make it any more thread-safe?
Queues don't have keys or [] operators, so your first question can't really be answered as asked. You insert into a queue by pushing onto the back; if there are elements there already, the new one goes in after them. You read off a queue by popping things off the front, if there are any. You don't read or write anywhere other than that.
As for maps, as you said, insert will add a new key-value pair if the key does not already exist; it will not overwrite an existing key. find will locate a value if it exists, but will not insert one if it doesn't. The [] operator does both, and also allows you to change existing elements. The documentation here is very good.
One thing to be aware of is that using the map's [] operator to read from the map will also insert a default-constructed value-type element into the map under that key, which is probably not what you would expect when first looking at it.
std::map<int, int> myMap;
if (myMap[1] == 0) // [] creates the key-value pair <1, 0>
    cout << "This will output";
if (myMap.size() == 1)
    cout << "This too";
As for the thread-safety aspect: no STL containers are thread-safe according to the standard. You need to add proper locking in your code to prevent exactly what you asked about. If two threads read from and write to the same queue at the same time, it will almost certainly corrupt the container. I would google around about writing thread-safe programs for general help on how to do that.
Basically, if two processes attempt to append to the same key at the same time, is there any chance that one will ever overwrite the other?
e.g.:
Process 1 appends "a" to the key "k"
Process 2 appends "b" to the key "k"
Are we guaranteed to have two characters (either "ab" or "ba") as the value after we perform these actions?
Yes. memcached's append is a single server-side operation rather than a separate read followed by a write, so concurrent appends cannot overwrite each other and both characters will be present.