Clearing std::map under a lock vs moving to a temp object - c++

I am using a std::map that holds a large number of elements. If I need to clear the map, I can just call clear() on it, but clearing can take some time, and if it is done under a lock in a multi-threaded environment, it can block other calls. To avoid calling clear() under the lock, I tried this:
std::mutex m;
std::map<int, int> my_map; // the map which I want to clear

void func()
{
    std::map<int, int> temp_map;
    {
        std::lock_guard<std::mutex> l(m);
        temp_map = std::move(my_map);
    }
}
This moves my_map into temp_map while the lock is held, which empties my_map out. Then, once func ends, temp_map is destroyed outside the lock.
Is this a better way to avoid holding the lock for a long time? Are there any performance hits?

I would recommend using swap instead of move. A moved-from object is not guaranteed to be empty, or even usable. With swap and a freshly created object you are sure of the result:
void func()
{
    std::map<int, int> temp_map;
    using std::swap;
    {
        std::lock_guard<std::mutex> l(m);
        swap(my_map, temp_map);
    }
}
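As a minimal, self-contained sketch of this pattern (the names drain_map, my_map2, and m2 are illustrative, not from the question), the expensive destruction of the old contents happens after the lock is released, when the local map goes out of scope:

```cpp
#include <map>
#include <mutex>
#include <utility>

std::mutex m2;
std::map<int, int> my_map2;

// Swap the shared map with an empty local one under the lock.
// Only the cheap swap happens inside the critical section; the
// caller destroys (or inspects) the drained contents afterwards.
std::map<int, int> drain_map()
{
    std::map<int, int> temp_map;
    {
        std::lock_guard<std::mutex> l(m2);
        using std::swap;
        swap(my_map2, temp_map);
    }
    return temp_map; // destruction happens in the caller, lock-free
}
```

After the call, the shared map is guaranteed empty and the returned map holds everything that was in it.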

Related

Should mutex be used when inserting element in vectors?

I know that a mutex is needed when deleting an element from a vector, so I wrote a sample program to check this.
#include <chrono>
#include <cstdio>
#include <memory>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

class Test
{
public:
    Test(int idx) : m_index(idx) {}
    int m_index = { -1 };
    int m_count = { 0 };
};

std::vector<std::unique_ptr<Test>> m_vec;
std::mutex m_mutex;

void foo1() // print element data
{
    while (true)
    {
        std::unique_lock ulock(m_mutex);
        for (auto& e : m_vec)
        {
            e->m_count++;
            printf("%d : Count : %d\n", e->m_index, e->m_count);
        }
        ulock.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}

void foo2() // Only insert elements
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(0, 99);
    while (true)
    {
        int t = dis(gen);
        if (t >= 0 && t < 10)
        {
            //std::unique_lock ulock(m_mutex);
            m_vec.push_back(std::make_unique<Test>(m_vec.size()));
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(t));
    }
}

void foo3() // Only remove elements
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(0, 99);
    while (true)
    {
        int t = dis(gen);
        if (t >= 0 && t < 10)
        {
            std::unique_lock ulock(m_mutex);
            if (m_vec.empty() == false)
                m_vec.erase(m_vec.begin());
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(t));
    }
}

int main()
{
    m_vec.push_back(std::make_unique<Test>(1));
    m_vec.push_back(std::make_unique<Test>(2));
    m_vec.push_back(std::make_unique<Test>(3));
    std::thread t1(foo1);
    std::thread t2(foo2);
    std::thread t3(foo3);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}
If I call erase() without using the mutex, a segmentation fault occurs almost immediately.
So I used the mutex for the erase() routine, which seemed to work normally.
About 10 minutes later, however, a nullptr exception occurred when dereferencing e in foo1().
Q1. push_back inserts data at the end, so why does the null error occur in the middle? (e.g. vector size: 521, error index: 129)
Q2. When using ordered containers such as vector and deque, do I need a mutex for insert functions?
Q3. What about unordered containers (like unordered_map)? (I removed the mutex from the insert and it ran without any problem for about 20 minutes.)
foo2() is accessing/modifying the vector outside of the mutex lock. As such, foo1() and/or foo3() (which do use the mutex) are able to modify the vector at the same time as foo2(). That is undefined behavior.
When push_back inserts data at the end, why does the null pointer error occur in the middle (e.g. vector size: 521, error index: 129)?
Pushing a new element into a vector may require it to reallocate its internal array, moving all of the existing elements to a new memory block. You are doing that in foo2() without the protection of the mutex lock, so the elements which foo1() and foo3() are accessing may disappear behind their backs unexpectedly.
Erasing elements from a vector will not reallocate the internal array, but it may still shift elements around within the array's existing memory.
When using ordered containers such as vector and deque, do I need a mutex for insert and delete functions?
Yes. No standard container is thread-safe, so all modifications must be serialized when a container is used across thread boundaries.
What about unordered containers (like unordered_map)? (Removed the mutex from insert and it ran without any problem for about 20 minutes.)
Same thing.
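The reallocation described above can be demonstrated in a single thread (this sketch is illustrative; the helper name push_back_reallocates is not from the question). Any pointer or iterator taken before the growth dangles afterwards, which is exactly what bites foo1() when foo2() reallocates concurrently:

```cpp
#include <cstddef>
#include <vector>

// Returns true if growing the vector past its current capacity
// moved its storage to a different address.
bool push_back_reallocates()
{
    std::vector<int> v;
    v.push_back(1);
    const int* before = v.data();
    const std::size_t cap = v.capacity();
    while (v.capacity() == cap)   // force at least one reallocation
        v.push_back(0);
    // The new buffer is allocated while the old one is still live,
    // so the addresses must differ; `before` now dangles.
    return v.data() != before;
}
```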

How to make function thread safe

This is the code where I would be inserting values into an unordered map and also querying those values at regular intervals.
class MemoryMap
{
private:
    std::unordered_map<std::string, std::string> maps_;
    std::mutex mutex_;

public:
    void AddMap(std::string key, std::string value);
    std::string GetMap(std::string key);
    void PrintMemoryMap(std::string key);
};

void MemoryMap::AddMap(std::string key, std::string value)
{
    std::unique_lock<std::mutex> lock(mutex_);
    maps_[key] = value;
}

std::string MemoryMap::GetMap(std::string key)
{
    std::unique_lock<std::mutex> lock(mutex_);
    if (maps_.find(key) == maps_.end())
        return "";
    return maps_.at(key);
}
I would be using this object from two different threads, and I want GetMap to wait while an insertion is happening through AddMap. GetMap would also be called concurrently.
Is my current code sufficient to address this?
It is sufficient. The mutex guarantees that at most one thread can call AddMap or GetMap at the same time.
However, your code is not optimal if you want to achieve concurrent reads. In C++, unordered_map has the usual container thread-safety guarantees (https://en.cppreference.com/w/cpp/container#Thread_safety): two threads can safely call GetMap at the same time, because lookup is a const operation, as long as no thread is modifying the container.
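To actually allow concurrent reads, one option (requires C++17) is std::shared_mutex: readers take a shared lock, writers an exclusive one. A sketch of the same interface under that assumption (the class name SharedMemoryMap is illustrative):

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Same interface as MemoryMap, but readers take a shared lock,
// so concurrent GetMap calls do not serialize against each other.
class SharedMemoryMap
{
    std::unordered_map<std::string, std::string> maps_;
    mutable std::shared_mutex mutex_;

public:
    void AddMap(const std::string& key, const std::string& value)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_); // exclusive
        maps_[key] = value;
    }

    std::string GetMap(const std::string& key) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_); // shared
        auto it = maps_.find(key);
        return it == maps_.end() ? "" : it->second;
    }
};
```

This only pays off if reads heavily outnumber writes; for short critical sections like these, a plain std::mutex is often just as fast.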

How to individually lock unordered_map elements in C++

I have an unordered_map that I want to be accessible by multiple threads but locking the whole thing with a mutex would be too slow.
To get around this I put a mutex in each element of the unordered_map:
class exampleClass{
    std::mutex m;
    int data;
};

std::unordered_map<int,exampleClass> exampleMap;
The issue is I'm unable to safely erase elements, because in order to destroy a mutex it must be unlocked but if it's unlocked then another thread could lock it and be writing to or reading the element during destruction.
unordered_map is not suitable for fine-grained parallelism. It is not legal to add or remove elements without ensuring mutual exclusion during the process.
I would suggest using something like tbb::concurrent_hash_map instead, which will result in less lock contention than locking the map as a whole. (There are other concurrent hash table implementations out there; the advantage of TBB is that it's well-supported and stable.)
@Sneftel's answer is good enough.
But if you insist on using std::unordered_map, I suggest you use one mutex to protect insertion/deletion of the map, and another mutex per element for modifying the element:
struct exampleClass {
    std::mutex m;
    int data;
};

std::unordered_map<int, exampleClass> exampleMap;
std::mutex mapLock;

void add(int key, int value) {
    std::unique_lock<std::mutex> _(mapLock);
    exampleMap[key].data = value; // operator[] constructs the element in place
}

void remove(int key) {
    std::unique_lock<std::mutex> _(mapLock);
    auto it = exampleMap.find(key);
    if (it != exampleMap.end()) {
        {
            // Wait for any thread that still holds the element's lock
            std::unique_lock<std::mutex> _1(it->second.m);
        } // must unlock before erasing: destroying a locked mutex is UB
        exampleMap.erase(it);
    }
}
This should perform better than a big lock on the whole map if removal is not a frequent operation.
But be careful with this kind of code, because it is hard to reason about and to get right.
I strongly recommend @Sneftel's answer.
You have the following options:
Lock the entire map with one mutex.
Use a container of shared_ptr so the actual object can be modified (with or without its own mutex) independently of the container.
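The shared_ptr option can be sketched like this (the names Element, lookup, insert_element, and erase_element are illustrative, not from the question). Erasing from the map only drops the map's reference; the element stays alive until the last thread using it releases its shared_ptr, so its mutex is never destroyed while another thread still holds it:

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct Element {
    std::mutex m;  // guards data
    int data = 0;
};

std::unordered_map<int, std::shared_ptr<Element>> sharedMap;
std::mutex sharedMapLock;

// Returns a reference-counted handle, or nullptr if absent.
std::shared_ptr<Element> lookup(int key)
{
    std::lock_guard<std::mutex> lock(sharedMapLock);
    auto it = sharedMap.find(key);
    return it == sharedMap.end() ? nullptr : it->second;
}

void insert_element(int key, int value)
{
    auto e = std::make_shared<Element>();
    e->data = value;
    std::lock_guard<std::mutex> lock(sharedMapLock);
    sharedMap[key] = e;
}

void erase_element(int key)
{
    std::lock_guard<std::mutex> lock(sharedMapLock);
    sharedMap.erase(key); // Element survives if anyone still holds it
}
```

Callers then lock e->m only while reading or writing e->data, never while touching the map itself.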

Is it OK to access a value (an entry in a thread-safe map) pointed to by a pointer inside a non-thread-safe container?

For example,
// I am using thread safe map from
// code.google.com/p/thread-safe-stl-containers
#include <thread_safe_map.h>
#include <map>
#include <vector>

class B {
public:
    std::vector<int> b1;
};

// Thread-safe map
thread_safe::map<int, B> A;
B b_object;
A[1] = b_object;

// Non-thread-safe map
std::map<int, B*> C;
C[1] = &A[1];
So are the following operations still thread-safe?
Thread1:
for (int i = 0; i < 10000; i++) {
    cout << C[1]->b1[i];
}
Thread2:
for (int i = 0; i < 10000; i++) {
    C[1]->b1.push_back(i);
}
Is there any problem in the above code? If so how can I fix it?
Is it OK to access a value (an entry in a thread-safe map) pointed to by a pointer inside a non-thread-safe container?
No, what you are doing there is not safe. thread_safe::map is implemented by taking a lock for the duration of every function call:
//Element Access
T & operator[]( const Key & x ) { boost::lock_guard<boost::mutex> lock( mutex ); return storage[x]; }
The lock is released as soon as the access function returns, which means that any modification you make through the returned reference has no protection.
As well as being not entirely safe, this method is very slow.
A safer, more efficient, but highly experimental way to lock containers is proposed here: https://github.com/isocpp/CppCoreGuidelines/issues/924, with source code here: https://github.com/galik/GSL/blob/lockable-objects/include/gsl/gsl_lockable (shameless self-promotion disclaimer).
In general, STL containers can be accessed from multiple threads as long as all threads either:
read from the same container, or
modify elements in a thread-safe manner.
You cannot push_back (or erase, insert, etc.) in one thread and read in another. Suppose thread 1 is accessing an element while push_back in thread 2 is in the middle of reallocating the vector's storage: this might crash the application, might return garbage, or might work, if you're lucky.
The second bullet point applies to situations like this:
std::vector<std::atomic_int> elements;

// Thread 1:
elements[10].store(5);

// Thread 2:
int v = elements[10].load();
In this case, you're concurrently reading and writing an atomic variable, but the vector itself is not modified - only its element is.
Edit: using thread_safe::map doesn't change anything in your case. While modifying the map itself is OK, modifying its elements is not. Putting a std::vector in a thread-safe collection doesn't automagically make the vector thread-safe too.
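One simple fix, sketched below under the assumption that values are small enough to copy (the names get_snapshot and append are illustrative, and a plain mutex-guarded std::map stands in for the thread-safe map): never hand out references or pointers into the map; instead copy the value out while the lock is held, so each caller works on a private snapshot:

```cpp
#include <map>
#include <mutex>
#include <vector>

struct B2 { std::vector<int> b1; };

std::map<int, B2> safeMap;
std::mutex safeMapMutex;

// Copy the value out while holding the lock; the caller then works
// on a private snapshot that no other thread can mutate.
B2 get_snapshot(int key)
{
    std::lock_guard<std::mutex> lock(safeMapMutex);
    return safeMap[key]; // copy is made before the lock is released
}

// All mutation also happens under the same lock.
void append(int key, int v)
{
    std::lock_guard<std::mutex> lock(safeMapMutex);
    safeMap[key].b1.push_back(v);
}
```

The cost is a copy per read, which may be unacceptable for large values; in that case a per-element lock or a concurrent container (as in the answers above) is the alternative.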

Updating cache without blocking

I currently have a program with a cache-like mechanism. I have a thread listening for updates from another server to this cache; it updates the cache when it receives an update. Here is some pseudo code:
void cache::update_cache()
{
    cache_ = new std::map<std::string, value>();
    while(true)
    {
        if(recv().compare("update") == 0)
        {
            std::map<std::string, value> *new_info = new std::map<std::string, value>();
            std::map<std::string, value> *tmp;
            //Get new info, store in new_info
            tmp = cache_;
            cache_ = new_info;
            delete tmp;
        }
    }
}

std::map<std::string, value> *cache::get_cache()
{
    return cache_;
}
cache_ is read from many different threads concurrently. I believe that, as written, I will run into undefined behavior if one of my threads calls get_cache(), then the cache updates, and then the thread tries to access the stored cache.
I am looking for a way to avoid this problem. I know I could use a mutex, but I would rather not block reads, as they have to be as low-latency as possible; if need be, I can go that route.
I was wondering if this would be a good use case for a unique_ptr. Is my understanding correct that if get_cache returns a unique_ptr instead of a raw pointer, then once all threads that hold the old version of the cache are finished with it (i.e. it leaves scope), the object will be deleted?
Is using a unique_ptr the best option for this case, or is there another option that I am not thinking of?
Any input will be greatly appreciated.
Edit:
I believe I made a mistake in my OP. I meant to use and pass a shared_ptr, not a unique_ptr, for cache_; when all threads are finished with an old cache_, the shared_ptr should delete it.
A little about my program: it is a webserver that uses this information to decide what to return. It is fairly high throughput (thousands of req/sec). Each request queries the cache once, so telling my other threads when to update is no problem. I can tolerate slightly out-of-date information, and would prefer that over blocking all of my threads from executing, if possible. The information in the cache is fairly large, so I would like to limit any copies of it.
update_cache is only run once, in a thread that just listens for an update command and runs this code.
I feel there are multiple issues:
1) Do not leak memory: never use "delete" in your code; stick with unique_ptr (or shared_ptr in specific cases).
2) Protect accesses to shared data, either by locking (mutex) or with a lock-free mechanism (std::atomic).
class Cache {
    using Map = std::map<std::string, value>;

    std::unique_ptr<Map> m_cache;
    std::mutex m_cacheLock;

public:
    void update_cache()
    {
        while(true)
        {
            if(recv().compare("update") == 0)
            {
                std::unique_ptr<Map> new_info { new Map };
                //Get new info, store in new_info
                {
                    std::lock_guard<std::mutex> lock{m_cacheLock};
                    using std::swap;
                    swap(m_cache, new_info);
                }
            }
        }
    }
};
Note: I don't like update_cache() being part of the public interface of the cache, as it contains an infinite loop. I would probably externalize the loop with the recv() and have a:
void update_cache(std::unique_ptr<Map> new_info)
{
    { // This inner brace is not useless: we don't want to hold the lock during deletion
        std::lock_guard<std::mutex> lock{m_cacheLock};
        using std::swap;
        swap(m_cache, new_info);
    } // new_info now holds the old map and is destroyed here, outside the lock
}
Now for the reading to the cache, use proper encapsulation and don't leave the pointer to the member map escape:
value get(const std::string &key)
{
    // lock, fetch, and return.
    // Depending on the value type, you might want to allocate memory
    // before locking
}
With this signature you have to throw an exception if the value is not present in the cache; another option is to return something like a boost::optional.
Overall you can keep latency low (everything is relative; I don't know your use case) if you take care to do costly operations (memory allocation, for instance) outside of the locked section.
shared_ptr is very reasonable for this purpose; C++11 has a family of free functions for handling a shared_ptr atomically. If the data is immutable after creation, you won't even need any additional synchronization:
class cache {
public:
    using map_t = std::map<std::string, value>;

    void update_cache();
    std::shared_ptr<const map_t> get_cache() const;

private:
    std::shared_ptr<const map_t> cache_;
};
void cache::update_cache()
{
    while(true)
    {
        if(recv() == "update")
        {
            auto new_info = std::make_shared<map_t>();
            // Get new info, store in new_info
            // Make immutable & publish
            std::atomic_store(&cache_,
                              std::shared_ptr<const map_t>{std::move(new_info)});
        }
    }
}

auto cache::get_cache() const -> std::shared_ptr<const map_t> {
    return std::atomic_load(&cache_);
}