Assume that one thread continuously calls only the following function. Here, insert_data checks whether a key exists in a std::unordered_map; if it does not, it adds the key, then appends the value.
void insert_data(int key, int value, std::unordered_map<int, std::vector<int>>& my_map)
{
    if (my_map.find(key) == my_map.end())
    {
        my_map[key] = std::vector<int>();
    }
    my_map[key].push_back(value);
}
Another thread repeatedly iterates over the same std::unordered_map.
void iteration(std::unordered_map<int, std::vector<int>>& my_map)
{
    for (auto& [key, values] : my_map)
    {
        for (int value : values)
            std::cout << "key : " << key << " value : " << value << std::endl;
    }
}
Is the shared my_map thread safe if each of the above functions is executed in only one thread?
No, that is not safe. Standard library containers give no thread-safety guarantees once any thread modifies the container: an insert can rehash the unordered_map and invalidate the iterators the other thread is using, and the unsynchronized read/write is a data race in any case.
Also your insert can be made more efficient:
void insert_data(int key, int value,
                 std::unordered_map<int, std::vector<int>>& my_map)
{
    my_map[key].push_back(value);
}
The [key] operator default-constructs the value automatically if the key is not present. Even if your code needs to know whether it is a new entry, you can do better like this:
void insert_data(int key, int value,
                 std::unordered_map<int, std::vector<int>>& my_map)
{
    auto inserted = my_map.emplace(key, std::vector<int>{});
    inserted.first->second.push_back(value);
    bool new_entry = inserted.second; // true if the key was newly created
}
This avoids the duplicate lookup. It constructs a temporary vector of zero size but that is cheap.
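If C++17 is available, std::unordered_map::try_emplace avoids even that temporary: it constructs the vector in place only when the key is absent. A minimal sketch:

```cpp
#include <unordered_map>
#include <vector>

// Sketch, assuming C++17: try_emplace default-constructs the vector
// in place only when the key is absent, so the already-present path
// builds nothing at all.
bool insert_data(int key, int value,
                 std::unordered_map<int, std::vector<int>>& my_map)
{
    auto [it, is_new] = my_map.try_emplace(key);
    it->second.push_back(value);
    return is_new; // true if this call created the entry
}
```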
Simple fix
The simplest solution is to protect the whole thing with a mutex.
class Dict
{
    std::mutex mutex;
    std::unordered_map<int, std::vector<int>> map;
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(mutex);
        map[key].push_back(value);
    }
    void iteration()
    {
        std::lock_guard<std::mutex> lock(mutex);
        for (const auto& key_values : map)
            for (int value : key_values.second)
                std::cout << "key : " << key_values.first
                          << " value : " << value << '\n';
    }
};
The main drawback is that an insert_data() call can now block for an extended period while iteration() holds the lock.
Buffered fix
To avoid these long latencies, we should decouple the two threads as much as possible. Something like this:
class Dict
{
    std::unordered_map<int, std::vector<int>> map;
    std::mutex deferred_mutex;
    std::unordered_map<int, std::vector<int>> deferred;
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(deferred_mutex);
        deferred[key].push_back(value);
    }
    void iteration()
    {
        std::unordered_map<int, std::vector<int>> new_elements;
        std::unique_lock<std::mutex> deferred_lock(deferred_mutex);
        deferred.swap(new_elements);
        deferred_lock.unlock();
        for (auto& [key, new_values] : new_elements) {
            std::vector<int>& values = map[key];
            values.insert(values.end(), new_values.begin(),
                          new_values.end());
        }
        for (const auto& key_values : map)
            for (int value : key_values.second)
                std::cout << "key : " << key_values.first
                          << " value : " << value << '\n';
    }
};
Basically we keep new elements separately until they can be inserted later. Compared to the cost of doing IO, the extra work for the iteration() thread should be negligible.
Instead of a second unordered_map, a vector<pair<int, int>> of key-value pairs would likely be a more efficient buffer, but that requires benchmarking and knowledge of how often keys repeat.
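For reference, here is a sketch of that vector-based variant; it assumes merging duplicate keys during iteration() is acceptable, and the key_count() accessor is only there for illustration (it must only be called from the iteration thread, like map itself):

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of the vector-based buffer: inserts do no hashing under the
// lock; duplicate keys are merged later, in iteration().
class Dict
{
    std::unordered_map<int, std::vector<int>> map;
    std::mutex deferred_mutex;
    std::vector<std::pair<int, int>> deferred; // raw (key, value) pairs
public:
    void insert_data(int key, int value)
    {
        std::lock_guard<std::mutex> lock(deferred_mutex);
        deferred.emplace_back(key, value); // cheap append under the lock
    }
    void iteration()
    {
        std::vector<std::pair<int, int>> new_elements;
        {
            std::lock_guard<std::mutex> lock(deferred_mutex);
            deferred.swap(new_elements); // grab the buffer, release quickly
        }
        for (auto& [key, value] : new_elements)
            map[key].push_back(value); // merge duplicates here
        for (const auto& [key, values] : map)
            for (int value : values)
                std::printf("key : %d value : %d\n", key, value);
    }
    // For illustration only; map is owned by the iteration thread.
    std::size_t key_count() const { return map.size(); }
};
```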
Related
I have a map data structure containing a string as key and multiple data types as values. I populated the map while instantiating it. The problem I am facing is that while iterating through the map and trying to access the value of each key, I get errors. I believe the return value might need to be cast from the variant to its real data type; I don't know how to access it.
this is the definition of the map:
map<string,boost::variant<int,double, long long, string>> mapToBeProcessed;
for (auto& x : mapToBeProcessed)
{
    if (ini.hasField(x.first))
    {
        b << x.first << x.second;
    }
}
The issue happens when I try to access the value of the map: x.second
You can visit the variant, to apply a function to the active member.
struct stream_visitor {
    using result_type = void;
    template <typename T>
    void operator()(T& t) { os << name << t; }
    std::ostream& os;
    std::string name;
};
map<string, boost::variant<int, double, long long, string>> mapToBeProcessed;
for (auto& x : mapToBeProcessed)
{
    if (ini.hasField(x.first))
    {
        boost::apply_visitor(stream_visitor{ b, x.first }, x.second);
    }
}
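For what it's worth, if C++17 is an option, std::variant plus std::visit with a generic lambda does the same job without Boost. A sketch (the render() function and Value alias are illustrative names, not from the question):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <variant>

using Value = std::variant<int, double, long long, std::string>;

// Renders every (key, value) pair; std::visit applies the generic
// lambda to whichever alternative is currently active in the variant.
std::string render(const std::map<std::string, Value>& m)
{
    std::ostringstream out;
    for (const auto& [key, value] : m)
        std::visit([&](const auto& v) { out << key << " " << v << "\n"; },
                   value);
    return out.str();
}
```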
I have a class that has 3-4 data members of type std::map<string, vector<string>>, used for caching data. Its instance is created once and all the maps are filled from a service call. I have getter functions that are used to access these maps (with some thread-locking logic as well).
These getter functions get called many times, and I am worried about performance because the map objects are copied on every call.
class A {
private:
    map<string, vector<string>> m1;
    map<string, vector<string>> m2;
    map<string, vector<string>> m3;
public:
    map<string, vector<string>> getm1() { return m1; }
    map<string, vector<string>> getm2() { return m2; }
    map<string, vector<string>> getm3() { return m3; }
};

class B {
    B() { }
    static A a;
public:
    map<string, vector<string>> getm1() { return a.getm1(); }
    map<string, vector<string>> getm2() { return a.getm2(); }
    map<string, vector<string>> getm3() { return a.getm3(); }
};
These getter functions get called multiple times from class B. Having intermediate knowledge of C++, I know that the getter returns the entire map by value. But would it be better to return it by reference, or to store the members m1, m2, m3 as shared_ptr<map> and return that, like:
shared_ptr<map<string, vector<string>>> getm1() { return a.m1; }
Is this a performance concern, or will it be taken care of by the compiler?
After reading a little about return value optimization, I understand that compilers can perform some such optimizations. Is this a case RVO covers?
Thank you in advance.
Return references (&) to const maps.
class A
{
    using Map_type = std::map<std::string, std::vector<std::string>>;
    Map_type m1;
    Map_type m2;
    Map_type m3;
public:
    const auto& getm1() const { return m1; }
    const auto& getm2() const { return m2; }
    const auto& getm3() const { return m3; }
};
That will allow the calling function "read only" access to the maps without paying the price of a copy.
Note that deducing the return type from auto is a C++14 feature; before that, the return type has to be spelled out on the function.
Likewise, using aliases require at least C++11; with earlier compilers, typedef has to be used instead.
class A
{
    typedef std::map<std::string, std::vector<std::string>> Map_type;
    Map_type m1;
    Map_type m2;
    Map_type m3;
public:
    const Map_type& getm1() const { return m1; }
    const Map_type& getm2() const { return m2; }
    const Map_type& getm3() const { return m3; }
};
If you have internal locking then you can't just return references to your cached maps without causing race conditions. You also need to provide the caller with a means to lock the data from modification. If it is efficiency you are looking for then one design to consider is something like this:
class Cache
{
public:
    using mutex_type = std::shared_timed_mutex;
    using reading_lock = std::shared_lock<mutex_type>;
    using writing_lock = std::unique_lock<mutex_type>;
    using map_type = std::map<std::string, std::vector<std::string>>;

    reading_lock lock_for_reading() const { return reading_lock{mtx}; }
    writing_lock lock_for_writing() { return writing_lock{mtx}; }

    map_type const& use() const { return m; }

private:
    void update_map()
    {
        // protect every update with a writing_lock
        auto lock = lock_for_writing();
        // safely update the cached map
        m["wibble"] = {"fee", "fie", "foe", "fum"};
    }

    mutable mutex_type mtx;
    map_type m = {{"a", {"big", "wig"}}, {"b", {"fluffy", "bunny"}}};
};
int main()
{
    Cache cache;
    { // start a scope just for using the map
        // protect every access with a reading_lock
        auto lock = cache.lock_for_reading();
        // safely use the cached map
        for (auto const& s : cache.use().at("a"))
            std::cout << s << '\n';
    } // the lock is released here
    // ... etc ...
}
By making the locking available to the caller you let them protect the data from race conditions while they are reading it. By using both read & write locks you gain performance because you know the callers will not be modifying the cached data.
Internally, when you update the cached maps, you need to use the writing_lock to make sure no one else is reading them while they are being updated.
For efficiency's sake you probably want a separate mutex for each map but it depends on the specific dynamics of your situation.
Note: This solution puts responsibility on the caller to correctly lock the data. A more robust (and complex) solution is suggested here: An idea for the GSL to make multithreading code safer with an example implementation here: gsl_lockable.
I've implemented an LRU cache (code) that I would like to use for a multithreaded matching problem with N elements and full N^2 (all pairs) matching. Ideally, I would just get a reference to each element directly from the cache to save memory.
The time it takes to match two elements (let's call them A and B) can vary greatly, and I am worried that if one pair of elements takes a long time to match, then another thread (that is working very fast and processing many pairs) will cause either A or B to be evicted from the cache, making the references invalid.
A simple solution is to just not use references, but I'm wondering if there is a better way to ensure that elements won't be evicted if they are "currently used" or have a reference to them?
To avoid evicting objects that are in use it is possible to use the reference-counting functionality of std::shared_ptr. Consider the following implementation:
#include <iostream>
#include <string>
#include <memory>
#include <map>
#include <algorithm>

template <typename K, typename V> class cache
{
public:
    cache() {}
    static const constexpr int max_cache_size = 2;

    std::shared_ptr<V> getValue(const K& k)
    {
        auto iter = cached_values.find(k);
        if (iter == cached_values.end()) {
            if (cached_values.size() == max_cache_size) {
                // use_count() == 1 means only the cache itself holds the
                // pointer (shared_ptr::unique() says the same thing, but
                // it is deprecated since C++17)
                auto evictIter =
                    std::find_if(cached_values.begin(), cached_values.end(),
                                 [](const auto& kv) { return kv.second.second.use_count() == 1; });
                if (evictIter == cached_values.end()) {
                    std::cout << "Nothing to evict\n";
                    return nullptr;
                }
                cached_values.erase(evictIter);
            }
            static V next;
            iter = cached_values.insert(std::make_pair(k, std::make_pair(++next, nullptr))).first;
            // no-op deleter: the shared_ptr only counts references, the
            // object itself lives inside the map node
            iter->second.second = std::shared_ptr<V>(&iter->second.first, [](const auto&) {});
        }
        return iter->second.second;
    }

    std::map<K, std::pair<V, std::shared_ptr<V>>> cached_values;
};
int main()
{
    cache<int, int> c;
    std::cout << *c.getValue(10) << "\n";
    std::cout << *c.getValue(20) << "\n";
    std::cout << *c.getValue(30) << "\n";
    auto useOne = c.getValue(10);
    auto useTwo = c.getValue(20);
    std::cout << *c.getValue(20) << "\n"; // We can use stuff that is still in cache
    std::cout << c.getValue(30);          // Cache is full, also note no dereferencing
}
Basically, as long as anyone outside the cache holds the returned value, its use count stays above one (this is exactly what std::shared_ptr::unique tests; it is deprecated since C++17 in favour of use_count() == 1), making the cache entry non-evictable.
Consider this situation:
void doSmth1(std::map<int, int> const& m);

void doSmth2(std::map<int, int> const& m) {
    std::map<int, int> m2 = m;
    m2[42] = 47;
    doSmth1(m2);
}
The idea is that doSmth2 will call doSmth1 and forward the map it received from its caller. However, it has to add one additional key-value pair (or override it if it is already there). I would like to avoid copying the whole thing just to pass an additional value to doSmth1.
You can't do that with the standard map. But if your problem is that specific, you might consider passing the new element separately:
void doSmth1(std::map<int, int> const& m, int newkey, int newvalue);

void doSmth2(std::map<int, int> const& m)
{
    doSmth1(m, 42, 47);
}
Update: If you really just want one map, and copying the map is out of the question, then here's how you can implement #arrowdodger's suggestion to make a temporary modification to the original map:
void doSmth2(std::map<int, int>& m)
{
    auto it = m.find(42);
    if (it == m.end())
    {
        m.insert(std::make_pair(42, 47));
        doSmth1(m);
        m.erase(42);
    }
    else
    {
        auto original = it->second;
        it->second = 47;
        doSmth1(m);
        it->second = original;
    }
}
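One caveat with this trick: if doSmth1 throws, the map is left with the temporary modification. A small RAII guard (a hypothetical helper, not part of the answer above) restores the previous state even on exceptions:

```cpp
#include <map>

// Sketch of an RAII guard: temporarily sets m[key] = value in the
// constructor and restores the previous state (old value, or absence)
// in the destructor, even if the guarded call throws.
class scoped_map_override
{
    std::map<int, int>& m;
    int key;
    bool existed;
    int old_value{};
public:
    scoped_map_override(std::map<int, int>& m, int key, int value)
        : m(m), key(key)
    {
        auto it = m.find(key);
        existed = (it != m.end());
        if (existed) old_value = it->second;
        m[key] = value;
    }
    scoped_map_override(const scoped_map_override&) = delete;
    ~scoped_map_override()
    {
        if (existed) m[key] = old_value;
        else m.erase(key);
    }
};
```

Usage would be `scoped_map_override guard(m, 42, 47); doSmth1(m);` and the map is restored when the guard goes out of scope.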
basically, I have the
map<std::string, int>
so if i have
foo 5
bar 10
jack 3
in the map, I want to display it (notice the reverse order)
bar 10
foo 5
jack 3
And every time it is updated, I want to iterate through all the elements and print them, sorted by value. What is a good way to implement that? Should I provide a comparator to the constructor?
I want to note that values in the map will be updated at least 100 million times, so efficiency is crucial, whereas extra space is no problem.
Please no Boost solutions... thx
struct keyval_t { std::string key; int val; };

bool operator<(const keyval_t& a, const keyval_t& b)
{ return a.val < b.val || (a.val == b.val && a.key < b.key); }
Then you need one map and one set:
map<std::string, int>; set<keyval_t>;
On update, you need to look up the map first to determine the key-value pair and then update both map and set. On printing, you just iterate through the set. In terms of theoretical time complexity, this is optimal. It doubles the memory, though. Does this meet your goal?
To reduce memory, you may consider the following:
map<std::string,uint64_t>; set<uint64_t>;
The value of the map (also the key of the set) is: (uint64_t)val<<32|counter, where counter is something that differentiates identical values. For example, whenever you insert a key, increase the counter by 1. You do not need to update the counter when you update the value. If you do not like uint64_t, use pair<int,int> instead. This solution is also faster as it avoids comparisons between strings.
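To make the packed-key scheme concrete, here is a rough sketch (ValueSortedMap and min_value are illustrative names; it assumes non-negative values so the ordering of the packed uint64_t matches the ordering of the values):

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Sketch of the scheme above: the high 32 bits of each set entry hold
// the value, the low 32 bits a per-key counter that makes entries
// unique, so identical values never collide in the set.
class ValueSortedMap
{
    std::map<std::string, std::uint64_t> by_key;
    std::set<std::uint64_t> by_value;
    std::uint32_t counter = 0;
public:
    void set(const std::string& key, std::uint32_t value)
    {
        auto it = by_key.find(key);
        std::uint32_t id;
        if (it != by_key.end()) {
            by_value.erase(it->second);                  // drop the stale entry
            id = static_cast<std::uint32_t>(it->second); // keep the old counter
        } else {
            id = counter++;                              // fresh counter for a new key
        }
        std::uint64_t packed = (static_cast<std::uint64_t>(value) << 32) | id;
        by_key[key] = packed;
        by_value.insert(packed);
    }
    // Smallest value currently stored (precondition: not empty).
    std::uint32_t min_value() const
    {
        return static_cast<std::uint32_t>(*by_value.begin() >> 32);
    }
};
```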
If you want a performant map sorted by both key and value, you want Boost MultiIndex, it gets updated (resorted) on every update (which you have to do manually) and has a good documentation.
The previous responses have the inconvenience of not taking the initial requirements into account (the key is std::string and the value is int).
EDITED: following the comments, I suppose presenting it directly with a Bimap is better :)
So here we go, right in!
#include <boost/bimap.hpp>
#include <boost/bimap/multiset_of.hpp>
#include <boost/bimap/unordered_set_of.hpp>

class MyMap
{
    struct name {};
    struct value {};

    typedef boost::bimaps::tagged<std::string, name> tagged_name;
    typedef boost::bimaps::tagged<int, value> tagged_value;

    // unordered_set_of: guarantees only uniqueness (not order)
    // multiset_of: guarantees only order (not uniqueness)
    typedef boost::bimap< boost::bimaps::unordered_set_of< tagged_name >,
                          boost::bimaps::multiset_of< tagged_value,
                                                      std::greater< int >
                                                    >
                        > impl_type;

public:
    // Redefine all usual types here
    typedef impl_type::map_by<name>::iterator iterator;
    typedef impl_type::map_by<name>::const_iterator const_iterator;
    typedef impl_type::map_by<name>::value_type value_type;

    // Define the functions you want.
    // The bimap will not allow mutators, because the elements are used as keys,
    // so you may want to add wrappers.
    std::pair<iterator, bool> insert(const value_type& x)
    {
        std::pair<iterator, bool> result = m_impl.by<name>().insert(x);
        if (result.second) this->display();
        return result;
    } // insert

    iterator insert(iterator position, const value_type& x)
    {
        iterator result = m_impl.by<name>().insert(position, x);
        this->display();
        return result;
    } // insert

    template <class InputIterator>
    void insert(InputIterator first, InputIterator last)
    {
        m_impl.by<name>().insert(first, last);
        this->display();
    } // insert

private:
    void display() const
    {
        // Iterate from the value side, which multiset_of keeps sorted
        // (std::greater, so largest values first).
        typedef impl_type::map_by<value>::const_iterator const_it;
        for (const_it it = m_impl.by<value>().begin(),
                      end = m_impl.by<value>().end(); it != end; ++it)
        {
            // Note the inversion of 'second' and 'first':
            // we are looking at the map from the right.
            std::cout << it->second << " " << it->first << std::endl;
        }
    }

    impl_type m_impl;
}; // class MyMap
Here you go.
I strongly suggest that you consult the bimap documentation though. There are a lot of possibilities for storing (set_of, unordered_set_of, unconstrained_set_of, the multi variants, the list_of variant...) so there is probably one that could do what you want.
Then you also have the possibility to just sort each time you display:
#include <map>
#include <set>

// Just use a simple std::map<std::string, int> for your impl_type.
// Mutators are allowed since the elements are sorted each time you display.
typedef std::map<std::string, int>::value_type value_type;

struct Comparator
{
    bool operator()(const value_type& lhs, const value_type& rhs) const
    {
        return lhs.second < rhs.second;
    }
};

void display() const
{
    typedef std::multiset<value_type, Comparator> sort_type;
    sort_type mySet;
    std::copy(m_impl.begin(), m_impl.end(), std::inserter(mySet, mySet.end()));
    for (sort_type::const_iterator it = mySet.begin(), end = mySet.end();
         it != end; ++it)
    {
        std::cout << it->first << " " << it->second << std::endl;
    }
}
It should be easier to understand my point now :)
Well, you have to sort by key. The easiest way to "sort by value" is to use a multimap with the key and value switched. So here's a way to do that (note -- i don't have access to a compiler right now, so if it doesn't compile, I'm sorry):
#include <algorithm>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

typedef std::multimap<int, std::string, std::greater<int> > MapType;

struct MapKeyOutput
{
    void operator()(const MapType::value_type& val)
    {
        std::cout << val.second << " " << val.first << "\n";
    }
};

void display_insert(MapType& m, const std::string& str, int val)
{
    m.insert(std::make_pair(val, str));
    std::for_each(m.begin(), m.end(), MapKeyOutput());
}

int main()
{
    MapType m;
    display_insert(m, "Billy", 5);
    display_insert(m, "Johnny", 10);
}
You could also make a map class that uses a multimap internally, I just didn't want to type it out. Then display_insert would be some member function instead. This should demonstrate the point, though. Points of interest:
typedef std::multimap<int, std::string, std::greater<int> > MapType;
Note the comparator is greater<int> to sort descending. We're using a multimap so more than one name can have the same number.
struct MapKeyOutput
{
    void operator()(const MapType::value_type& val)
    {
        std::cout << val.second << " " << val.first << "\n";
    }
};
This is a function object to output one element in the map. second is output before first so the order is what you want.
std::for_each(m.begin(), m.end(), MapKeyOutput());
This applies our function object to every element in m.