I have an unordered_map<int, float> localCostMap describing the costs (float) between node IDs (int). The calculation of the second value is quite complex, but due to the structure of the graph (directed, acyclic, up to two parent nodes) I can save many calculations by pooling the maps for each node into another map like so:
unordered_map<int, shared_ptr<unordered_map<int, float>>> pooledMaps.
Once the values are written (into localCostMap), they do not get updated again, but the calculations require the map entries of the connected nodes, which may lead to lock-ups.
How can I make it so that I can read the values stored in the inner map while also safely adding new entries (e.g. { 3, 1.23 })? I'm new to multithreading and have tried to search for solutions, but the only results I found were older ones, despite reading that multithreading support has improved much, particularly in C++20.
Thank you in advance!
Edit: As requested, here is a minimal working example. Of course, the full algorithm is more complex, considers the edge cases, and also records results for other applicable nodes (e.g. the result of comparing 5 & 7 also applies to 6 & 7).
// Example.h
#pragma once
#include <iostream>
#include <unordered_map>
#include <thread>
struct MyNode {
int id;
int leftNodeID;
int rightNodeID;
std::shared_ptr<std::unordered_map<int, float>> localCostMap; /* inherited from parent (left/right) nodes */
std::unordered_map<int, std::shared_ptr<std::unordered_map<int, float>>> pooledMaps;
MyNode() : id(0), leftNodeID(0), rightNodeID(0) { setLocalCostMap(); }
MyNode(int _id, int leftID, int rightID) :
id(_id), leftNodeID(leftID), rightNodeID(rightID) { setLocalCostMap(); }
void setLocalCostMap();
float calculateNodeCost(int otherNodeID);
};
// Example.cpp
#include "Example.h"
MyNode nodes[8];
void MyNode::setLocalCostMap() {
if (leftNodeID == 0) { // rightNodeID also 0
localCostMap = std::make_shared<std::unordered_map<int, float>>();
}
else { // get map from connected node if possible
auto poolmap = nodes[leftNodeID].pooledMaps.find(rightNodeID);
if (poolmap == nodes[leftNodeID].pooledMaps.end()) {
localCostMap = std::make_shared<std::unordered_map<int, float>>();
nodes[leftNodeID].pooledMaps.insert({ rightNodeID, localCostMap }); // [1] possible conflict
nodes[rightNodeID].pooledMaps.insert({ leftNodeID, localCostMap }); // [1] possible conflict
}
else { localCostMap = poolmap->second; }
}
}
float MyNode::calculateNodeCost(int otherNodeID) {
if (id > 0) {
std::cout << "calculateNodeCost for " << nodes[id].id << " and " << nodes[otherNodeID].id << std::endl;
}
float costs = -1.0f;
auto mapIterator = localCostMap->find(otherNodeID);
if (mapIterator == localCostMap->end()) {
if (id == otherNodeID) { // same node
std::cout << "return costs for " << id << " and " << otherNodeID << " (same node): " << 0.0f << std::endl;
return 0.0f;
}
else if (leftNodeID == 0 || nodes[otherNodeID].leftNodeID == 0) {
costs = ((float)(id + nodes[otherNodeID].id)) / 2;
std::cout << "calculated costs for " << id << " and " << otherNodeID << " (no connections): " << costs << std::endl;
}
else if (leftNodeID == nodes[otherNodeID].leftNodeID &&
rightNodeID == nodes[otherNodeID].rightNodeID) { // same connected nodes
costs = nodes[leftNodeID].calculateNodeCost(rightNodeID); // [2] possible conflict
std::cout << "return costs for " << id << " and " << otherNodeID << " (same connections): " << costs << std::endl;
return costs;
}
else {
costs = nodes[leftNodeID].calculateNodeCost(otherNodeID) +
nodes[rightNodeID].calculateNodeCost(otherNodeID) +
nodes[id].calculateNodeCost(nodes[otherNodeID].leftNodeID) +
nodes[id].calculateNodeCost(nodes[otherNodeID].rightNodeID); // [2] possible conflict
std::cout << "calculated costs for " << id << " and " << otherNodeID << ": " << costs << std::endl;
}
// [3] possible conflict
localCostMap->insert({ otherNodeID, costs });
nodes[otherNodeID].localCostMap->insert({ id, costs });
}
else {
costs = mapIterator->second;
std::cout << "found costs for " << id << " and " << otherNodeID << ": " << costs << std::endl;
}
return costs;
}
float getNodeCost(int node1, int node2) {
return nodes[node1].calculateNodeCost(node2);
}
int main()
{
nodes[0] = MyNode(0, 0, 0); // should not be used
nodes[1] = MyNode(1, 0, 0);
nodes[2] = MyNode(2, 0, 0);
nodes[3] = MyNode(3, 0, 0);
nodes[4] = MyNode(4, 0, 0);
nodes[5] = MyNode(5, 1, 2);
nodes[6] = MyNode(6, 1, 2);
nodes[7] = MyNode(7, 3, 4);
//getNodeCost(5, 7);
//getNodeCost(6, 7);
std::thread w1(getNodeCost, 5, 7);
std::thread w2(getNodeCost, 6, 7);
w1.join();
w2.join();
std::cout << "done";
}
I commented out the single-thread variant, but you can easily see the difference, as the multi-threaded version already performs more (unnecessary) comparisons.
As you can see, whenever a "noteworthy" comparison between two nodes takes place, the result is added to localCostMap, which is normally derived from the two connected nodes. Thus, one insert is necessary for all nodes with these two connections (left & right).
I see at least 3 problematic points:
When initializing the node and inserting the pooled maps for the connected nodes: if two nodes with the same connections were to be added at the same time, they would both want to create and add the maps for the connected nodes. [1]
When calculating the values, another thread might already be doing it, thus leading to unnecessary calculations. [2]
When inserting the results into localCostMap (and by that also to the maps of the connected nodes). [3]
If you already have a std::shared_ptr to one of the inner maps it can be safely used, since, as you explained, once created it is never updated by any execution thread.
However, since the outer map is being modified, all access to the outer map must be thread safe. None of the containers in the C++ library are thread safe; it is your responsibility to make them thread safe when needed. This includes threads that only read the outer map, since other execution threads might be modifying it. When something is modified, all execution threads are required to use thread-safe access.
This means holding a mutex lock. The best way to avoid bugs that involve thread safety is to make it logically impossible to access something without holding a lock. And the most direct way of enforcing this is to wrap the map as a private class member, so that the only way to access it is to call a public method that grabs a mutex lock:
#include <unordered_map>
#include <memory>
#include <mutex>
#include <iostream>
class costs {
std::unordered_map<int, std::shared_ptr<std::unordered_map<int, float>>> m;
std::mutex mutex;
public:
template<typename T>
auto operator()(T && t)
{
std::unique_lock lock{mutex};
return t(m);
}
};
int main() {
costs c;
// Insert into the map
c([&](auto &m) {
auto values=std::make_shared<std::unordered_map<int, float>>();
(*values)[1]=2;
m.emplace(1, values);
});
// Look things up in a map
auto values=c([]
(auto &m) -> std::shared_ptr<std::unordered_map<int, float>>
{
auto iter=m.find(1);
if (iter == m.end())
return nullptr;
return iter->second;
});
// values can now be used, since nothing will modify it.
return 0;
}
This uses some convenient features of modern C++, but can be implemented all the way back to C++11, with some additional typing.
The only way to access the map is to call the class's () operator which acquires a lock and calls the passed-in callable object, like a lambda, passing to it a reference to the outer map. The lambda can do whatever it wants with it, before returning, at which point the lock gets released.
It is not entirely impossible to defeat this kind of enforced thread safety, but you'll have to go out of your way to access this outer unordered map without holding a lock.
For completeness' sake you may need to implement a second () overload as a const class method.
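That const overload could look like the following sketch; note that the mutex then has to be declared mutable so a const method can still lock it:

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

class costs {
    std::unordered_map<int, std::shared_ptr<std::unordered_map<int, float>>> m;
    mutable std::mutex mutex;  // mutable, so the const overload can still lock it
public:
    template<typename T>
    auto operator()(T && t)        // callable receives a mutable reference
    {
        std::unique_lock lock{mutex};
        return t(m);
    }

    template<typename T>
    auto operator()(T && t) const  // callable receives a const reference
    {
        std::unique_lock lock{mutex};
        return t(m);
    }
};
```

Overload resolution picks the const version automatically whenever the costs object itself is const, so read-only callers get a const reference to the map without any extra effort on their part.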
Note that the second example looks up one of the inner maps and returns it, at which point it's accessible without any locks being held. Presumably nothing would modify it.
You should consider using maps of std::shared_ptr<const std::unordered_map<int, float>> instead of std::shared_ptr<std::unordered_map<int, float>>. This will let your C++ compiler enforce the fact that, presumably, once created these maps will never be modified. Like I already mentioned: the best way to avoid bugs is to make it logically impossible for them to happen.
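As an illustration of that suggestion (the freeze helper and the type aliases are my own, not part of the code above): once a finished map is published behind a shared_ptr-to-const, the compiler rejects any modification through it:

```cpp
#include <memory>
#include <unordered_map>
#include <utility>

using CostMap     = std::unordered_map<int, float>;
using FrozenCosts = std::shared_ptr<const CostMap>;

// Move a fully-built map behind a pointer-to-const. Readers in any thread
// may share the result freely; non-const members no longer compile on it.
inline FrozenCosts freeze(CostMap m)
{
    return std::make_shared<const CostMap>(std::move(m));
}
```

After `FrozenCosts f = freeze(std::move(m));`, `f->at(3)` compiles fine, while `(*f)[3] = 1.0f` is rejected at compile time instead of becoming a data race at runtime.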
I wrote an Edge class like this:
struct Edge : public ::std::pair<int, int>
{
using ::std::pair<int, int>::pair;
int &src = first;
int &dst = second;
};
namespace std
{
template <>
struct hash<Edge>
{
std::size_t operator()(Edge const &x) const noexcept
{
return (x.src << 16) | x.dst;
}
};
}
So I can use src/dst to substitute first/second.
However, I found things go wrong when I use std::unordered_set.
Edge e(1, 2);
print_edge_debug_info(e);
using _set_t = std::unordered_set<Edge>;
_set_t s;
s.insert(e);
for(auto &&_e : s) {
print_edge_debug_info(_e);
}
/*
output:
(1, 2) [ 2,0x7ffca1c08724] [ 2,0x7ffca1c08724]
(1, 2) [ 2,0x2bb80fc] [ 2,0x7ffca1c08724]
*/
print_edge_debug_info:
inline void print_edge_debug_info(Edge const &edge)
{
std::cout << edge
<< " "
<< "[ " << edge.second << "," << &edge.second << "] "
<< " "
<< "[ " << edge.dst << "," << &edge.dst << "] "
<< std::endl;
}
The dst and second have the same address after the first Edge object is constructed. But if I put this Edge object into a std::unordered_set and fetch it from the set, the addresses of the result's dst and second are different.
Besides, the new second and the old second have different addresses, but the new dst and the old dst have the same address. This means the new dst is an alias of the old second, not of the new second.
It seems strange for me. I don't understand why this happened.
Is my way to make member aliases wrong? What's the correct way?
The following, smaller example demonstrates the problem with the shown approach.
Edge a{1, 2};
// ...
Edge b=a;
a's src and dst references refer to a.first, and a.second, as expected.
b's src and dst references ...also refer to a.first and a.second, for the simple reason that there is no valid reason for them to be anything else, here. That's what you're observing with your set.
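The effect is easy to verify; a small sketch (the helper function is mine) using the Edge from the question:

```cpp
#include <utility>

struct Edge : public ::std::pair<int, int>
{
    using ::std::pair<int, int>::pair;
    int &src = first;
    int &dst = second;
};

// Copying an Edge copies the references themselves: the implicit copy
// constructor binds b.src to whatever a.src refers to, i.e. a.first,
// instead of rebinding it to b's own first.
inline bool copy_aliases_the_original()
{
    Edge a{1, 2};
    Edge b = a;                   // implicit copy constructor
    return &b.src == &a.first     // b's alias still points into a ...
        && &b.src != &b.first;    // ... and not into b itself
}
```

This is exactly what the set does internally: it copy-constructs your Edge into its own storage, and the copy's references keep pointing at the original.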
A better idea that avoids unexpected surprises is to simply avoid using references in the first place:
int &src()
{
return first;
}
int src() const
{
return first;
}
(and the same with second). This also has the advantage of not requiring Edge to take up twice the memory just to carry the references around. It is unlikely that a C++ compiler, even with a proper operator=, will figure out how to optimize them out.
There are very good reasons for references to exist in C++ but they are not meant to be used to create aliases to class members. That's not what their semantics are designed for.
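Following that advice, a sketch of the full Edge (hash included) with accessor functions instead of reference members; copies now behave normally and the struct stays the size of its pair base:

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>

struct Edge : public ::std::pair<int, int>
{
    using ::std::pair<int, int>::pair;
    int &src()       { return first; }   // writable access
    int  src() const { return first; }   // read-only access
    int &dst()       { return second; }
    int  dst() const { return second; }
};

namespace std
{
    template <>
    struct hash<Edge>
    {
        std::size_t operator()(Edge const &x) const noexcept
        {
            return (static_cast<std::size_t>(x.src()) << 16) | x.dst();
        }
    };
}
```

As a bonus, elements fetched from a std::unordered_set<Edge> are const, so only the const overloads are callable on them, which is exactly the protection the container requires.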
I have a basic problem with understanding what ostream is exactly. I know that it's a base class for the output stream, but I can't quite grasp when to use it and why to use it instead of just saying std::cout.
So here I have this example where I have to create a new class named stack with a pop() function (just as in the class already provided by C++).
Here list_node is a struct which consists of two elements: the key (which is an integer) and a pointer to the next list_node.
Definition of list_node (already given):
struct list_node {
    int key;
    list_node* next;
    // constructor
    list_node (int k, list_node* n)
        : key (k), next (n) {}
};
and here is the definition of the class (already given as well):
class stack {
public:
    void push (int value) {...}
    ...
private:
    list_node* top_node;
};
and here's the part with which I'm having trouble with:
void print (std::ostream& o) const
{
const list_node* p = top_node;
while (p != 0) {
o << p->key << " "; // 1 5 6
p = p->next;
}
}
I don't understand why they are using ostream& o as the function argument. Couldn't they have just taken top_node as the argument, followed the next pointers (next reads the next list_node), and then printed everything with std::cout directly? Why is it better to do it the way they did?
Why is it better to do it the way they did?
I am not sure of your question, and not sure it is a better way.
Perhaps the intent was for flexibility. Here is an example from my app library:
When I declare a data attribute as an ostream pointer
class T431_t
{
// ...
std::ostream* m_so;
// ...
I can trivially use that attribute to deliver a report to 'where-m_so-points'. In this app, there are several examples of *m_so << ... being used. Here is the primary example.
inline void reportProgress()
{
// ...
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(14)
<< digiComma(m_N) << ")" << std::endl;
// ...
}
Note that in the class constructor (ctor), m_so is initialized by default to &std::cout.
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
// ..
m_so (&std::cout), // ctor init - default
// ..
{
// ...
When the user selects the dual-thread processing option, which is a command line option to perform the app in about 1/2 the time by using both processors of my desktop, the reports can become hard to read if I allow the two independent output streams to intertwine (on the user screen). Thus, in the object instance being run by thread 2, m_so is set to something different.
The following data attribute captures and holds thread 2 output for later streaming to std::cout.
std::stringstream m_ssMagic; // dual threads use separate out streams
Thread 2 is launched and the thread sets its private m_so:
void exec2b () // thread 2 entry
{
m_now = Clock_t::now();
m_so = &m_ssMagic; // m_so points to m_ssMagic
// ...
m_ssMagic << " execDuration = " << m_ssDuration.str()
<< " (b) " << std::endl;
} // exec2b (void)
While thread 1 uses std::cout, and thread 2 uses m_ssMagic, 'main' (thread 0) simply waits for the joins.
The joins coordinate the thread completion, typically about the same time. Main (thread 0) then couts the m_ssMagic contents.
//...
// main thread context:
case 2: // one parameter: 2 threads each runs 1/2 of tests
{ // use two new instances
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 additional threads started here
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's - thread main (0) waits for each to complete
tA.join();
tB.join();
// tA outputs directly to std::cout
// tB captured output to T431_t::m_ssMagic.
// both thread 1 and 2 have completed, so ok to:
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
} break;
To be complete, here is
std::string ssMagicShow() { return (m_ssMagic.str()); }
Summary
I wrote the single thread application first. After getting that working, I searched for a 'simple' way to make use of the second core on my desktop.
As part of my first refactor, I a) added "std::ostream* m_so" initialized to &std::cout, and b) found all uses of std::cout, most of which I simply replaced with "*m_so". I then c) confirmed that I had not broken the single-thread solution. Quite easy, and it worked on the first try.
Subsequent effort implemented the command line 'dual-thread' option.
I think this approach will apply to my next desktop, when budget allows.
And from an OOP standpoint, this effort works because std::ostream is in the class hierarchy of both std::cout and std::stringstream. Thus
"std::cout is-a std::ostream",
and
"std::stringstream is-a std::ostream".
So m_so can point to an instance of either derived class, and provide virtual method 'ostream-access' to either destination.
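That is-a relationship can be shown in a few lines (the function names here are mine): one print function serves both destinations, because each one is-a std::ostream:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// One print function; the caller chooses the destination stream.
inline void report(std::ostream &o)
{
    o << "1 5 6";
}

// Route the same report into a string instead of the terminal.
inline std::string captured_report()
{
    std::stringstream ss;  // std::stringstream is-a std::ostream
    report(ss);
    return ss.str();
}
```

report(std::cout) prints immediately, while captured_report() holds the text for later output, which is exactly the trick used for thread 2 above.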
I have a service which has outages in 4 different locations. I am modeling each location outages into a Boost ICL interval_set. I want to know when at least N locations have an active outage.
Therefore, following this answer, I have implemented a combination algorithm, so I can create combinations between elements via interval_set intersections.
When this process is over, I should have a certain number of interval_sets, each one of them defining the outages for N locations simultaneously, and the final step will be joining them to get the desired full picture.
The problem is that I'm currently debugging the code, and when the time comes to print each intersection, the output goes crazy (even when I'm using gdb to debug step by step) and I can't read it, resulting in a lot of CPU usage.
I guess that somehow I'm sending a larger portion of memory to the output than I should, but I can't see where the problem is.
This is a SSCCE:
#include <boost/icl/interval_set.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
// Initializing data for test
std::vector<boost::icl::interval_set<unsigned int> > outagesPerLocation;
for(unsigned int j=0; j<4; j++){
boost::icl::interval_set<unsigned int> outages;
for(unsigned int i=0; i<5; i++){
outages += boost::icl::discrete_interval<unsigned int>::closed(
(i*10), ((i*10) + 5 - j));
}
std::cout << "[Location " << (j+1) << "] " << outages << std::endl;
outagesPerLocation.push_back(outages);
}
// So now we have a vector of interval_sets, one per location. We will combine
// them so we get an interval_set defined for those periods where at least
// 2 locations have an outage (N)
unsigned int simultaneusOutagesRequired = 2; // (N)
// Create a bool vector in order to filter permutations, and only get
// the sorted permutations (which equals the combinations)
std::vector<bool> auxVector(outagesPerLocation.size());
std::fill(auxVector.begin() + simultaneusOutagesRequired, auxVector.end(), true);
// Create a vector where combinations will be stored
std::vector<boost::icl::interval_set<unsigned int> > combinations;
// Get all the combinations of N elements
unsigned int numCombinations = 0;
do{
bool firstElementSet = false;
for(unsigned int i=0; i<auxVector.size(); i++){
if(!auxVector[i]){
if(!firstElementSet){
// First location, insert to combinations vector
combinations.push_back(outagesPerLocation[i]);
firstElementSet = true;
}
else{
// Intersect with the other locations
combinations[numCombinations] -= outagesPerLocation[i];
}
}
}
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
}
while(std::next_permutation(auxVector.begin(), auxVector.end()));
// Get the union of the intersections and see the results
boost::icl::interval_set<unsigned int> finalOutages;
for(std::vector<boost::icl::interval_set<unsigned int> >::iterator
it = combinations.begin(); it != combinations.end(); it++){
finalOutages += *it;
}
std::cout << finalOutages << std::endl;
return 0;
}
Any help?
As I surmised, there's a "highlevel" approach here.
Boost ICL containers are more than just containers of "glorified pairs of interval starting/end points". They are designed to implement just that business of combining, searching, in a generically optimized fashion.
So you don't have to.
If you let the library do what it's supposed to do:
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
Using functional domain typedefs invites a higher level approach. Now, let's ask the hypothetical "business question":
What do we actually want to do with our records of per-location downtimes?
Well, we essentially want to
tally them for all discernable time slots and
filter those where tallies are at least 2
finally, we'd like to show the "merged" time slots that remain.
Ok, engineer: implement it!
Hmm. Tallying. How hard could it be?
❕ The key to elegant solutions is the choice of the right datastructure
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
Now it's just bulk insertion:
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
Ok, let's filter. We just need the predicate that works on our DownMap, right
// define threshold where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
Merge the time slots!
Actually. We just create another DownTimes set, right. Just, not per location this time.
The choice of data structure wins the day again:
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
Report!
std::cout << "Criticals: " << merged << "\n";
Note that nowhere did we come close to manipulating array indices, overlapping or non-overlapping intervals, closed or open boundaries. Or, [eeeeek!] brute force permutations of collection elements.
We just stated our goals, and let the library do the work.
Full Demo
Live On Coliru
#include <boost/icl/interval_set.hpp>
#include <boost/icl/interval_map.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <boost/range/irange.hpp>
#include <algorithm>
#include <iostream>
#include <vector>
using TimePoint = unsigned;
using DownTimes = boost::icl::interval_set<TimePoint>;
using Interval = DownTimes::interval_type;
using Records = std::vector<DownTimes>;
using Tally = unsigned; // or: bit mask representing affected locations?
using DownMap = boost::icl::interval_map<TimePoint, Tally>;
// Just for fun, removed the explicit loops from the generation too. Obviously,
// this is bit gratuitous :)
static DownTimes generate_downtime(int j) {
return boost::accumulate(
boost::irange(0, 5),
DownTimes{},
[j](DownTimes accum, int i) { return accum + Interval::closed((i*10), ((i*10) + 5 - j)); }
);
}
int main() {
// Initializing data for test
using namespace boost::adaptors;
auto const records = boost::copy_range<Records>(boost::irange(0,4) | transformed(generate_downtime));
for (auto location : records | indexed()) {
std::cout << "Location " << (location.index()+1) << " " << location.value() << std::endl;
}
// We will do a tally of affected locations per time slot
DownMap tallied;
for (auto& location : records)
for (auto& incident : location)
tallied.add({incident, 1u});
// We will combine them so we get an interval_set defined for those periods
// where at least 2 locations have an outage
auto exceeds_threshold = [](DownMap::value_type const& slot) {
return slot.second >= 2;
};
// just printing the union of any criticals:
DownTimes merged;
for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
merged.insert(slot);
std::cout << "Criticals: " << merged << "\n";
}
Which prints
Location 1 {[0,5][10,15][20,25][30,35][40,45]}
Location 2 {[0,4][10,14][20,24][30,34][40,44]}
Location 3 {[0,3][10,13][20,23][30,33][40,43]}
Location 4 {[0,2][10,12][20,22][30,32][40,42]}
Criticals: {[0,4][10,14][20,24][30,34][40,44]}
At the end of the permutation loop, you write:
numCombinations++;
std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl; // The problem appears here
My debugger tells me that on the first iteration numCombinations was 0 before the increment. But incrementing it made it out of range for the combinations container (since it holds only a single element, at index 0).
Did you mean to increment it after the use? Was there any particular reason not to use
std::cout << "[-INTERSEC-] " << combinations.back() << "\n";
or, for c++03
std::cout << "[-INTERSEC-] " << combinations[combinations.size()-1] << "\n";
or even just:
std::cout << "[-INTERSEC-] " << combinations.at(numCombinations) << "\n";
which would have thrown std::out_of_range?
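The failure mode can be reduced to a one-element vector, mirroring the loop's first iteration (the helper name is mine):

```cpp
#include <stdexcept>
#include <vector>

// numCombinations is incremented *before* the element is printed, so the
// index is already one past the end on the very first iteration.
inline bool increment_before_use_goes_out_of_range()
{
    std::vector<int> combinations{42};  // one combination stored, at index 0
    unsigned numCombinations = 0;

    numCombinations++;                  // now 1: past the end
    try {
        (void)combinations.at(numCombinations);  // .at() checks the index
    } catch (std::out_of_range const &) {
        return true;                    // operator[] would have been UB instead
    }
    return false;
}
```

With operator[] the same access is undefined behaviour, which is why the printed output "goes crazy" instead of failing cleanly.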
On a side note, I think Boost ICL has vastly more efficient ways to get the answer you're after. Let me think about this for a moment. Will post another answer if I see it.
UPDATE: Posted the other answer show casing highlevel coding with Boost ICL
I'd like to simulate a std::vector that has mixed const and non-const elements. More specifically, I want to have functions that operate on a vector and are allowed to see the entire vector but may only write to specific elements. The elements that can and cannot be written will be determined at runtime and may change during runtime.
One solution is to create a container that holds an array of elements and an equal-sized array of booleans. All non-const access would go through a function that checks against the boolean array whether the write is valid and throws an exception otherwise. This has the downside of adding a conditional to every write.
A second solution might be to have the same container but this time write access is done by passing an array editing function to a member function of the container. The container member function would let the array editing function go at the array and then check that it didn't write to the non-writable elements. This has the downside that the array editing function could be sneaky and pass around non-const pointers to the array elements, let the container function check that all is well, and then write to non-writable elements.
The last issue seems difficult to solve. It seems like offering direct writable access ever means we have to assume direct writable access always.
Are there better solutions?
EDIT: Ben's comment has a good point I should have addressed in the question: why not a vector of const and a vector of non-const?
The issue is that the scenario I have in mind is that we have elements that are conceptually part of one single array. Their placement in that array is meaningful. To use vectors of const and non-const requires mapping the single array that exist in concept to the two vectors that would implement it. Also, if the list of writable elements changes then the elements or pointers in the two vectors would need to be moved about.
I think you can accomplish what you wish with the following class, which is very simplified to illustrate the main concept.
template <typename T>
struct Container
{
void push_back(bool isconst, T const& item)
{
data.push_back(std::make_pair(isconst, item));
}
T& at(size_t index)
{
// Check whether the object at the index is const.
if ( data[index].first )
{
throw std::runtime_error("Trying to access a const-member");
}
return data[index].second;
}
T const& at(size_t index) const
{
return data[index].second;
}
T const& at(size_t index, int dummy) // Without dummy, can't differentiate
// between the two functions.
{
return data[index].second;
}
T const& at(size_t index, int dummy) const // Without dummy, can't differentiate
// between the two functions.
{
return data[index].second;
}
std::vector<std::pair<bool, T> > data;
};
Here's a test program and its output.
#include <stdio.h>
#include <iostream>
#include <utility>
#include <stdexcept>
#include <vector>
//--------------------------------
// Put the class definition here.
//--------------------------------
int main()
{
Container<int> c;
c.push_back(true, 10);
c.push_back(false, 20);
try
{
int value = c.at(0); // Should throw exception.
}
catch (...)
{
std::cout << "Expected to see this.\n";
}
int value = c.at(0, 1); // Should work.
std::cout << "Got c[0]: " << value << "\n";
value = c.at(1); // Should work.
std::cout << "Got c[1]: " << value << "\n";
value = c.at(1, 1); // Should work.
std::cout << "Got c[1]: " << value << "\n";
// Accessing the data through a const object.
// All functions should work since they are returning
// const&.
Container<int> const& cref = c;
value = cref.at(0); // Should work.
std::cout << "Got c[0]: " << value << "\n";
value = cref.at(0, 1); // Should work.
std::cout << "Got c[0]: " << value << "\n";
value = cref.at(1); // Should work.
std::cout << "Got c[1]: " << value << "\n";
value = cref.at(1, 1); // Should work.
std::cout << "Got c[1]: " << value << "\n";
// Changing values ... should only work for '1'
try
{
c.at(0) = 100; // Should throw exception.
}
catch (...)
{
std::cout << "Expected to see this.\n";
}
c.at(1) = 200; // Should work.
std::cout << "Got c[1]: " << c.at(1) << "\n";
}
Output from running the program:
Expected to see this.
Got c[0]: 10
Got c[1]: 20
Got c[1]: 20
Got c[0]: 10
Got c[0]: 10
Got c[1]: 20
Got c[1]: 20
Expected to see this.
Got c[1]: 200
I want access a STL based container read-only from parallel running threads. Without using any user implemented locking. The base of the following code is C++11 with a proper implementation of the standard.
http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html
http://www.sgi.com/tech/stl/thread_safety.html
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html
http://www.open-std.org/jtc1/sc22/wg21/ (current draft or N3337, which is essentially C++11 with minor errors and typos corrected)
23.2.2 Container data races [container.requirements.dataraces]
For purposes of avoiding data races (17.6.5.9), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].
Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.
[ Note: For a vector<int> x with a size greater than one, x[1] = 5 and *x.begin() = 10 can be executed concurrently without a data race, but x[0] = 5 and *x.begin() = 10 executed concurrently may result in a data race. As an exception to the general rule, for a vector<bool> y, y[0] = true may race with y[1] = true. — end note ]
and
17.6.5.9 Data race avoidance [res.on.data.races]
1 This section specifies requirements that implementations shall meet to prevent data races (1.10). Every standard library function shall meet each requirement unless otherwise specified. Implementations may prevent data races in cases other than those specified below.
2 A C++ standard library function shall not directly or indirectly access objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's arguments, including this.
3 A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this.
4 [ Note: This means, for example, that implementations can't use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads. — end note ]
5 A C++ standard library function shall not access objects indirectly accessible via its arguments or via elements of its container arguments except by invoking functions required by its specification on those container elements.
6 Operations on iterators obtained by calling a standard library container or string member function may access the underlying container, but shall not modify it. [ Note: In particular, container operations that invalidate iterators conflict with operations on iterators associated with that container. — end note ]
7 Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.
8 Unless otherwise specified, C++ standard library functions shall perform all operations solely within the current thread if those operations have effects that are visible (1.10) to users.
9 [ Note: This allows implementations to parallelize operations if there are no visible side effects. — end note ]
Conclusion
Containers are not thread safe! But it is safe to call const functions on containers from multiple parallel threads. So it is possible to do read-only operations from parallel threads without locking.
Am I right?
I assume that there doesn't exist any faulty implementation, and that every implementation of the C++11 standard is correct.
Sample:
// concurrent thread access to a stl container
// g++ -std=gnu++11 -o p_read p_read.cpp -pthread -Wall -pedantic && ./p_read
#include <iostream>
#include <iomanip>
#include <string>
#include <unistd.h>
#include <thread>
#include <mutex>
#include <map>
#include <cstdlib>
#include <ctime>
#include <functional> // std::cref, to pass the map to threads by reference
using namespace std;
// new in C++11
using str_map = map<string, string>;
// thread is new in C++11
// to_string() is new in C++11
mutex m;
const unsigned int MAP_SIZE = 10000;
void fill_map(str_map& store) {
int key_nr;
string mapped_value;
string key;
while (store.size() < MAP_SIZE) {
// 0 - 9999
key_nr = rand() % MAP_SIZE;
// convert number to string
mapped_value = to_string(key_nr);
key = "key_" + mapped_value;
pair<string, string> value(key, mapped_value);
store.insert(value);
}
}
void print_map(const str_map& store) {
str_map::const_iterator it = store.begin();
while (it != store.end()) {
pair<string, string> value = *it;
cout << left << setw(10) << value.first << right << setw(5) << value.second << "\n";
it++;
}
}
void search_map(const str_map& store, int thread_nr) {
m.lock();
cout << "thread(" << thread_nr << ") launched\n";
m.unlock();
// use a straight search or poke around random
bool straight = false;
if ((thread_nr % 2) == 0) {
straight = true;
}
int key_nr;
string mapped_value;
string key;
str_map::const_iterator it;
string first;
string second;
for (unsigned int i = 0; i < MAP_SIZE; i++) {
if (straight) {
key_nr = i;
} else {
// 0 - 9999, rand is not thread-safe, nrand48 is an alternative
m.lock();
key_nr = rand() % MAP_SIZE;
m.unlock();
}
// convert number to string
mapped_value = to_string(key_nr);
key = "key_" + mapped_value;
it = store.find(key);
// check result
if (it != store.end()) {
// pair
first = it->first;
second = it->second;
// m.lock();
// cout << "thread(" << thread_nr << ") " << key << ": "
// << right << setw(10) << first << setw(5) << second << "\n";
// m.unlock();
// check mismatch
if (key != first || mapped_value != second) {
m.lock();
cerr << key << ": " << first << " " << second << "\n"
<< "Mismatch in thread(" << thread_nr << ")!\n";
exit(1);
// never reached
m.unlock();
}
} else {
m.lock();
cerr << "Warning: key(" << key << ") not found in thread("
<< thread_nr << ")\n";
exit(1);
// never reached
m.unlock();
}
}
}
int main() {
clock_t start, end;
start = clock();
str_map store;
srand(0);
fill_map(store);
cout << "fill_map finished\n";
// print_map(store);
// cout << "print_map finished\n";
// copy for check
str_map copy_store = store;
// launch threads
thread t[10];
for (int i = 0; i < 10; i++) {
// pass the map by reference: std::thread copies its arguments, so a
// plain `store` would give every thread its own private copy and the
// threads would no longer share one container (cref needs <functional>)
t[i] = thread(search_map, cref(store), i);
}
// wait for finish
for (int i = 0; i < 10; i++) {
t[i].join();
}
cout << "search_map threads finished\n";
if (store == copy_store) {
cout << "equal\n";
} else {
cout << "not equal\n";
}
end = clock();
cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
cout << "CPU-TIME START " << start << "\n";
cout << "CPU-TIME END " << end << "\n";
cout << "CPU-TIME END - START " << end - start << "\n";
cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
return 0;
}
This code compiles with GCC 4.7 and runs fine on my machine.
$ echo $?
0
A data race, per the C++11 specification in sections 1.10/4 and 1.10/21, requires at least two threads with non-atomic access to the same set of memory locations, no synchronization between the threads with regard to those locations, and at least one thread that writes to or modifies an element in that set. So in your case, if the threads are only reading, you are fine: since none of the threads writes to the same set of memory locations, there are by definition no data races, even though there is no explicit synchronization mechanism between the threads.
Yes, you are right. You are safe as long as the thread that populates your vector finishes doing so before the reader threads start. There was a similar question recently.