I hear that const means thread-safe in C++11. Is that true?
Does that mean const is now the equivalent of Java's synchronized?
Are they running out of keywords?
I hear that const means thread-safe in C++11. Is that true?
It is somewhat true...
This is what the Standard Language has to say on thread-safety:
[1.10/4]
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
[1.10/21]
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
which is nothing other than the sufficient condition for a data race to occur:
There are two or more actions being performed at the same time on a given thing; and
At least one of them is a write.
The Standard Library builds on that, going a bit further:
[17.6.5.9/1]
This section specifies requirements that implementations shall meet to prevent data races (1.10). Every standard library function shall meet each requirement unless otherwise specified. Implementations may prevent data races in cases other than those specified below.
[17.6.5.9/3]
A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.
which in simple words says that it expects operations on const objects to be thread-safe. This means that the Standard Library won't introduce a data race as long as operations on const objects of your own types either
Consist entirely of reads --that is, there are no writes--; or
Internally synchronize writes.
If this expectation does not hold for one of your types, then using it directly or indirectly together with any component of the Standard Library may result in a data race. In conclusion, const does mean thread-safe from the Standard Library point of view. It is important to note that this is merely a contract that won't be enforced by the compiler; if you break it, you get undefined behavior and you are on your own. Whether const is present or not will not affect code generation --at least not in respect to data races--.
Does that mean const is now the equivalent of Java's synchronized?
No. Not at all...
Consider the following overly simplified class representing a rectangle:
class rect {
    int width = 0, height = 0;

public:
    /*...*/

    void set_size( int new_width, int new_height ) {
        width = new_width;
        height = new_height;
    }

    int area() const {
        return width * height;
    }
};
The member function area is thread-safe; not because it's const, but because it consists entirely of read operations. There are no writes involved, and at least one write is necessary for a data race to occur. That means that you can call area from as many threads as you want and you will get correct results all the time.
Note that this doesn't mean that rect is thread-safe. In fact, it's easy to see that if a call to area were to happen at the same time as a call to set_size on a given rect, then area could end up computing its result based on an old width and a new height (or even on garbled values).
But that is alright; rect isn't const, so it's not even expected to be thread-safe. An object declared const rect, on the other hand, would be thread-safe, since no writes are possible (and if you are considering const_cast-ing something originally declared const, then you get undefined behavior and that's it).
So what does it mean then?
Let's assume --for the sake of argument-- that multiplication operations are extremely costly and are best avoided when possible. We could compute the area only when it is requested, and then cache it in case it is requested again in the future:
class rect {
    int width = 0, height = 0;

    mutable int cached_area = 0;
    mutable bool cached_area_valid = true;

public:
    /*...*/

    void set_size( int new_width, int new_height ) {
        cached_area_valid = ( width == new_width && height == new_height );
        width = new_width;
        height = new_height;
    }

    int area() const {
        if( !cached_area_valid ) {
            cached_area = width;
            cached_area *= height;
            cached_area_valid = true;
        }
        return cached_area;
    }
};
[If this example seems too artificial, you could mentally replace int by a very large dynamically allocated integer which is inherently non thread-safe and for which multiplications are extremely costly.]
The member function area is no longer thread-safe: it now performs writes and is not internally synchronized. Is that a problem? A call to area may happen as part of the copy constructor of another object, and that constructor could have been called by some operation on a standard container; at that point the Standard Library expects this operation to behave as a read with regard to data races. But we are doing writes!
As soon as we put a rect in a standard container --directly or indirectly-- we are entering a contract with the Standard Library. To keep doing writes in a const function while still honoring that contract, we need to internally synchronize those writes:
class rect {
    int width = 0, height = 0;

    mutable std::mutex cache_mutex;
    mutable int cached_area = 0;
    mutable bool cached_area_valid = true;

public:
    /*...*/

    void set_size( int new_width, int new_height ) {
        if( new_width != width || new_height != height )
        {
            std::lock_guard< std::mutex > guard( cache_mutex );
            cached_area_valid = false;
        }
        width = new_width;
        height = new_height;
    }

    int area() const {
        std::lock_guard< std::mutex > guard( cache_mutex );
        if( !cached_area_valid ) {
            cached_area = width;
            cached_area *= height;
            cached_area_valid = true;
        }
        return cached_area;
    }
};
Note that we made the area function thread-safe, but the rect still isn't thread-safe. A call to area happening at the same time as a call to set_size may still end up computing the wrong value, since the assignments to width and height are not protected by the mutex.
If we really wanted a thread-safe rect, we would use a synchronization primitive to protect the non-thread-safe rect.
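For illustration, here is a minimal sketch of that last idea (the wrapper class is my own, not part of the original answer): the plain rect is hidden behind a mutex, and every operation takes the lock before touching it.

#include <mutex>

// Hypothetical wrapper: serializes every access to the non-thread-safe rect.
class synchronized_rect {
    mutable std::mutex mutex_;
    rect rect_; // the plain rect from the examples above

public:
    void set_size( int new_width, int new_height ) {
        std::lock_guard< std::mutex > guard( mutex_ );
        rect_.set_size( new_width, new_height );
    }

    int area() const {
        std::lock_guard< std::mutex > guard( mutex_ );
        return rect_.area();
    }
};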
Are they running out of keywords?
Yes, they are. They have been running out of keywords since day one.
Source: You don't know const and mutable - Herb Sutter
This is an addition to K-ballo's answer.
The term thread-safe is abused in this context. The correct wording is: a const function implies thread-safe in the sense of either bitwise const or internally synchronised, as stated by Herb Sutter (29:43) himself:
It should be thread-safe to call a const function from multiple threads simultaneously, without calling a non-const function at the same time in another thread.
So, a const function need not (and most of the time will not) be truly thread-safe, as it may read memory (without internal synchronisation) that could be changed by another non-const function. In general this is not thread-safe, since a data race occurs even if only one thread is writing (and another is reading the data).
See also my answer to the related question What is the definition of a thread safe function according to the C++11 (Language/Library) Standard?.
No! Counterexample:
#include <memory>
#include <thread>

class C
{
    std::shared_ptr<int> refs = std::make_shared<int>();

public:
    C() = default;
    C(C const &other) : refs(other.refs)
    { ++*this->refs; } // unsynchronized write: a data race if run concurrently
};

int main()
{
    C const c;
    std::thread t1([&]() { C const dummy(c); });
    std::thread t2([&]() { C const dummy(c); });
    t1.join(); // join both threads so main doesn't call std::terminate
    t2.join();
}
The copy-constructor of C is perfectly legitimate, but it is not thread-safe despite C being const.
Related
My application consists of calling dozens of functions millions of times. In each of those functions, one or a few temporary std::vector containers of POD (plain old data) types are initialized, used, and then destructed. By profiling my code, I found that the allocations and deallocations lead to a huge overhead.
A lazy solution is to rewrite all the functions as functors containing those temporary buffer containers as class members. However, this would blow up the memory consumption, as the functions are many and the buffer sizes are not trivial.
A better way is to analyze the code, gather all the buffers, premeditate how to maximally reuse them, and feed a minimal set of shared buffer containers to the functions as arguments. But that can be too much work.
I want to solve this problem once and for all for my future development, whenever temporary POD buffers become necessary, without needing much premeditation. My idea is to implement a container port, and take a reference to it as an argument for every function that may need temporary buffers. Inside those functions, one should be able to fetch containers of any POD type from the port, and the port should also auto-recall the containers before the functions return.
// Port of vectors of POD types.
struct PODvectorPort
{
    std::size_t Nlent; // Number of dispatched containers.
    std::vector<std::vector<std::size_t> > X; // Container pool.
    PODvectorPort() { Nlent = 0; }
};

// Functor that manages the port.
struct PODvectorPortOffice
{
    std::size_t initialNlent; // Number of already-dispatched containers
                              // when the office is set up.
    PODvectorPort *p; // Pointer to the port.

    PODvectorPortOffice(PODvectorPort &port)
    {
        p = &port;
        initialNlent = p->Nlent;
    }

    template<typename X, typename Y>
    std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
    {
        // return *((std::vector<X>*)(&y)); // UB although works
        std::vector<X> *rst = nullptr;
        std::vector<Y> *yp = &y;
        std::memcpy(&rst, &yp, sizeof(rst)); // copy the pointer value itself
        return *rst; // guess it makes no difference. Should still be UB.
    }

    template<typename T>
    std::vector<T> & lend()
    {
        ++p->Nlent;
        // Ensure sufficient container pool size:
        while (p->X.size() < p->Nlent) p->X.push_back( std::vector<std::size_t>(0) );
        return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
    }

    void recall() { p->Nlent = initialNlent; }
    ~PODvectorPortOffice() { recall(); }
};

struct ArbitraryPODstruct
{
    char a[11]; short b[7]; int c[5]; float d[3]; double e[2];
};
// Example f1():
// f2(), f3(), ..., f50() are similarly defined.
// All functions are called a few million times in certain
// order in main().
// port is defined in main().
void f1(other arguments..., PODvectorPort &port)
{
    PODvectorPortOffice portOffice(port);

    // Oh, I need a buffer of chars:
    std::vector<char> &tmpchar = portOffice.lend<char>();
    tmpchar.resize(789); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of shorts:
    std::vector<short> &tmpshort = portOffice.lend<short>();
    tmpshort.resize(456); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of ArbitraryPODstruct:
    std::vector<ArbitraryPODstruct> &tmpArb = portOffice.lend<ArbitraryPODstruct>();
    tmpArb.resize(123); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of integers, but also tmpArb is no longer
    // needed. Why waste it? Cache hot.
    std::vector<int> &tmpint = portOffice.repaint<int>(tmpArb);
    tmpint.resize(300); // Trivial.
    // ... do things.
}
Although the code is compilable with both gcc-8.3 and MSVS 2019 at -O2 through -Ofast, and passes extensive tests for all options, I expect criticism due to the hacky nature of PODvectorPortOffice::repaint(), which "casts" the vector type in place.
A set of sufficient but not necessary conditions for the correctness and efficiency of the above code are:
std::vector<T> stores 3 pointers to the underlying buffer's &[0], &[0] + .size(), &[0] + .capacity().
std::vector<T>'s allocator calls malloc().
malloc() returns an 8-byte (or sizeof(std::size_t)) aligned address.
So, if this is unacceptable to you, what would be the modern, proper way of addressing my need? Is there a way of writing a manager that achieves what my code does without violating the Standard?
Thanks!
Edits: a little more context of my problem. Those functions mainly compute some simple statistics of the inputs. The inputs are data streams of financial parameters of different types and sizes. To compute the statistics, those data need to be altered and re-arranged first, hence the buffers for temporary copies. Computing the statistics is cheap, so the allocations and deallocations can become relatively expensive. Why do I want a manager for arbitrary POD types? Because two weeks from now I may start receiving a data stream of a different type, which could be a bunch of primitive types zipped in a struct, or a struct of the composite types encountered so far. I, of course, would like the upstream to just send separate flows of primitive types, but I have no control over that aspect.
More edits: after tons of reading and code experimentation regarding the strict aliasing rule, the answer should be: don't try anything I put up there --- it works, for now, but don't do it. Instead, I'll be diligent and stick to my previous code-as-you-go style: just add a vector<vector<myNewType> > into the port once a new type comes up, and manage it in a similar way. The accepted answer also offers a nice alternative.
Even more edits: I conceived a stronger class that has a better chance of thwarting potential optimizations under the strict aliasing rule. DO NOT USE IT WITHOUT TESTING AND A THOROUGH UNDERSTANDING OF THE STRICT ALIASING RULE.
// -std=c++17
#include <cstring>
#include <cstddef>
#include <cstdint>      // uint64_t
#include <type_traits>  // std::is_same
#include <iostream>
#include <vector>
#include <chrono>
// POD: plain old data.
// Idea: design a class that can let you maximally reuse temporary
// containers during a program.
// Port of vectors of POD types.
template <std::size_t portsize = 42>
class PODvectorPort
{
    static constexpr std::size_t Xsize = portsize;
    std::size_t signature;
    std::size_t Nlent; // Number of dispatched containers.
    std::vector<std::size_t> X[portsize]; // Container pool.
    PODvectorPort(const PODvectorPort &);
    PODvectorPort & operator=( const PODvectorPort& );

public:
    std::size_t Ndispatched() { return Nlent; }
    std::size_t showSignature() { return signature; }

    PODvectorPort() // Permuted random number generator.
    {
        std::size_t state = std::chrono::high_resolution_clock::now().time_since_epoch().count();
        state ^= (uint64_t)(&std::memmove);
        signature = ((state >> 18) ^ state) >> 27;
        std::size_t rot = state >> 59;
        signature = (signature >> rot) | (state << ((-rot) & 31));
        Nlent = 0;
    }

    template<typename podvecport>
    friend class PODvectorPortOffice;
};
// Functor that manages the port.
template<typename podvecport>
class PODvectorPortOffice
{
    // Number of already-dispatched containers when the office is set up.
    std::size_t initialNlent;
    podvecport *p; // Pointer to the port.

    PODvectorPortOffice( const PODvectorPortOffice& ); // non construction-copyable
    PODvectorPortOffice& operator=( const PODvectorPortOffice& ); // non copyable

    constexpr void check()
    {
        while (__cplusplus < 201703)
        {
            std::cerr << "PODvectorPortOffice: C++ < 17, Stall." << std::endl;
        }

        // Check if allocation will be 8-byte (or more) aligned.
        // Intend it not to work on machine < 64-bit.
        constexpr std::size_t aln = alignof(std::max_align_t);
        while (aln < 8)
        {
            std::cerr << "PODvectorPortOffice: Allocation is not at least 8-byte aligned, Stall."
                      << std::endl;
        }
        while ((aln & (aln - 1)) != 0)
        {
            std::cerr << "PODvectorPortOffice: Alignment is not a power of 2 bytes. Stall." << std::endl;
        }

        // Random checks to see if sizeof(vector<S>) != sizeof(vector<T>).
        if(true)
        {
            std::size_t vecHeadSize[16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
            vecHeadSize[0] = sizeof(std::vector<char>(0));
            vecHeadSize[1] = sizeof(std::vector<short>(1));
            vecHeadSize[2] = sizeof(std::vector<int>(2));
            vecHeadSize[3] = sizeof(std::vector<long>(3));
            vecHeadSize[4] = sizeof(std::vector<std::size_t>(5));
            vecHeadSize[5] = sizeof(std::vector<float>(7));
            vecHeadSize[6] = sizeof(std::vector<double>(11));
            vecHeadSize[7] = sizeof(std::vector<std::vector<char> >(13));
            vecHeadSize[8] = sizeof(std::vector<std::vector<int> >(17));
            vecHeadSize[9] = sizeof(std::vector<std::vector<double> >(19));

            struct tmpclass1 { char a; short b; };
            struct tmpclass2 { char a; float b; };
            struct tmpclass3 { char a; double b; };
            struct tmpclass4 { int a; char b; };
            struct tmpclass5 { double a; char b; };
            struct tmpclass6 { double a[5]; char b[3]; short c[3]; };
            vecHeadSize[10] = sizeof(std::vector<tmpclass1>(23));
            vecHeadSize[11] = sizeof(std::vector<tmpclass2>(29));
            vecHeadSize[12] = sizeof(std::vector<tmpclass3>(31));
            vecHeadSize[13] = sizeof(std::vector<tmpclass4>(37));
            vecHeadSize[14] = sizeof(std::vector<tmpclass4>(41));
            vecHeadSize[15] = sizeof(std::vector<tmpclass4>(43));

            std::size_t notSame = 0;
            for(int i = 0; i < 16; ++i)
                notSame += vecHeadSize[i] != sizeof(std::size_t) * 3;
            while (notSame)
            {
                std::cerr << "sizeof(std::vector<S>) != sizeof(std::vector<T>), "
                             "PODvectorPortOffice cannot handle. Stall." << std::endl;
            }
        }
    }
    void recall() { p->Nlent = initialNlent; }

public:
    PODvectorPortOffice(podvecport &port)
    {
        check();
        p = &port;
        initialNlent = p->Nlent;
    }

    template<typename X, typename Y>
    std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
    // AFTER A VECTOR IS REPAINTED, DO NOT USE THE OLD VECTOR AGAIN !!
    {
        while (std::is_same<bool, X>::value)
        {
            std::cerr << "PODvectorPortOffice: Cannot repaint the vector to "
                         "std::vector<bool>. Stall." << std::endl;
        }
        std::vector<X> *x;
        std::vector<Y> *yp = &y;
        std::memcpy(&x, &yp, sizeof(x));
        return *x; // Not compliant with strict aliasing rule.
    }

    template<typename T>
    std::vector<T> & lend()
    {
        while (p->Nlent >= p->Xsize)
        {
            std::cerr << "PODvectorPortOffice: No more containers. Stall." << std::endl;
        }
        ++p->Nlent;
        return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
    }
    ~PODvectorPortOffice()
    {
        // Because p->signature can only be known at runtime, an aggressive,
        // compliant compiler (ACC) will never remove this
        // branch. Volatile might do, but trustworthiness?
        if(p->signature == 0)
        {
            constexpr std::size_t sizeofvec = sizeof(std::vector<std::size_t>);
            char dummy[sizeofvec * podvecport::Xsize]; // Xsize is constexpr, so this is no VLA.
            std::memcpy(dummy, p->X, p->Nlent * sizeofvec);
            std::size_t ticketNum = 0;
            char *xp = (char*)(p->X);
            for(int i = 0, iend = p->Nlent * sizeofvec; i < iend; ++i)
            {
                xp[i] &= xp[iend - i - 1] * 5;
                ticketNum += xp[i] ^ ticketNum;
            }
            std::cerr << "Congratulations! After the port office was decommissioned, "
                         "you found a winning lottery ticket. The odds are less than 2.33e-10. "
                         "Your ticket number is " << ticketNum << std::endl;
            std::memcpy(p->X, dummy, p->Nlent * sizeofvec);
            // According to the strict aliasing rule, a char* can point to any memory
            // block pointed by another pointer of any type T*. Thus given an ACC,
            // the writes to that block via the char* must be fully acknowledged in
            // time by T*, namely, for reading contents from T*, a reload instruction
            // will be kept in the assembly code to achieve a sort of
            // "register-cache-memory coherence" (RCMC).
            // We also do not care about the renters' (who received the reference via
            // .lend()) RCMC, because PODvectorPortOffice never accesses the contents
            // of those containers.
        }
        recall();
    }
};
Any adversarial test case to break it, especially on GCC >= 8.3 or MSVS >= 2019, is welcome!
Let me frame this by saying I don't think there's an "authoritative" answer to this question. That said, you've provided enough constraints that a suggested path is at least worthwhile. Let's review the requirements:
Solution must use std::vector. This is in my opinion the most unfortunate requirement for reasons I won't get into here.
Solution must be standards compliant and not resort to rule violations, like the strict aliasing rule.
Solution must either reduce the number of allocations performed, or reduce the overhead of allocations to the point of being negligible.
In my opinion this is definitely a job for a custom allocator. There are a couple of off-the-shelf options that come close to doing what you want, for example the Boost Pool Allocators. The one you're most interested in is boost::pool_allocator. This allocator will create a singleton "pool" for each distinct object size (note: not object type), which grows as needed, but never shrinks until you explicitly purge it.
The main difference between this and your solution is that you'll have distinct pools of memory for objects of different sizes, which means it will use more memory than your posted solution, but in my opinion this is a reasonable trade-off. To be maximally efficient, you could simply start a batch of operations by creating vectors of each needed type with an appropriate size. All subsequent vector operations which use these allocators will do trivial O(1) allocations and deallocations. Roughly in pseudo-code:
// Be careful with this; you probably want [[nodiscard]]. This code
// is just rough guidance:
void force_pool_sizes(void)
{
    std::vector<int, boost::pool_allocator<int>> size_int_vect;
    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> size_16_vect;
    ...
    size_int_vect.resize(100); // probably makes malloc calls
    size_16_vect.resize(200); // probably makes malloc calls
    ...
    // on return, objects go out of scope, but singleton pools
    // with allocated blocks of memory remain for future use
    // until explicitly purged.
}

void expensive_long_running(void)
{
    force_pool_sizes();
    std::vector<int, boost::pool_allocator<int>> data1;
    ... do stuff, malloc/free will never be called ...
    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> data2;
    ... do stuff, malloc/free will never be called ...

    // free everything:
    boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}
If you want to take this a step further on memory efficiency: if you know for a fact that certain pool sizes are mutually exclusive, you could modify the boost pool_allocator to use a slightly different singleton backing store which allows you to move a memory block from one block size to another. This is probably out of scope for now, but the boost code itself is straightforward enough; if memory efficiency is critical, it's probably worthwhile.
It's worth pointing out that there's probably some confusion about the strict aliasing rule, especially when it comes to implementing your own memory allocators. There are lots and lots of SO questions about strict aliasing and what it does and doesn't mean. This one is a good place to start.
The key takeaway is that it's perfectly ordinary and acceptable in low level C++ code to take an array of memory and cast it to some object type. If this were not the case, std::allocator wouldn't exist. You also wouldn't have much use for things like std::aligned_storage. Look at the example use case for std::aligned_storage on cppreference. An STL-like static_vector class is created which keeps an array of aligned_storage objects that get recast to a concrete type. Nothing about this is "unacceptable" or "illegal", but it does require some additional knowledge and care in handling.
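As a small illustration of that pattern (a sketch with names of my own choosing, not taken from any library):

#include <new>
#include <type_traits>
#include <utility>

// Raw, suitably aligned bytes for one T; no T object exists until construct().
template <typename T>
class single_slot {
    std::aligned_storage_t<sizeof(T), alignof(T)> storage_;

public:
    template <typename... Args>
    T* construct(Args&&... args) {
        // Placement-new begins the lifetime of a T inside the raw storage.
        return ::new (static_cast<void*>(&storage_)) T(std::forward<Args>(args)...);
    }

    void destroy(T* p) {
        p->~T(); // explicitly end the object's lifetime
    }
};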
The reason your solution is especially going to enrage the code lawyers is that you're taking pointers of one non-char object type and casting them to different non-char object types. This is a particularly offensive violation of the strict aliasing rule, but also not really necessary given some of your other options.
Also keep in mind that it's not an error to alias memory; it's a warning. I'm not saying go crazy with aliasing, but I am saying that as with all things C and C++, there are justifiable cases to break rules when you have very thorough knowledge and understanding of both your compiler and the machine you're running on. Just be prepared for some very long and painful debug sessions if it turns out you didn't in fact know those two things as well as you thought you did.
Given the following example code:
int var;
int mvar;
std::mutex mvar_mutex;

void f(){
    mvar_mutex.lock();
    mvar = var * var;
    mvar_mutex.unlock();
}
I want to express that mvar_mutex is bound to the variable mvar and protects only that variable. mvar_mutex should not protect var because it is not bound to it. Hence the compiler would be allowed to transform the above code into the below code:
int var;
int mvar;
std::mutex mvar_mutex;

void f(){
    int r = var * var; //possible data race created if binding is not known
    mvar_mutex.lock();
    mvar = r;
    mvar_mutex.unlock();
}
This might reduce contention on the lock as less work is being done while holding it.
For int this can be done using std::atomic<int> mvar; and removing mvar_mutex, but for other types such as std::vector<int> this is not possible.
How do I express the mutex-variable binding in a way that C++ compilers understand and can exploit for this optimization? The compiler should be allowed to reorder, up or down across the mutex boundaries, accesses to any variable that is not bound to that mutex.
Since the code is being generated using clang::ASTConsumer and clang::RecursiveASTVisitor I am willing to use non-standard extensions and AST manipulations as long as clang (ideally clang 4.0) supports them and the resulting code does not need to be elegant or human-readable.
Edit since this seems to be causing confusion: The above transformation is not legal in C++. The described binding of mutex to variable doesn't exist. The question is about how to implement that or achieve the same effect.
If you wish to achieve that the std::mutex will only be held until an operation is performed on the protected object, you can write a wrapper class as follows:
#include <cstdio>
#include <mutex>

template<typename T>
class LockAssignable {
public:
    LockAssignable& operator=(const T& t) {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_protected = t;
        return *this;
    }

    operator T() const {
        std::lock_guard<std::mutex> lk(m_mutex);
        return m_protected;
    }

    /* other stuff */

private:
    mutable std::mutex m_mutex;
    T m_protected {};
};

inline int factorial(int n) {
    return (n > 1 ? n * factorial(n - 1) : 1);
}

int main() {
    int var = 5;
    LockAssignable<int> mvar;
    mvar = factorial(var);
    printf("Result: %d\n", static_cast<int>(mvar));
    return 0;
}
In the example above the factorial will be calculated in advance, and m_mutex will be acquired only when the assignment or the implicit conversion operator is called on mvar.
Assembly Output
For the primitive data types you can use std::atomic with std::memory_order_relaxed.
The documentation states that:
there are no synchronization or ordering constraints imposed on other reads or writes, only this operation's atomicity is guaranteed
In the following example, the atomicity of the assignment is guaranteed, but the compiler remains free to move the other operations around it.
std::atomic<int> z = {0};
int a = 3;
z.store(a*a, std::memory_order_relaxed);
For objects, I thought of several solutions, but:
There is no standard way to remove ordering requirements from std::mutex.
It is not possible to create a std::atomic<std::vector>.
It is not possible to create a spinlock using std::memory_order_relaxed (see the example).
I have found some answers that state that:
If the function is not visible in the compilation unit, the compiler generates a barrier because it does not know which variables it uses.
If the function is visible and there is a mutex, the compiler generates a barrier.
For example, see this and this
So, in order to express that mvar_mutex is bound to the variable, you can use classes like those given in the other answers, but I do not think it is possible to fully allow the reordering of the code.
I want to express that mvar_mutex is bound to the variable mvar and protects only that variable.
You can't do this. A mutex actually guards the critical region of machine instructions between the acquisition and release. Only by convention is that associated with a particular instance of shared data.
To avoid doing unnecessary steps inside the critical region, keep critical regions as simple as possible. Inside a critical region, work only with local variables (which the compiler can "see" are not shared with other threads) and with the one set of shared data belonging to that mutex. Try not to access other data in the critical region that might be suspected of being shared.
If you could have your proposed language feature, it would only introduce the possibility of error into a program. All it does is take code which is now correct and make some of it incorrect (in exchange for the promise of some speed: the code that stays correct is faster, because extraneous computations are moved out of the critical region).
It's like taking a language which already has a nice order of evaluation, in which a[i] = i++ is well defined, and screwing it up with unspecified evaluation order.
How about a locked var template?
// Mutex defaults to std::recursive_mutex so that holding the lock around an
// assignment (see the lock_guard usage below) does not deadlock, since
// operator= locks as well.
template<typename Type, typename Mutex = std::recursive_mutex>
class Lockable
{
public:
    Lockable() = default;
    Lockable(Type t) : var_(std::move(t)) {}
    // ... could need a bit more

    Type operator = (const Type& x)
    {
        std::lock_guard<Lockable> lock(*this);
        var_ = x;
        return x;
    }

    Type operator *() const
    {
        std::lock_guard<const Lockable> lock(*this);
        return var_;
    }

    void lock() const { const_cast<Lockable*>(this)->mutex_.lock(); }
    void unlock() const { const_cast<Lockable*>(this)->mutex_.unlock(); }

private:
    Mutex mutex_;
    Type var_;
};
locked by assignment operator
Lockable<int> var;
var = mylongComputation();
Works great with lock_guard
Lockable<int> var;
std::lock_guard<Lockable<int>> lock(var);
var = 3;
Practical on containers
Lockable<std::vector<int>> vec;
etc...
You can use folly::Synchronized to make sure that the variable is only accessed under a lock:

int var;
folly::Synchronized<int> mvar;

void f() {
    *mvar.wlock() = var * var;
}
I want to express that mvar_mutex is bound to the variable mvar and protects only that variable.
This is not how a mutex works. It doesn't "bind" to anything in order to protect it. You are still free to access the object directly, in complete disregard of any sort of thread safety whatsoever.
What you should do is hide away the "protected variable" so that it is not directly accessible at all, and write an interface that manipulates it through the mutex. This way you ensure that access to the underlying data is protected by that mutex. It can be a single object, it can be a functional group of objects, it can be a collection of many objects, mutexes and atomics, designed to minimize blocking.
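As a sketch of that idea (the class and member names here are mine, one possible shape among many): the data stays private, and the only way to reach it is through a handle that holds the lock for its own lifetime.

#include <mutex>
#include <vector>

template <typename T>
class guarded {
    std::mutex mutex_;
    T data_;

public:
    class handle {
        std::unique_lock<std::mutex> lock_;
        T *data_;

    public:
        handle(std::mutex &m, T &d) : lock_(m), data_(&d) {}
        T *operator->() { return data_; }
        T &operator*() { return *data_; }
    };

    // The mutex is taken here and released when the handle dies.
    handle lock() { return handle(mutex_, data_); }
};

// Usage: the vector is unreachable except while the mutex is held.
// guarded<std::vector<int>> values;
// values.lock()->push_back(42);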
Given the following:
class ReadWrite {
public:
    int Read(size_t address);
    void Write(size_t address, int val);

private:
    std::map<size_t, int> db;
};
In the read function, when accessing an address to which no previous write was made, I want either to throw an exception designating such an error, or to allow it and return 0. In other words, I would like to use either std::map<size_t, int>::operator[]() or std::map<size_t, int>::at(), depending on some bool value which the user can set. So I add the following:
class ReadWrite {
public:
    int Read(size_t add) { if (allow) return db[add]; return db.at(add); }
    void Write(size_t add, int val) { db[add] = val; }
    void Allow() { allow = true; }

private:
    bool allow = false;
    std::map<size_t, int> db;
};
The problem with that is:
Usually the program will make one call to Allow(), or none, at the beginning, and then afterwards many accesses. So, performance-wise, this code is bad because it performs the if (allow) check every time, even though the result is either always true or always false.
So how would you solve such a problem?
Edit:
While the described use case (one Allow() or none, at first) is very likely, it is not guaranteed, so I must allow the user to call Allow() dynamically.
Another Edit:
Solutions which use a function pointer: what about the performance overhead incurred by using a function pointer, which the compiler cannot inline? If we use std::function instead, will that solve the issue?
Usually the program will make one call to Allow(), or none, at the beginning, and then afterwards many accesses. So, performance-wise, this code is bad because it performs the if (allow) check every time, even though the result is either always true or always false. So how would you solve such a problem?
I won't. The CPU will.
The branch predictor will figure out that the answer is most likely to be the same for a long time, so it can optimize the branch at the hardware level very effectively. It will still incur some overhead, but a negligible one.
If you really need to optimize your program, I think you had better use std::unordered_map instead of std::map, or move to some faster map implementation, like google::dense_hash_map. The branch is insignificant compared to the map lookup.
If you want to decrease the time cost, you have to increase the memory cost. Accepting that, you can do this with a function pointer. Below is my answer:
class ReadWrite {
public:
    void Write(size_t add, int val) { db[add] = val; }

    // when allowed, make the function pointer point to read2
    void Allow() { Read = &ReadWrite::read2; }

    // function pointer that points to read1 by default
    int (ReadWrite::*Read)(size_t) = &ReadWrite::read1;

private:
    int read1(size_t add) { return db.at(add); }
    int read2(size_t add) { return db[add]; }
    std::map<size_t, int> db;
};
The function pointer is a data member, so it is called through the pointer-to-member syntax. As an example:

ReadWrite rwObject;
//some code here
//...
(rwObject.*rwObject.Read)(5); //use of the member function pointer
//
Note that non-static data member initialization is available since C++11, so int (ReadWrite::*Read)(size_t) = &ReadWrite::read1; may not compile with older versions. In that case, you have to declare a constructor explicitly, where the initialization of the function pointer can be done.
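For instance, a sketch of that pre-C++11 variant of the class above, with the pointer initialized in a constructor:

class ReadWrite {
public:
    // Pre-C++11: initialize the member function pointer in the constructor
    // instead of at its point of declaration.
    ReadWrite() : Read(&ReadWrite::read1) {}

    void Allow() { Read = &ReadWrite::read2; }
    int (ReadWrite::*Read)(size_t);

private:
    int read1(size_t add) { return db.at(add); }
    int read2(size_t add) { return db[add]; }
    std::map<size_t, int> db;
};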
You can use a pointer to a member function.

class ReadWrite {
public:
    void Write(size_t add, int val) { db[add] = val; }
    int Read(size_t add) { return (this->*Rfunc)(add); }
    void Allow() { Rfunc = &ReadWrite::Read2; }

private:
    std::map<size_t, int> db;
    int Read1(size_t add) { return db.at(add); }
    int Read2(size_t add) { return db[add]; }
    int (ReadWrite::*Rfunc)(size_t) = &ReadWrite::Read1;
};
If you want runtime dynamic behaviour you'll have to pay for it at runtime (at the point you want your logic to behave dynamically).
You want different behaviour at the point where you call Read depending on a runtime condition and you'll have to check that condition.
No matter whether your overhead is a function-pointer call or a branch, you'll find a jump or call to different places in your program, depending on allow, at the point where Read is called by the client code.
Note: Profile and fix real bottlenecks - not suspected ones. (You'll learn more if you profile by either having your suspicion confirmed or by finding out why your assumption about the performance was wrong.)
I was curious as to whether the following scenario is safe.
I have the following class definitions:
class ActiveStatusEffect
{
public:
    StatusEffect* effect;
    mutable int RemainingTurns;

    ActiveStatusEffect() : RemainingTurns(0)
    {
    }

    //Other unimportant stuff down here
};
I then store a group of these inside an std::set as follows:
struct ASECmp
{
    bool operator ()(const StatusEffects::ActiveStatusEffect &eff1, const StatusEffects::ActiveStatusEffect &eff2)
    {
        return eff1.effect->GetPriority() < eff2.effect->GetPriority();
    }
};

std::set<StatusEffects::ActiveStatusEffect, ASECmp> ActiveStatusEffects;
I mark RemainingTurns as mutable because I want to be able to change it without having to constantly erase/insert into the set. I.e.
void BaseCharacter::Tick(Battles::BattleField &field, int ticks)
{
    for (auto effect = ActiveStatusEffects.begin(); effect != ActiveStatusEffects.end();)// ++index)
    {
        auto next = effect;
        ++next;
        if (effect->effect->HasFlag(StatusEffects::STATUS_FLAGS::TickEffect) && effect->RemainingTurns > 0)
        {
            effect->effect->TickCharacter(*this, field, ticks);
            --effect->RemainingTurns;
        }
        if (effect->RemainingTurns == 0)
        {
            ActiveStatusEffects.erase(effect);
        }
        effect = next;
    }
}
I'm concerned because it seems possible for this to mess up the ordering within the set, meaning I can't guarantee the set will always be sorted by effect->GetPriority().
If that's true, is there a safe way to do this (such as not having RemainingTurns form part of the key) besides copying, modifying, erasing, and then inserting what I need to change?
EDIT:
@ildjarn - sorry, I didn't think that mattered. It just returns an int stored within StatusEffect. That int is guaranteed not to change over the runtime of the program.
int StatusEffect::GetPriority() const
{
    return StatusPriority;
}
Changing data that affects the ordering of an object will indeed break the invariants of associative containers, but because ActiveStatusEffect::RemainingTurns is not involved in the ordering of ActiveStatusEffect objects whatsoever, keeping it mutable and modifying its value is perfectly harmless.
I'm concerned because it seems possible for this to mess up the ordering within the set, meaning I can't guarantee the set will always be sorted by effect->GetPriority().
It's a std::set<StatusEffects::ActiveStatusEffect, ASECmp>; how could it sort by any criteria other than that defined by ASECmp?
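To make that concrete, here is a small self-contained sketch (mine, not from the original answer): the comparator reads only priority, so writing through the mutable member can never disturb the set's ordering.

#include <set>

struct Effect {
    int priority;              // read by the comparator: must never change
    mutable int remaining = 3; // ignored by the comparator: free to change
};

struct ByPriority {
    bool operator()(const Effect &a, const Effect &b) const {
        return a.priority < b.priority;
    }
};

int main() {
    std::set<Effect, ByPriority> effects{ {1}, {2} };
    // Set elements are const, but mutable members may still be written;
    // the ordering invariant depends only on priority, so it is untouched.
    --effects.begin()->remaining;
}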
If you change the key of something in a std::set you are off in Undefined Behaviour land - simple as that. Not only will it "mess up the ordering", but the set will probably stop working correctly altogether.
If the key is unrelated to the actual object, or only a part of it, then you should consider using a map rather than a set:
std::map< int, ActiveStatusEffect > m;

ActiveStatusEffect x = create();
m[ x.effect->GetPriority() ] = x; // !!!
Another issue with your code is the lack of encapsulation: user code should not have access to the internals of the class (i.e. members should not be public).
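For example, a sketch of what that encapsulation might look like (method names are mine):

class ActiveStatusEffect
{
public:
    explicit ActiveStatusEffect(StatusEffect* effect)
        : effect_(effect), remainingTurns_(0) {}

    int GetPriority() const { return effect_->GetPriority(); }
    int RemainingTurns() const { return remainingTurns_; }
    void ConsumeTurn() const { --remainingTurns_; } // mutable, non-key state

private:
    StatusEffect* effect_;
    mutable int remainingTurns_;
};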