Is std::regex thread safe? - c++

Related to Is a static boost::wregex instance thread-safe? but for the standarized version. Can I call regex_search from several threads with the same regex object?

Claiming that std::regex is thread-safe in every respect is a pretty bold statement. The C++11 standard does not make such guarantees for the regex library.
However, looking at the prototype of std::regex_search shows that it it takes the basic_regex object as a const argument. This means that it is protected by the standard library's guarantee that the const modifier implies thread-safety of the function with respect to that argument.
In standardese, that is:
[17.6.5.9/1]
This section specifies requirements that implementations shall meet to prevent data races (1.10). Every standard library function shall meet each requirement unless otherwise specified. Implementations may prevent data races in cases other than those specified below.
[17.6.5.9/3]
A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.
So, barring a bug in the implementation of the standard library that you use, it appears that calls to std::regex_search are thread-safe with respect to the regex object that is passed in.
Other thoughts:
Just because std::regex_search is re-entrant with respect to its regex argument does not mean that you are completely out of the water. Performing an operation that modifies a regex in a non-thread-safe manner at the same time as a thread-safe call such as std::regex_search is still undefined behaviour. basic_regex's assignment operator, std::swap, and basic_regex::imbue come to mind as non-thread-safe functions with respect to the basic_regex they operate on. Knowing this, it might be better for you to make a copy of the regex object, which should come at a minimal performance cost, for each thread to use/modify at its leisure.

While Sean's answer is true of the standard, individual implementation may fall short. VC++ 2013, at least, looks like it has race conditions in its copy constructor and in a lazily evaluated variable.

Related

What does the C++ standard library guarantee to avoid data races?

Reading C++11 FAQ -- Threads there's this paragraph which I don't understand:
Consequently, C++11 provides some rules/guarantees for the programmer to avoid data races:
A C++ standard library function shall not directly or indirectly access objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's arguments, including this.
A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's nonconst arguments, including this.
C++ standard library implementations are required to avoid data races when different elements in the same sequence are modified concurrently.
I know what a data race is, and multithreading generally, but I don't understand what these sentences are saying.
Could you explain them more clearly? Perhaps with an example? What is or isn't it safe for me (i.e. an application programmer) to do in a multi-threaded context?
Had I not read these, I would have guessed that it's not safe to have multiple threads calling a non-const method of an object of any type, but I suppose this is saying something in addition to that?
OK I think I figured it out from the comments (please correct me if I'm wrong).
These first two are similar -- i.e. that a function will only read or write memory that's reachable via the function argument.
Perhaps this means, no more and no less than, that functions won't read or mutate global or static data in their implementation.
There were a few functions in the C library which broke this rule, for example ctime.
ctime returns a pointer to static data and is not thread-safe.
The third is saying that a thread can mutate an element in a container while another thread accesses a different element.
This does not imply that it's also inherently safe to mutate the container, e.g. to insert a new element.
And vector<bool> is an exception to this rule:
std::vector<bool> behaves similarly to std::vector, but in order to be space efficient, it ... does not guarantee that different elements in the same container can be modified concurrently by different threads.

Const-ness affecting concurrent algorithms?

Are there any examples of the absence or presence of const affecting the concurrency of any C++ Standard Library algorithms or containers? If there aren't, is there any reason using the const-ness in this way is not permitted?
To clarify, I'm assuming concurrent const access to objects is data-race free, as advocated in several places, including GotW 6.
By analogy, the noexcept-ness of move operations can affect the performance of std::vectors methods such as resize.
I skimmed through several of the C++17 concurrent Algorithms, hoping to find an example, but I didn't find anything. Algorithms like transform don't require the unary_op or binary_op function objects to be const. I did find for_each takes a MoveConstructible function object for the original, non execution policy version, and take a CopyConstructible function object for the C++17 execution policy version; I thought this might be an example where the programmer was manually forced to select one or the other based on if the function object could be safely copied (a const operation)... but at the moment I suspect this requirement is just there to support the return type (UnaryFunction vs. void).

Are c++ standard library containers like std::queue guaranteed to be reentrant?

I've seen people suggest that I should wrap standard containers such as std::queue and std::vector in a mutex lock or similar if i wish to use them.
I read that as needing a lock for each individual instance of a container being accessed by multiple threads, not per type or any utilization of the c++ standard library. But this assumes that the standard containers and standard library is guaranteed to be re-entrant.
Is there such a guarantee in the language?
The standard says:
Except where explicitly specified in this standard, it is implementation-defined which functions in the Standard C++ library may be recursively reentered.
Then it proceeds to specify that a function must be reentrant in, if I count them correctly, zero cases.
If one is to strictly follow the standard in this regard, the standard library suddenly becomes rather limited in its usefulness. A huge number of library functions call user-supplied functions. Writers of these functions, especially if those are themselves released as a library, in general don't know where they will be called from.
It is completely reasonable to assume that e.g. any constructor may be called from emplace_back of any standard container; if the user wishes to eliminate any uncertainty, he must refrain from any calls to emplace_back in any constructor. Any copy constructor is callable from e.g. vector::resize or sort, so one cannot manage vectors or do sorting in copy constructors. And so on, ad libitum.
This includes calling any third party component that might reasonably be using the standard library.
All these restrictions together probably mean that a large part of the standard library cannot be used in real world programs at all.
Update: this doesn't even start taking threads into consideration. With multiple threads, at least functions that deal with containers and algorithms must be reentrant. Imagine that std::vector::operator[] is not reentrant. This would mean that one cannot access two different vectors at the same time from two different threads! This is clearly not what the standard intends. I understand that this is your primary interest. To reiterate, no, I don't think there is reentrancy guarantee; and no, I don't think absence of such guarantee is reasonable in any way. --- end update.
My conclusion is that this is probably an oversight. The standard should mandate that all standard functions must be reentrant, unless otherwise specified.
I would
completely ignore the possibility of of any standard function being non-reentrant, except when it is clear that the function cannot be reasonably made reentrant.
raise an issue with the standards committee.
[Answer left for historical purposes, but see n.m.'s answer. There's no requirement on individual functions, but there is a single global non-requirement]
Yes, the standard guarantees reentrancy of member functions of standard containers.
Let me define what (non)-reentrancy means for functions. A reentrant function can be called with well-defined behavior on a thread while it is already on the call stack of that thread, i.e. executing. Obviously, this can only happen if the control flow temporarily left the reentrant function via a function call. If the behavior is not well-defined, the function is not reentrant.
(Leaf functions can't be said to be reentrant or non-reentrant, as the flow of control can only leave a leaf function by returning, but this isn't critical to the analysis).
Example:
int fac(int n) { return n==0 ? 1 : n * fac(n-1); }
The behavior of fac(3) is to return 6, even while fac(4) is running. Therefore, fac is reentrant.
The C++ Standard does define the behavior of member functions of standard containers. It also defines all restrictions under which such behavior is guaranteed. None of the member functions of standard containers have restrictions with respect to reentrancy. Therefore, any implementation which would restrict reentrancy is non-conformant.

Are C++11 objects potentially slower in multi-threaded environments because of the new const?

According to Herb Sutter (http://isocpp.org/blog/2012/12/you-dont-know-const-and-mutable-herb-sutter), in C++11 const methods must not alter the object bit-wise, or must perform internal synchronization (e.g. using a mutex) if they have mutable data members.
Suppose I have a global object that I'm accessing from multiple threads and suppose it has mutable members. For the sake of argument, let's assume that we cannot modify the source of the class (it's provided by a third-party).
In C++98 these threads would use a global mutex to synchronize access to this object. So, an access would require a single mutex lock/unlock.
However, in C++11, any const member function call on this object will invoke internal synchronization as well, so potentially, a single const function call on this object will cost 2 lock/unlock operations (or more, depending on how many functions you call from a single thread). Note that the global mutex is still needed, because const doesn't seem to do anything for writers (except possibly slowing them down as well, if one of the non-const methods calls a const method).
So, my question is: If all of our classes have to be this way in C++ (at least to be usable by STL), doesn't this lead to excessive synchronization measures?
Thanks
Edit: Some clarifications:
It seems that in C++11, you cannot use a class with the standard library unless its const member functions are internally synchronized (or do not perform any writes).
While C++11 doesn't automatically add any synchronization code itself, a standard-library-compliant class doesn't need synchronization in C++98, but needs it in C++11. So, in C++98 you can get away with not doing any internal synchronization for mutable members, but in C++11 you can't.
in C++11, any const member function call on this object will invoke internal synchronization as well
Why? That synchronisation doesn't just magically appear in the class, it's only there if someone adds it explicitly.
so potentially, a single const function call on this object will cost 2 lock/unlock operations
Only if someone has added an internal mutex to it and you also use an external one ... but why on earth would you do that?
Note that the global mutex is still needed, because const doesn't seem to do anything for writers (except possibly slowing them down as well, if one of the non-const methods calls a const method).
If the class has an internal mutex that's used to make the const members thread-safe then it could also be used for non-const members. If the class doesn't have an internal mutex, then the situation is identical to the C++98 one.
I think you're seeing a problem that doesn't exist.
Herb's "new meaning for const" is not enforced by the language or compiler, it's just design guidance, i.e. an idiom for good code. To follow that guidance you don't add mutexes to every class so const members are allowed to modify mutable members, you avoid mutable members! In the rare cases where you absolutely must have mutable members, either require users to do their own locking (and clearly document the class as requiring external synchronisation) or add internal synchronisation and pay the extra cost ... but those situations should be rare, so it's not true that "C++11 objects are slower because of the new const" because most well-designed objects don't have mutable members anyway.
Yes, you are absolutely correct. You should make your objects follow these guidelines, and therefore access to them will potentially be slower in C++11. If and only if:
The class has mutable members which const member functions modify.
The object is being accessed from multiple threads.
If you ensure that at least one of these is untrue, then nothing changes. The number of objects that are being accessed from multiple threads should always be as minimal as possible. And the number of classes that have mutable members should be minimal. So you're talking about a minimal set of a minimal set of objects.
And even then... all that is required is that data races will not be broken. Depending on what the mutable data even is, this could simply be an atomic access.
I fail to see the problem here. Few of the standard library objects will have mutable members. I defy you to find a reasonable implementation of basic_string, vector, map, etc that need mutable members.
It seems that in C++11, you cannot use a class with the standard library unless its const member functions are internally synchronized (or do not perform any writes).
This is incorrect. You most certainly can. What you cannot do is attempt to access that class across multiple threads in a way that would "perform any writes" on those mutable members. If you never access that object through that C++11 class across threads in that particular way, you're fine.
So yes, you can use them. But you only get the guarantees that your own class provides. If you use your class through a standard library class in an unreasonable way (like your const member functions not being const or properly synchronized), then that's your fault, not the library's.
So, in C++98 you can get away with not doing any internal synchronization for mutable members, but in C++11 you can't.
That's like saying you can get away with computer crime back in the Roman Empire. Of course you can. They didn't have computers back then; so they didn't know what computer crime was.
C++98/03 did not have the concept of "threading". Thus, the standard has no concept of "internal synchronization", so what you could or could not "get away with" was neither defined nor undefined. It made no more sense to ask that question of the standard than to ask what the hacking laws were during Ceaser's day.
Now that C++11 actually defines this concept and the idea of a race condition, C++11 is able to say when you can "get away with not doing any internal synchronization".
Or, to put it another way, here is how the two standards answer your question: What is the result of a potential data race on a mutable member when accessed via a member function declared const in the standard library?
C++11: There will be no data races on any internal members when accessed by a const function. All standard library implementations of such functions must be implemented in such a way that a data race cannot occur.
C++98/03: What's a data race?

standard containers as local variables in multi-threaded application

I'm aware of the fact that the containers from standard library are not thread-safe. By that I used to think that a container, say of type std::list, cannot be accessed by more than one thread concurrently (some of which may modify the container). But now it seems that there is more to it than meets the eye; something more subtle, something not so obvious, well at least to me.
For example, consider this function which accepts the first argument by value:
void log(std::string msg, severity s, /*...*/)
{
return; //no code!
}
Is this thread-safe?
At first, it seems that it is thread-safe, as the function body accesses no shared modifiable resources, hence thread-safe. On second thought, it comes to me that when invoking such a function, an object of type std::string will be created, which is the first argument, and I think that construction of this object isn't thread-safe, as it internally uses std::allocator, which I believe isn't thread-safe. Hence invoking such a function isn't thread-safe either. But if it is correct, then what about this:
void f()
{
std::string msg = "message"; //is it thread-safe? it doesn't seem so!
}
Am I going right? Can we use std::string (or any container which uses std::allocator internally) in multi-threaded program?
I'm specifically talking about containers as local variables, as opposed to shared objects.
I searched google and found many similar doubts, with no concrete answer. I face similar problem as his:
c++ allocators thread-safe?
Please consider C++03 and C++11, both.
In C++11, std::allocator is thread safe. From its definition:
20.6.9.1/6: Remark: the storage is obtained by calling ::operator new(std::size_t)
and from the definition of ::operator new:
18.6.1.4: The library versions of operator new and operator delete, user replacement versions of global operator new and operator delete, and the C standard library functions calloc, malloc, realloc, and free shall
not introduce data races (1.10) as a result of concurrent calls from different threads.
C++03 had no concept of threads, so any thread safety was implementation-specific; you'd have to refer to your implementation's documentation to see what guarantees it offered, if any. Since you're using Microsoft's implementation, this page says that it is safe to write to multiple container objects of the same class from many threads, which implies that std::allocator is thread-safe.
In C++11 this would be addressed for the default allocator in:
20.6.9.1 allocator members [allocator.members]
Except for the destructor, member functions of the default allocator
shall not introduce data races (1.10) as a result of concurrent calls
to those member functions from different threads. Calls to these
functions that allocate or deallocate a particular unit of storage
shall occur in a single total order, and each such deallocation call
shall happen before the next allocation (if any) in this order.
Any user-provided allocator would have to hold to the same constraints if it were going to be used across different threads.
Of course, for earlier versions of the standard, nothing is said about this since they didn't talk about multithreading. If an implementation were to support multithreading (as many or most do), it would be responsible for taking care of those issues. Similar to the way implementations provide a thread-safe malloc() (and other library functions) for C and C++ even though the standards prior to very recently said nothing about that.
As you may have already figured, there is not going to be an easy yes or no answer. However, I think this may help:
http://www.cs.huji.ac.il/~etsman/Docs/gcc-3.4-base/libstdc++/html/faq/index.html#5_6
I quote verbatim:
5.6 Is libstdc++-v3 thread-safe?
libstdc++-v3 strives to be thread-safe when all of the following
conditions are met:
The system's libc is itself thread-safe,
gcc -v reports a thread model other than 'single',
[pre-3.3 only] a non-generic implementation of atomicity.h exists for the architecture in question.
When an std::string is copied during call to log, the allocator may be thread-safe (mandatory in C++11), but the copy itself isn't. So if there is another thread mutating the source string while copy is taking place, this is not thread safe.
You may end-up with half the string as it was before mutation and another half after, or may even end-up accessing deallocated memory if the mutating thread reallocated (e.g. by appending new characters) or deleted the string, while the copy was still taking place.
OTOH, the...
std::string msg = "message";
...is thread safe provided your allocator is thread safe.