What does C++11 consider to be a "thread"?

What does C++11 consider to be a "thread"? - c++

C++11 has some notion of threads. For example, it defines a new storage specifier thread_local, and specifies that for variables with this storage specifier, "there is a distinct object or reference per thread" [basic.stc.thread].
What is considered to be a "thread" for this purpose? Is it only threads created using the standard thread library (i.e. those represented by std::thread objects)? What about threads created by other means (for example, by using pthreads directly on Linux)? What if I use a library that provides user-space threads - does each of those get its own copies of thread_local objects (I don't really see how that could be implemented)?
If the answer is "it's implementation-defined what is considered to be a thread for purposes such as thread_local", could someone give an example of how one well-known implementation defines this?

Only components from the thread support library count because of these quotes, or main which the standard states runs in its own thread of execution.
1 The following subclauses describe components to create and manage threads (1.10), perform mutual exclusion, and communicate conditions and values between threads, as summarized in Table 148.
The link to 1.10 implies that the threads being spoken about are these.
1 A thread of execution (also known as a thread) is a single flow of control within a program, including the initial ...
Therefore it seems to me threads only refer to the stdlib threads (meaning std::thread and anything the thread support library does internally). Of course thread_local in many cases could end up working with the native threads (especially when you consider on a specific system you don't usually have more than one choice for implementing threads) but as far as I can tell the standard makes no guarantee.

C++11 §1.10/1 defines the terms:
A thread of execution (also known as a thread) is a single flow of control within a program, including the initial invocation of a specific top-level function, and recursively including every function invocation subsequently executed by the thread. [ Note: When one thread creates another, the initial call to the top-level function of the new thread is executed by the new thread, not by the creating thread. — end note ]
The italicized terms indicate that this is definitive. You could argue that this definition is mathematically deficient, because each function invocation defines a new thread, but that's just obviously wrong. They mean maximal single flow of control, otherwise the non-normative note would cancel the effect of the normative "recursively including" text.
From the standpoint of the core language, it is merely incidental that std::thread causes such a thing to exist.
What if I use a library that provides user-space threads - does each of those get its own copies of thread_local objects (I don't really see how that could be implemented)?
There is no way to write such a library without kernel calls. In all likelihood all threads in your process are already represented a high-level abstraction such as pthreads, just to satisfy the kernel. The C++ standard library is likely written against the native threading library to "just work" without additional glue.
For example, thread_local objects are initialized at first access rather than when each new thread starts, so the compiler just has to insert a query based on pthread_self to access and perhaps initialize. Initialization would register a destructor with the pthread_cleanup facility.
What is implementation-defined here is whether the pre-existing native library is compatible with C++. Supposing they provide that, and it's something customers would tend to want, all other threading libraries built atop it will be automatically compatible barring some other conflict.

The standard does not describe how threads produced by other libraries and system calls behave. They are, as far as the standard is concerned, undefined in their behavior. There is no other way, within C++ proper, to create multiple threads: such libraries or system calls do things that are not standardized by the C++ standard.
Now, each such library and system call will behave in ways defined by its own specs. Quite often, the C++ std::thread will even be built on top of such libraries or system calls. How exactly the interaction works is not specified.

Related

Deduce if a program is going to use threads

Thread-safe or thread-compatible code is good.
However there are cases in which one could implement things differently (more simply or more efficiently) if one knows that the program will not be using threads.
For example, I once heard that things like std::shared_ptr could use different implementations to optimize the non-threaded case (but I can't find a reference).
I think historically std::string in some implementation could use Copy-on-write in non-threaded code.
I am not in favor or against these techniques but I would like to know if that there is a way, (at least a nominal way) to determine at compile time if the code is being compiled with the intention of using threads.
The closest I could get is to realize that threaded code is usually (?) compiled with the -pthreads (not -lpthreads) compiler option.
(Not sure if it is a hard requirement or just recommended.)
In turn -pthreads defines some macros, like _REENTRANT or _THREAD_SAFE, at least in gcc and clang.
In some some answers in SO, I also read that they are obsolete.
Are these macros the right way to determine if the program is intended to be used with threads? (e.g. threads launched from that same program). Are there other mechanism to detect this at compile time? How confident would the detection method be?
EDIT: since the question can be applied to many contexts apparently, let me give a concrete case:
I am writing a header only library that uses another 3rd party library inside. I would like to know if I should initialize that library to be thread-safe (or at least give a certain level of thread support). If I assume the maximum level of thread support but the user of the library will not be using threads then there will be cost paid for nothing. Since the 3rd library is an implementation detail I though I could make a decision about the level of thread safety requested based on a guess.
EDIT2 (2021): By chance I found this historical (but influential) library Blitz++ which in the documentation says (emphasis mine)
8.1 Blitz++ and thread safety
To enable thread-safety in Blitz++, you need to do one of these
things:
Compile with gcc -pthread, or CC -mt under Solaris. (These options define_REENTRANT,which tells Blitz++ to generate thread-safe code).
Compile with -DBZ_THREADSAFE, or #define BZ_THREADSAFE before including any Blitz++ headers.
In threadsafe mode, Blitz++ array reference counts are safeguarded by
a mutex. By default, pthread mutexes are used. If you would prefer a
different mutex implementation, add the appropriate BZ_MUTEX macros to
<blitz/blitz.h> and send them toblitz-dev#oonumerics.org for
incorporation. Blitz++ does not do locking for every array element
access; this would result in terrible performance. It is the job of
the library user to ensure that appropriate synchronization is used.
So it seems that at some point _REENTRANT was used as a clue for the need of multi-threading code.
Maybe it is a very old reference to take seriously.

I support the other answer in that thread-safety decision ideally should not be done on whole program basis, rather they should be for specific areas.
Note that boost::shared_ptr has thread-unsafe version called boost::local_shared_ptr. boost::intrusive_ptr has safe and unsafe counter implementation.
Some libraries use "null mutex" pattern, that is a mutex, which does nothing on lock / unlock. See boost or Intel TBB null_mutex, or ATL CComFakeCriticalSection. This is specifically to substitute real mutex for threqad-safe code, and a fake one for thread-unsafe.
Even more, sometimes it may make sense to use the same objects in thread-safe and thread-unsafe way, depending on current phase of execution. There's also atomic_ref which serves the purpose of providing thread-safe access to underlying type, but still letting work with it in thread unsafe.
I know a good example of runtime switches between thread-safe and thread-unsafe. See HeapCreate with HEAP_NO_SERIALIZE, and HeapAlloc with HEAP_NO_SERIALIZE.
I know also a questionable example of the same. Delphi recommends calling its BeginThread wrapper instead of CreateThread API function. The wrapper sets a global variable telling that from now on Delphi Memory Manager should be thread-safe. Not sure if this behavior is still in place, but it was there for Delphi 7.
Fun fact: in Windows 10, there are virtually no single-threaded programs. Before the first statement in main is executed, static DLL dependencies are loaded. Current Windows version makes this DLL loading paralleled where possible by using thread pool. Once program is loaded, thread pool threads are waiting for other tasks that could be issued by using of Windows API calls or std::async. Sure if program by itself will not use threads and TLS, it will not notice, but technically it is multi-threaded from the OS perspective.

How confident would the detection method be?
Not really. Even if you can unambiguously detect if code is compiled to be used with multiple threads, not everything must be thread safe.
Making everything thread-safe by default, even though it is only ever used only by a single thread would defeat the purpose of your approach. You need more fine grainded control to turn on/off thread safety if you do not want to pay for what you do not use.
If you have class that has a thread-safe and a non-thread-safe version then you could use a template parameter
class <bool isThreadSafe> Foo;
and let the user decide on a case for case basis.

How are MT programs proven correct with "non sequential" semantics?

This could be a language neutral question, but in practice I'm interested with the C++ case: how are multithread programs written in C++ versions that support MT programming, that is modern C++ with a memory model, ever proven correct?
In old C++, MT programs were just written in term of pthread semantics and validated in term of the pthread rules which was conceptually easy: use primitives correctly and avoid data races.
Now, the C++ language semantic is defined in term of a memory model and not in term of sequential execution of primitive steps. (Also the standard mentions an "abstract machine" but I don't understand any more what it means.)
How are C++ programs proven correct with that non sequential semantic? How can anyone reason about a program that is not doing primitive steps one after the other?

It is "conceptually easier" with the C++ memory model than it was with pthreads prior to the C++ memory model. C++ prior to the memory model interacting with pthreads was loosely specified, and reasonable interpretations of the specification permitted the compiler to "introduce" data races, so it is extremely difficult (if possible at all) to reason about or prove correctness for MT algorithms in the context of older C++ with pthreads.
There seems to be a fundamental misunderstanding in the question in that C++ was never defined as a sequential execution of primitive steps. It has always been the case that there is a partial ordering between expression evaluations. And the compiler is allowed to move such expressions around unless constrained from doing so. This was unchanged by the introduction of the memory model. The memory model introduced a partial order for evaluations between separate threads of execution.
The advice "use the primitives correctly and avoid data races" still applies, but the C++ memory model more strictly and precisely constrains the interaction between the primitives and the rest of the language, allowing more precise reasoning.
In practice, it is not easy to prove correctness in either context. Most programs are not proven to be data race free. One tries to encapsulate as much as possible any synchronization so as to allow reasoning about smaller components, some of which can be proven correct. And one uses tools such as address sanitizer and thread sanitizer to catch data races.
On data races, POSIX says:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.... Applications may allow more than one thread of control to read a memory location simultaneously.
On data races, C++ says:
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
C++ defines more terms and tries to be more precise. The gist of this is that both forbid data races, which in both, are defined as conflicting accesses, without the use of the synchronization primitives.
POSIX says that the pthread functions synchronize memory with respect to other threads. That is underspecified. One could reasonably interpret that as (1) the compiler cannot move memory accesses across such a function call, and (2) after calling such a function in one thread, the prior actions to memory from that thread will be visible to another thread after it calls such a function. This was a common interpretation, and this is easily accomplished by treating the functions as opaque and potentially clobbering all of memory.
As an example of problems with this loose specification, the compiler is still allowed to introduce or remove memory accesses (e.g. through register promotion and spillage) and to make larger accesses than necessary (e.g. touching adjacent fields in a struct). Therefore, the compiler completely correctly could "introduce" data races that weren't written in the source code directly. The C++11 memory model stops it from doing so.
C++ says, with regard to mutex lock:
Synchronization: Prior unlock() operations on the same object shall synchronize with this operation.
So C++ is a little more specific. You have to lock and unlock the same mutex to have synchronization. But given this, C++ says that operations before the unlock are visible to the new locker:
An evaluation A strongly happens before an evaluation D if ... there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D. [ Note: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. Strongly happens before excludes consume operations. — end note ]
(With B = unlock, C = lock, B simply happens before C because B synchronizes with C. Sequenced before is a concept in a single thread of execution, so for example, one full expression is sequenced before the next.)
So, if you restrict yourself to the sorts of primitives (locks, condition variables, ...) that exist in pthread, and to the type of guarantees provided by pthread (sequential consistency), C++ should add no surprises. In fact, it removes some surprises, adds precision, and is more amenable to correctness proofs.
The article Foundations of the C++ Concurrency Memory Model is a great, expository read for anyone interested in this topic about the problems with the status quo at the time and the choices made to fix them in the C++11 memory model.
Edited to more clearly state that the premise of the question is flawed, that reasoning is easier with the memory model, and add a reference to the Boehm paper, which also shaped some of the exposition.

How compiler like GCC implement acquire/release semantics for std::mutex

My understanding is that std::mutex lock and unlock have a acquire/release semantics which will prevent instructions between them from being moved outside.
So acquire/release should disable both compiler and CPU reorder instructions.
My question is that I take a look at GCC5.1 code base and don't see anything special in std::mutex::lock/unlock to prevent compiler reordering codes.
I find a potential answer in does-pthread-mutex-lock-have-happens-before-semantics which indicates a mail that says a external function call act as compiler memory fences.
Is it always true? And where is the standard?

Threads are a fairly complicated, low-level feature. Historically, there was no standard C thread functionality, and instead it was done differently on different OS's. Today there is mainly the POSIX threads standard, which has been implemented in Linux and BSD, and now by extension OS X, and there are Windows threads, starting with Win32 and on. Potentially, there could be other systems besides these.
GCC doesn't directly contain a POSIX threads implementation, instead it may be a client of libpthread on a linux system. When you build GCC from source, you have to configure and build separately a number of ancillary libraries, supporting things like big numbers and threads. That is the point at which you select how threading will be done. If you do it the standard way on linux, you will have an implementation of std::thread in terms of pthreads.
On windows, starting with MSVC C++11 compliance, the MSVC devs implemented std::thread in terms of the native windows threads interface.
It's the OS's job to ensure that the concurrency locks provided by their API actually works -- std::thread is meant to be a cross-platform interface to such a primitive.
The situation may be more complicated for more exotic platforms / cross-compiling etc. For instance, in MinGW project (gcc for windows) -- historically, you have the option to build MinGW gcc using either a port of pthreads to windows, or using a native win32 based threading model. If you don't configure this when you build, you may end up with a C++11 compiler which doesn't support std::thread or std::mutex. See this question for more details. MinGW error: ‘thread’ is not a member of ‘std’
Now, to answer your question more directly. When a mutex is engaged, at the lowest level, this involves some call into libpthreads or some win32 API.
pthread_lock_mutex();
do_some_stuff();
pthread_unlock_mutex();
(The pthread_lock_mutex and pthread_unlock_mutex correspond to the implementations of lock and unlock of std::mutex on your platform, and in idiomatic C++11 code, these are in turn called in the ctor and dtor of std::unique_lock for instance if you are using that.)
Generally, the optimizer cannot reorder these unless it is sure that pthread_lock_mutex() has no side-effects that can change the observable behavior of do_some_stuff().
To my knowledge, the mechanism the compiler has for doing this is ultimately the same as what it uses for estimating the potential side-effects of calls to any other external library.
If there is some resource
int resource;
which is in contention among various threads, it means that there is some function body
void compete_for_resource();
and a function pointer to this is at some earlier point passed to pthread_create... in your program in order to initiate another thread. (This would presumably be in the implementation of the ctor of std::thread.) At this point, the compiler can see that any call into libpthread can potentially call compete_for_resource and touch any memory that that function touches. (From the compiler's point of view libpthread is a black box -- it is some .dll / .so and it can't make assumptions about what exactly it does.)
In particular, the call pthread_lock_mutex(); potentially has side-effects for resource, so it cannot be re-ordered against do_some_stuff().
If you never actually spawn any other threads, then to my knowledge, do_some_stuff(); could be reordered outside of the mutex lock. Since, then libpthread doesn't have any access to resource, it's just a private variable in your source and isn't shared with the external library even indirectly, and the compiler can see that.

All of these questions stem from the rules for compiler reordering. One of the fundamental rules for reordering is that the compiler must prove that the reorder does not change the result of the program. In the case of std::mutex, the exact meaning of that phrase is specified in a block of about 10 pages of legaleese, but the general intuitive sense of "doesn't change the result of the program" holds. If you had a guarantee about which operation came first, according to the specification, no compiler is allowed to reorder in a way which violates that guarantee.
This is why people often claim that a "function call acts as a memory barrier." If the compiler cannot deep-inspect the function, it cannot prove that the function didn't have a hidden barrier or atomic operation inside of it, thus it must treat that function as though it was a barrier.
There is, of course, the case where the compiler can inspect the function, such as the case of inline functions or link time optimizations. In these cases, one cannot rely on a function call to act as a barrier, because the compiler may indeed have enough information to prove the rewrite behaves the same as the original.
In the case of mutexes, even such advanced optimization cannot take place. The only way to reorder around the mutex lock/unlock function calls is to have deep-inspected the functions and proven there are no barriers or atomic operations to deal with. If it can't inspect every sub-call and sub-sub-call of that lock/unlock function, it can't prove it is safe to reorder. If it indeed can do this inspection, it would see that every mutex implementation contains something which cannot be reordered around (indeed, this is part of the definition of a valid mutex implementation). Thus, even in that extreme case, the compiler is still forbidden from optimizing.
EDIT: For completeness, I would like to point out that these rules were introduced in C++11. C++98 and C++03 reordering rules only prohibited changes that affected the result of the current thread. Such a guarantee is not strong enough to develop multithreading primitives like mutexes.
To deal with this, multithreading APIs like pthreads developed their own rules. from the Pthreads specification section 4.11:
Applications shall ensure that access to any memory location by more
than one thread of control (threads or processes) is restricted such
that no thread of control can read or modify a memory location while
another thread of control may be modifying it. Such access is
restricted using functions that synchronize thread execution and also
synchronize memory with respect to other threads. The following
functions synchronize memory with respect to other threads
It then lists a few dozen functions which synchronize memory, including pthread_mutex_lock and pthread_mutex_unlock.
A compiler which wishes to support the pthreads library must implement something to support this cross-thread memory synchronization, even though the C++ specification didn't say anything about it. Fortunately, any compiler where you want to do multithreading was developed with the recognition that such guarantees are fundamental to all multithreading, so every compiler that supports multithreading has it!
In the case of gcc, it did so without any special notes on the pthreads function calls because gcc would effectively create a barrier around every external function call (because it couldn't prove that no synchronization existed inside that function call). If gcc were to ever change that, they would also have to change their pthreads headers to include any extra verbage needed to mark the pthreads functions as synchronizing memory.
All of that, of course, is compiler specific. There were no standards answers to this question until C++11 came along with its new memory model.

NOTE: I am no expert in this area and my knowledge about it is in a spaghetti like condition. So take the answer with a grain of salt.
NOTE-2: This might not be the answer that OP is expecting. But here are my 2 cents anyways if it helps:
My question is that I take a look at GCC5.1 code base and don't see
anything special in std::mutex::lock/unlock to prevent compiler
reordering codes.
g++ using pthread library. std::mutex is just a thin wrapper around pthread_mutex. So, you will have to actually go and have a look at pthread's mutex implementation.
If you go bit deeper into the pthread implementation (which you can find here), you will see that it uses atomic instructions along with futex calls.
Two minor things to remember here:
1. The atomic instructions do use barriers.
2. Any function call is equivalent to full barrier. Do not remember from where I read it.
3. mutex calls may put the thread to sleep and cause context switch.
Now, as far as reordering goes, one of the things that needs to be guaranteed is that, no instruction after lock and before unlock should be reordered to before lock or after unlock. This I believe is not a full-barrier, but rather just acquire and release barrier respectively. But, this is again platform dependent, x86 provides sequential consistency by default whereas ARM provides a weaker ordering guarantee.
I strongly recommend this blog series:
http://preshing.com/archives/
It explains lots of lower level stuff in easy to understand language. Guess, I have to read it once again :)
UPDATE:: Unable to comment on #Cort Ammons answer due to length
#Kane I am not sure about this, but people in general write barriers for processor level which takes care of compiler level barriers as well. The same is not true for compiler builtin barriers.
Now, since the pthread_*lock* functions definitions are not present in the translation unit where you are making use of it (this is doubtful), calling lock - unlock should provide you with full memory barrier. The pthread implementation for the platform makes use of atomic instructions to block any other thread from accessing the memory locations after the lock or before unlock. Now since only one thread is executing the critical portion of the code it is ensured that any reordering within that will not change the expected behaviour as mentioned in above comment.
Atomics is pretty tough to understand and to get right, so, what I have written above is from my understanding. Would be very glad to know if my understanding is wrong here.

So acquire/release should disable both compiler and CPU reorder instructions.
By definition anything that prevents CPU reordering by speculative execution prevents compiler reordering. That's the definition of language semantics, even without MT (multi-threading) in the language, so you will be safe from reordering on old compilers that don't support MT.
But these compilers aren't safe for MT for a bunch of reasons, from the lack of thread protection around runtime initialization of static variables to the implicitly modified global variables like errno, etc.
Also, in C/C++, any call to a function that is purely external (that is: not inline, available for inlining at any point), without annotation explaining what it does (like the "pure function" attribute of some popular compiler), must be assumed to do anything that legal C/C++ code can do. No non trivial reordering would be possible (any reordering that is visible is non trivial).
Any correct implementation of locks on systems with multiple units of execution that don't simulate a global order on assembly instructions will require memory barriers and will prevent reordering.
An implementation of locks on a linearly executing CPU, with only one unit of execution (or where all threads are bound on the same unit of execution), might use only volatile variables for synchronisation and that is unsafe as volatile reads resp. writes do not provide any guarantee of acquire resp. release of any other data (contrast Java). Some kind of compiler barrier would be needed, like a strongly external function call, or some asm (""/*nothing*/) (which is compiler specific and even compiler version specific).

Is a C++11 mutex compatible with threads NOT created with C++11?

I'm learning C++11 and have run into a threading issue. My general question: are C++11 mutexes compatible with threads not created with C++11's standard libraries?
I would like to safely share information between a thread created with C++11 and another thread created by a third-party library that I have no control over.
For example, my application uses PortAudio, which creates its own thread for audio output. I'm not sure if it's using pthreads, or OS-specific threading libraries, but I do know that PortAudio is NOT written in C++11. I want to safely share data between a GUI thread (using a C++11 thread) and the PortAudio thread using a mutex.
Similarly, can I use a C++11 mutex to synchronize QT QThreads and C++11 threads?

Are C++11 mutexes compatible with threads not created with C++11's standard libraries?
The C++ standard does not define a "thread" as something exclusively created by the C++ standard library.
1.10 Multi-threaded executions and data races [intro.multithread]
1 A thread of execution (also known as a thread) is a single flow of
control within a program, including the initial invocation of a
specific top-level function, and recursively including every function
invocation subsequently executed by the thread.
So, I would conclude the answer to your question is "yes".

Obviously, the C++ standard doesn't make any guarantees about compatebility with other systems. Part of the reason the C and C++ standards added threading facilities was to standardize on one threading system.
In practice it is expected that the C and C++ threads library is built to integrate with a platform threading system if there is one. For example, on platforms using pthreads the expectation is that pthreads are used where appropriate to buildtge standard library threading facilities (as far as I know there is no pthreads interface for the various atomic operations, i.e., the standard library may need to provide its own synchronization primitives).
The standard library classes provide access to the underlying representation through the native_handle() methods. A standard library should implement what is returned from these and, e.g., if pthreads types are provided it seems safe to assume that this particular standard library will play nice with pthreads.

The C++11 standard specifies that mutexes should work with any kind of 'execution agent', including different thread libraries. Here are some relevant quotes from the standard which I think answer the question conclusively:
Mutex requirements
A mutex object facilitates protection against data races and allows
safe synchronization of data between execution agents (30.2.5). An
execution agent owns a mutex from the time it successfully calls one
of the lock functions until it calls unlock.
Requirements for Lockable types
An execution agent is an entity such as a thread that may perform work
in parallel with other execution agents. [Note: Implementations or
users may introduce other kinds of agents such as processes or
thread-pool tasks. —end note ] The calling agent is determined by
context, e.g. the calling thread that contains the call, and so on.

It is inconceivable that C++11's threading implementation will be incompatible with the platform's native threading implementation because any practical program using C++11 threads is going to call into platform libraries, and those libraries may themselves be threaded or make thread related calls (to mutexes for example).
The C++11 library implementation for threads is not of course obliged to use the high level native threading library (say, pthreads or windows threads) but it probably will, for which purpose as has been mentioned there is a std::thread::native_handle() method to get the native handle. However, even where it does not use the high level native implementation, it will have to use the same low level kernel primitives underneath.
In all conceivable circumstances it should therefore be perfectly safe to use C++11 mutexes with thread instances created by native library calls, and vice versa, and mix any other native or C++ library synchronization calls. There may indeed be cases where it is necessary to do so. For example, the C++11 library does not at present provide thread pools or read-write locks (shared mutexes). You might want to use native read-write locks with threads started using std::thread, or use one of the many thread pool implementations provided by third party libraries in your C++ program.
The only caveat to observe is that trying to mix C++11 threads (which will in practice be obliged to use kernel threads in one way or another for the reasons mentioned above) with thread libraries which do not use kernel threads at all (for example, libraries based on green threads or "user" threads), is likely to lead you into trouble.
Edit: In support of this I notice that §30.3 of C++11 states, albeit non-normatively, that "These threads [std::thread threads] are intended to map one-to-one with operating system threads".

is GCC STL thread-safe?

I found contradictory information on the web:
http://www.sgi.com/tech/stl/thread_safety.html
The SGI implementation of STL is thread-safe only in the sense that
simultaneous accesses to distinct containers are safe, and
simultaneous read accesses to to shared containers are safe. If
multiple threads access a single container, and at least one thread
may potentially write, then the user is responsible for ensuring
mutual exclusion between the threads during the container accesses.
http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html
The user-code must guard against concurrent method calls which may
access any particular library object's state. Typically, the
application programmer may infer what object locks must be held based
on the objects referenced in a method call. Without getting into great
detail, here is an example which requires user-level locks:
All library objects are safe to use in a multithreaded program as long
as each thread carefully locks out access by any other thread while it
uses any object visible to another thread, i.e., treat library objects
like any other shared resource. In general, this requirement includes
both read and write access to objects; unless otherwise documented as
safe, do not assume that two threads may access a shared standard
library object at the same time.
I bolded the imporant part - maybe I dont understand what they mean by that,when I read object state I think of STL containers

How I understand this:
both documents say the same in different manner. MS STL implementation (actually Dinkumware one) says almost the same as your quoted SGI doc. They mean that they did nothing to make STL objects (e.g. containers) thread-safe, most probably because this would add an overhead unnecessary in many single-threaded applications. Any object is thread-safe in their terms, you can read it from multiple threads.
Also docs guarantee that STL objects are not modified under the hood in some background threads.

FWIW I updated the libstdc++ docs a while ago, it now says (emphasis mine):
The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state.

The information you cite is not contradictory. STL libraries should be safe to be used in a multi-threaded environment (actually, I've worked with one implementation where it was not the case) but it is users' burden to synchronize access to library objects. For instance, if you create a set of ints in one thread and another set of ints in another thread and you don't share either of them among threads, you should be able to use them; if you share an instance of a set among threads, it's up to you to synch the access to the set.

STL is no more. It is superseded by the C++ Standard Library. If you use the ISO C++ and the Standard Library, you should read (a) the Standard and (b) documentation that comes with your implementation of C++.
SGI STL documentation is mostly of historical interest, unless you for some reason actually use SGI STL.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js