Threads and simple Dead lock cure

When dealing with threads (specifically in C++) and using mutex locks and semaphores, is there a simple rule of thumb to avoid deadlocks and get nice, clean synchronization?

A good, simple rule of thumb is to always obtain your locks in a consistent, predictable order everywhere in your application. For example, if your resources have names, always lock them in alphabetical order. If they have numeric ids, always lock from lowest to highest. The exact order or criterion is arbitrary. The key is to be consistent. That way you will never end up in a situation like this one:
Thread 1 locks resource A
Thread 2 locks resource B
Thread 1 waits to obtain a lock on B
Thread 2 waits to obtain a lock on A
Deadlock
The above can never happen if you follow the rule of thumb outlined above. For a more detailed discussion, see the Wikipedia entry on the Dining Philosophers problem.
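As a minimal sketch of the rule, assume a hypothetical Account type that carries a numeric id (the type, its fields and the function names are illustrative, not from any library). Both directions of a transfer lock the lower id first, so the A/B versus B/A sequence shown above cannot arise:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource type: every instance carries a unique numeric id.
struct Account {
    int id;
    long balance;
    std::mutex m;
};

// Moves money between two accounts. Whatever the direction of the transfer,
// the account with the lower id is always locked first.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // never lock the same mutex twice
    Account& first  = (from.id < to.id) ? from : to;
    Account& second = (from.id < to.id) ? to : from;
    std::lock_guard<std::mutex> lock_first(first.m);
    std::lock_guard<std::mutex> lock_second(second.m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions at the same time.
// Returns true if both balances end up where they started.
bool opposing_transfers_complete() {
    Account a{1, 1000};
    Account b{2, 1000};
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance == 1000 && b.balance == 1000;
}
```

Calling opposing_transfers_complete() from main runs both threads to completion. If transfer() locked from.m first and to.m second instead, the same program could hang.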

If at all possible, design your code so that you never have to lock more than a single mutex/semaphore at a time.
If that's not possible, make sure to always lock multiple mutex/semaphores in the same order. So if one part of the code locks mutex A and then takes semaphore B, make sure that no other part of the code takes semaphore B and then locks mutex A.

Try to avoid acquiring one lock while already holding another. Doing so can create a circular dependency and cause a deadlock.
If it is unavoidable, then at least make the order in which the locks are acquired predictable.
Use RAII (to make sure the lock is released properly even when an exception is thrown).
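The RAII point can be illustrated with std::lock_guard. This is only a sketch: g_mutex, g_counter and the two functions are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;  // hypothetical lock guarding g_counter
int g_counter = 0;

// Updates the counter under the lock and optionally throws while holding it.
void update(bool fail) {
    std::lock_guard<std::mutex> guard(g_mutex);  // unlocks in its destructor
    ++g_counter;
    if (fail) throw std::runtime_error("update failed");
}

// The first call throws inside the locked region. The second call can still
// take the mutex, because stack unwinding destroyed the guard and released it.
int counter_after_exception() {
    try {
        update(true);
    } catch (const std::runtime_error&) {
        // the mutex is already unlocked here
    }
    update(false);
    return g_counter;
}
```

With a manual g_mutex.lock() / g_mutex.unlock() pair, the throw would skip the unlock and the second call would never get the lock.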

There is no simple deadlock cure.
Acquire locks in agreed order: If all calls acquire A->B->C then no deadlock can occur. Deadlocks can occur only if the locking order differs between the two threads (one acquires A->B the second B->A).
In practice it is hard to choose an order between arbitrary objects in memory. On a small, simple project it is possible, but on large projects with many individual contributors it is very hard. A partial solution is to create hierarchies by ranking the locks. All locks in module A have rank 1, all locks in module B have rank 2. One can acquire a lock of rank 2 while holding locks of rank 1, but not vice versa. Of course, you need a framework around the locking primitives that tracks and validates the ranking.
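A rough sketch of such a ranking framework follows. RankedMutex is a hypothetical class written for this example. It assumes that ranks must strictly increase (so two locks of the same rank cannot be held together) and that locks are released in reverse order of acquisition. A real framework would need to decide both points for itself.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper: a mutex with a rank. A thread may only lock a mutex
// whose rank is higher than the highest rank it currently holds.
class RankedMutex {
public:
    explicit RankedMutex(unsigned rank) : rank_(rank) {}

    void lock() {
        if (rank_ <= current_rank())
            throw std::logic_error("lock rank violation");
        m_.lock();
        previous_rank_ = current_rank();  // remembered for unlock()
        current_rank() = rank_;
    }

    void unlock() {
        current_rank() = previous_rank_;  // assumes LIFO release order
        m_.unlock();
    }

private:
    // Highest rank held by the calling thread; 0 means "no ranked lock held".
    static unsigned& current_rank() {
        thread_local unsigned rank = 0;
        return rank;
    }

    std::mutex m_;
    const unsigned rank_;
    unsigned previous_rank_ = 0;
};

// Locks rank 1 then rank 2 (allowed), then tries rank 1 while holding rank 2.
// Returns true if the second attempt is rejected.
bool out_of_order_lock_is_rejected() {
    RankedMutex module_a(1), module_b(2);
    {
        std::lock_guard<RankedMutex> a(module_a);
        std::lock_guard<RankedMutex> b(module_b);
    }
    std::lock_guard<RankedMutex> b(module_b);
    try {
        module_a.lock();
    } catch (const std::logic_error&) {
        return true;
    }
    module_a.unlock();
    return false;
}
```

Because the check runs on every lock() call, a wrong order is reported the first time the offending code path executes, not only when two threads happen to collide.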

One way to ensure the ordering that other folks have talked about is to acquire locks in an order defined by their memory address. If at any point, you try to acquire a lock that should have been earlier in the sequence, you release all the locks and start over.
With a little work, it's possible to do this nearly automatically with some wrapper classes around the system primitives.
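A minimal sketch of such a wrapper for the two-mutex case is below. AddressOrderedLock is a hypothetical name. The sketch covers only the address ordering, not the "release everything and start over" step, and it assumes the two mutexes are distinct objects.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical RAII wrapper that locks two distinct mutexes, lowest address
// first, no matter in which order the caller names them.
class AddressOrderedLock {
public:
    AddressOrderedLock(std::mutex& a, std::mutex& b)
        : first_(lower(a, b)), second_(higher(a, b)) {}

    // Exposed only so the ordering can be observed in the demo below.
    const std::mutex* first_locked() const { return first_.mutex(); }

private:
    // std::less gives a total order over pointers, unlike a raw operator<.
    static std::mutex& lower(std::mutex& a, std::mutex& b) {
        return std::less<std::mutex*>()(&a, &b) ? a : b;
    }
    static std::mutex& higher(std::mutex& a, std::mutex& b) {
        return std::less<std::mutex*>()(&a, &b) ? b : a;
    }

    std::unique_lock<std::mutex> first_;   // constructed (locked) first
    std::unique_lock<std::mutex> second_;  // destroyed (unlocked) first
};

// Returns true if the same mutex is locked first for both argument orders.
bool same_order_either_way() {
    std::mutex x, y;
    const std::mutex* first_xy = nullptr;
    const std::mutex* first_yx = nullptr;
    {
        AddressOrderedLock lock(x, y);
        first_xy = lock.first_locked();
    }
    {
        AddressOrderedLock lock(y, x);
        first_yx = lock.first_locked();
    }
    return first_xy == first_yx;
}
```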

There's no practical cure. Specifically, there's no way to simply test code for correct synchronization, or to have your programmers obey the rules of the gentleman with the green V.
There's no way to properly test multithreaded code, because the program logic may depend on the timing of lock acquisition and therefore differ from execution to execution, which somewhat invalidates the concept of QA.
I would say:
prefer using threads only as a performance optimization for multi-core machines
only optimize performance when you are sure you need this performance
you may use threads to simplify program logic, but only when you are absolutely sure what you are doing. Be extra careful, and confine all locks to a very small piece of code. Do not let any newbies near such code.
never use threads in a mission-critical system, such as flying an aircraft or operating dangerous machinery
in all cases, threads are seldom cost-effective, due to higher debug and QA costs
If you are determined to use threads, or are maintaining an existing codebase:
confine all locks to small and simple pieces of code, which operate on primitives
avoid function calls, or otherwise letting the program flow reach places where the fact of executing under a lock is not immediately visible. Such a function will be changed by future authors, widening your lock span without your control.
put locks inside objects to reduce locking scope, and wrap non-thread-safe 3rd-party objects with your own thread-safe interfaces.
never send synchronous notifications (callbacks) when executing under lock
use only RAII locks, to reduce the cognitive load when thinking "how else can we exit from here", as in exceptions, etc.
A few words on how to avoid multi-threading.
A single-threaded design usually involves some heart-beat function provided by program components, and called in a loop (called heartbeat cycle) which, when called, gives a chance to all components to do the next piece of work and to surrender control back again. What algorithmists like to think of as "loops" inside the components, will turn into state machines, to identify what is the next thing that should be done when called. State is best maintained as member data of respective objects.
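The heartbeat design described above might look roughly like this. Component, RangeSummer and run_heartbeat_loop are hypothetical names chosen for the sketch. The point is that what would be a loop inside the component becomes member state that advances by one step per call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical component interface: heartbeat() does one small piece of work
// and returns true while the component still has work left.
class Component {
public:
    virtual ~Component() = default;
    virtual bool heartbeat() = 0;
};

// Sums 1..limit, one addition per heartbeat. The loop counter lives in
// member data, so the component can give control back after each step.
class RangeSummer : public Component {
public:
    explicit RangeSummer(int limit) : limit_(limit) {}

    bool heartbeat() override {
        if (next_ > limit_) return false;  // already finished
        sum_ += next_++;
        return next_ <= limit_;
    }

    int sum() const { return sum_; }

private:
    int limit_;
    int next_ = 1;
    int sum_ = 0;
};

// The heartbeat cycle: keep calling every component until none has work left.
// Returns the number of cycles that were needed.
int run_heartbeat_loop(const std::vector<Component*>& components) {
    int cycles = 0;
    bool work_left = true;
    while (work_left) {
        work_left = false;
        for (Component* c : components) {
            if (c->heartbeat()) work_left = true;
        }
        ++cycles;
    }
    return cycles;
}

// Runs two summers side by side on one thread and checks their results.
bool summers_finish_with_expected_totals() {
    RangeSummer a(3), b(5);
    std::vector<Component*> components{&a, &b};
    run_heartbeat_loop(components);
    return a.sum() == 6 && b.sum() == 15;
}
```

Both components make progress in an interleaved fashion, yet no locks are needed because only one of them runs at any moment.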

There are plenty of simple "deadlock cures". But none that are easy to apply and work universally.
The simplest of all, of course, is "never have more than one thread".
Assuming you have a multithreaded application though, there are still a number of solutions:
You can try to minimize shared state and synchronization. Two threads that just run in parallel and never interact can never deadlock. Deadlocks only occur when multiple threads try to access the same resource. Why do they do that? Can that be avoided? Can the resource be restructured or divided so that for example, one thread can write to it, and other threads are asynchronously passed the data they need?
Perhaps the resource can be copied, giving each thread its own private copy to work with?
And as already mentioned by every other answer, if and when you try to acquire locks, do so in a globally consistent order. To simplify this, you should try to ensure that all the locks a thread is going to need are acquired in a single operation. If a thread needs to acquire locks A, B and C, it should not make three lock() calls at different times and from different places. You'll get confused, you won't be able to keep track of which locks are held by the thread and which ones it has yet to acquire, and then you'll mess up the order. If you can acquire all the locks you need at once, then you can factor that out into a separate function call which acquires N locks, and does so in the correct order to avoid deadlocks.
Then there are the more ambitious approaches: Techniques like CSP make threading extremely simple and easy to prove correct, even with thousands of concurrent threads. But it requires you to structure your program very differently from what you're used to.
Transactional Memory is another promising option, and one that may be easier to integrate into conventional programs. But production-quality implementations are still very rare.
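For the advice above about acquiring every needed lock in a single operation, the standard library already offers std::lock. Note that it is a swapped-in facility, not the hand-written ordering function described in the answer: it uses a deadlock-avoidance algorithm instead of a fixed order. The Shared struct and function names below are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state guarded by three separate mutexes.
struct Shared {
    std::mutex a, b, c;
    int value = 0;
};

// Takes all three locks in one call, then hands them to RAII guards.
void update(Shared& s) {
    std::lock(s.a, s.b, s.c);
    std::lock_guard<std::mutex> ga(s.a, std::adopt_lock);
    std::lock_guard<std::mutex> gb(s.b, std::adopt_lock);
    std::lock_guard<std::mutex> gc(s.c, std::adopt_lock);
    ++s.value;
}

// Same operation, but naming the mutexes in the opposite order.
// std::lock still avoids the deadlock that three plain lock() calls risk.
void update_reversed(Shared& s) {
    std::lock(s.c, s.b, s.a);
    std::lock_guard<std::mutex> gc(s.c, std::adopt_lock);
    std::lock_guard<std::mutex> gb(s.b, std::adopt_lock);
    std::lock_guard<std::mutex> ga(s.a, std::adopt_lock);
    ++s.value;
}

// Runs both variants concurrently and returns the final counter value.
int run_both_orders() {
    Shared s;
    std::thread t1([&s] { for (int i = 0; i < 1000; ++i) update(s); });
    std::thread t2([&s] { for (int i = 0; i < 1000; ++i) update_reversed(s); });
    t1.join();
    t2.join();
    return s.value;
}
```

In C++17 and later, std::scoped_lock combines the std::lock call and the guards into a single declaration.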

Read Deadlock: the Problem and a Solution.
"The common advice for avoiding deadlock is to always lock the two mutexes in the same order: if you always lock mutex A before mutex B, then you'll never deadlock. Sometimes this is straightforward, as the mutexes are serving different purposes, but other times it is not so simple, such as when the mutexes are each protecting a separate instance of the same class".

If you want to attack the possibility of a deadlock you must attack one of the 4 crucial conditions for the existence of a deadlock.
The 4 conditions for a deadlock are:
1. Mutual Exclusion - only one thread can enter the critical section at a time.
2. Hold and Wait - a thread does not release the resources it has acquired until it has finished its job, even while it waits for other resources that are unavailable.
3. No Preemption - a resource cannot be forcibly taken away from the thread that holds it; only that thread can release it.
4. Resource Cycle (circular wait) - there has to be a cyclic chain of threads, each waiting for a resource held by the next.
The easiest condition to attack is the resource cycle by making sure that no cycles are possible.

Related

Why is there no std:: equivalent to pthread_spinlock_t like there is for pthread_mutex_t & std::mutex?

I've used pthreads a fair bit for concurrent programs, mainly utilising spinlocks, mutexes, and condition variables.
I started looking into multithreading using std::thread and using std::mutex, and I noticed that there doesn't seem to be an equivalent to spinlock in pthreads.
Anyone know why this is?

"there doesn't seem to be an equivalent to spinlock in pthreads."
Spinlocks are often considered the wrong tool in user space because there is no way to disable thread preemption while the spinlock is held (unlike in the kernel). A thread can therefore acquire a spinlock and then get preempted, causing all other threads trying to acquire the spinlock to spin unnecessarily (and if those threads are of higher priority, that may cause a deadlock; threads waiting for I/O may get a priority boost on wake-up). This reasoning also applies to all lockless data structures, unless the data structure is truly wait-free (there aren't many practically useful ones, apart from boost::lockfree::spsc_queue).
In the kernel, a thread that has locked a spinlock cannot be preempted or interrupted before it releases the spinlock. That is why spinlocks are appropriate there (when RCU cannot be used).
On Linux, one can prevent preemption (not sure if completely, but there have been recent kernel changes towards such a desirable effect) by using isolated CPU cores and FIFO real-time threads pinned to those isolated cores. But that requires a deliberate kernel/machine configuration and an application designed to take advantage of that configuration. Nevertheless, people do use such a setup for business-critical applications along with lockless (but not wait-free) data structures in user space.
On Linux, there is the adaptive mutex type PTHREAD_MUTEX_ADAPTIVE_NP, which spins for a limited number of iterations before blocking in the kernel (similar to InitializeCriticalSectionAndSpinCount). However, that mutex cannot be used through the std::mutex interface because there is no option to customise the non-portable pthread_mutexattr_t before initialising pthread_mutex_t.
Nor can one enable process sharing, robustness, error checking or priority-inversion prevention through the std::mutex interface. In practice, people write their own wrappers around pthread_mutex_t that allow setting the desired mutex attributes, along with a corresponding wrapper for condition variables. Standard locks like std::unique_lock and std::lock_guard can be reused.
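A minimal sketch of such a wrapper, assuming a POSIX system, is shown below. ErrorCheckingMutex is a hypothetical name. It sets the portable PTHREAD_MUTEX_ERRORCHECK type as an example attribute, but the same pattern would apply to the other attributes mentioned above. Because it provides lock(), unlock() and try_lock(), the standard std::lock_guard and std::unique_lock work with it unchanged.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>
#include <pthread.h>
#include <system_error>

// Hypothetical wrapper around pthread_mutex_t that sets a mutex attribute
// which std::mutex gives no access to.
class ErrorCheckingMutex {
public:
    ErrorCheckingMutex() {
        pthread_mutexattr_t attr;
        check(pthread_mutexattr_init(&attr));
        int rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        if (rc == 0) rc = pthread_mutex_init(&m_, &attr);
        pthread_mutexattr_destroy(&attr);
        check(rc);
    }
    ~ErrorCheckingMutex() { pthread_mutex_destroy(&m_); }

    ErrorCheckingMutex(const ErrorCheckingMutex&) = delete;
    ErrorCheckingMutex& operator=(const ErrorCheckingMutex&) = delete;

    void lock() { check(pthread_mutex_lock(&m_)); }
    void unlock() { check(pthread_mutex_unlock(&m_)); }
    bool try_lock() {
        const int rc = pthread_mutex_trylock(&m_);
        if (rc == EBUSY) return false;
        check(rc);
        return true;
    }
    pthread_mutex_t* native_handle() { return &m_; }

private:
    // Turns a non-zero pthread return code into an exception.
    static void check(int rc) {
        if (rc != 0) throw std::system_error(rc, std::generic_category());
    }

    pthread_mutex_t m_;
};

// Locks the mutex through std::lock_guard, then tries to lock it again on the
// same thread. An error-checking mutex reports EDEADLK instead of hanging.
bool relock_is_detected() {
    ErrorCheckingMutex m;
    std::lock_guard<ErrorCheckingMutex> guard(m);
    try {
        m.lock();
    } catch (const std::system_error& e) {
        return e.code().value() == EDEADLK;
    }
    return false;
}
```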
IMO, there could be provisions to set desirable mutex and condition variable properties in std:: APIs, like providing a protected constructor for derived classes that would initialize that native_handle, but there aren't any. That native_handle looks like a good idea to do platform specific stuff, however, there must be a constructor for the derived class to be able to initialize it appropriately. After the mutex or condition variable is initialized that native_handle is pretty much useless. Unless the idea was only to be able to pass that native_handle to (C language) APIs that expect a pointer or reference to an initialized pthread_mutex_t.
There is another example of Boost/C++ standard not accepting semaphores on the basis that they are too much of a rope to hang oneself, and that mutex (a binary semaphore, essentially) and condition variable are more fundamental and more flexible synchronisation primitives, out of which a semaphore can be built.
From the point of view of the C++ standard those are probably right decisions because educating users to use spinlocks and semaphores correctly with all the nuances is a difficult task. Whereas advanced users can whip out a wrapper for pthread_spinlock_t with little effort.

You are right, there's no spin lock implementation in the std namespace. A spin lock is a great concept, but in user space it generally performs quite poorly. The OS doesn't know your thread wants to spin, and you can often get worse results than with a mutex. Note that several platforms implement optimistic spinning, so a mutex can do a really good job. In addition, choosing how long to "pause" between loop iterations is neither trivial nor portable, and requires fine tuning. TL;DR: don't use a spinlock in user space unless you are really, really sure about what you are doing.
C++ Thread discussion
Article explaining how to write a spin lock with benchmark
Reply by Linus Torvalds about the above article explaining why it's a bad idea

Spin locks have two advantages:
They require much less storage than a std::mutex, because they do not need a queue of threads waiting for the lock. On my system, sizeof(pthread_spinlock_t) is 4, while sizeof(std::mutex) is 40.
They are much more performant than std::mutex if the protected code region is small and the contention level is low to moderate.
On the downside, a poorly implemented spin lock can hog the CPU. For example, a tight loop around a compare-and-set instruction will spam the cache system with loads and loads of unnecessary writes. But that's what we have libraries for: they implement best practice and avoid common pitfalls. That most user implementations of spin locks are poor is not a reason to keep spin locks out of the library. Rather, it is a reason to put them there, to stop users from trying it themselves.
There is a second problem that arises from the scheduler: if thread A acquires the lock and then gets preempted by the scheduler before it finishes executing the critical section, another thread B could spin "forever" (or at least for many milliseconds, before thread A gets scheduled again) on that lock.
Unfortunately, there is no way for userland code to tell the kernel "please don't preempt me in this critical code section". But if we know that under normal circumstances the critical code section executes within 10 ns, we could at least tell thread B: "preempt yourself voluntarily if you have been spinning for over 30 ns". This is not guaranteed to return control directly back to thread A, but it will stop the waste of CPU cycles that would otherwise take place. And in most scenarios where threads A and B run in the same process at the same priority, the scheduler will usually schedule thread A before thread B if B called std::this_thread::yield().
So, I am thinking about a template spin lock class, that takes a single unsigned integer as a parameter, which is the number of memory reads in the critical section. This parameter is then used in the library to calculate the appropriate number of spins, before a yield() is performed. With a zero count, yield() would never be called.
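A simplified sketch of that idea follows. It deviates from the proposal in one respect, which is an assumption of this example: the template parameter is used directly as the number of spins before yield(), not derived from the number of memory reads in the critical section. SpinLock and the demo function are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical spin lock. SpinsBeforeYield is the number of failed polls
// after which the waiting thread yields; 0 means "never yield".
template <unsigned SpinsBeforeYield>
class SpinLock {
public:
    void lock() {
        unsigned spins = 0;
        for (;;) {
            // Try to take the lock with a single atomic write.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Wait with plain reads so the cache line is not spammed with writes.
            while (locked_.load(std::memory_order_relaxed)) {
                if (SpinsBeforeYield != 0 && ++spins >= SpinsBeforeYield) {
                    std::this_thread::yield();
                    spins = 0;
                }
            }
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Two threads increment a shared counter under the spin lock.
// Returns the final counter value.
int count_with_spin_lock() {
    SpinLock<64> lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            std::lock_guard<SpinLock<64>> guard(lock);
            ++counter;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

The value 64 is arbitrary. As the answers above point out, a good threshold depends on the platform and on the length of the critical section.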

Is there a C++ design pattern that implements a mechanism or mutex that controls the amount of time a thread can own a locked resource?

I am looking for a way to guarantee that any time a thread locks a specific resource, it is forced to release that resource after a specific period of time (if it has not already released it). Envision a connection where you need to limit the amount of time any specific thread can own that connection for.
I envision this is how it could be used:
{
    std::lock_guard<std::TimeLimitedMutex> lock(this->myTimeLimitedMutex, timeout);
    try {
        // perform some operation with the resource that myTimeLimitedMutex guards.
    }
    catch (MutexTimeoutException ex) {
        // perform cleanup
    }
}
I see that there is a timed_mutex that lets the program timeout if a lock cannot be acquired. I need the timeout to occur after the lock is acquired.
There are already some situations where you get a resource that can be taken away unexpectedly. For instance, a TCP socket: once a socket connection is made, code on each side needs to handle the case where the other side drops the connection.
I am looking for a pattern that handles types of resources that normally time out on their own, but need to be reset when they don't. This does not have to handle every type of resource.

This can't work, and it never will. It goes against the whole concept of ownership and atomic transactions. When a thread acquires the lock and performs two operations in a row, it expects them to become visible to the outside world atomically. In this scenario, it would be quite possible for the transaction to be torn: the first part of it is performed, but the second is not.
What's worse, since the lock is forcefully removed, the partially executed transaction becomes visible to the outside world before the interrupted thread has any chance to roll back.
This idea runs contrary to every school of multi-threaded thinking.

I support SergeyA's answer. Releasing a locked mutex after a timeout is a bad idea and cannot work. Mutex stands for mutual exclusion, and this is a rock-hard contract which cannot be violated.
But you can do what you want:
Problem: You want to guarantee that your threads do not hold the mutex longer than a certain time T.
Solution: Never lock the mutex for longer than time T. Instead, write your code so that the mutex is locked only for the absolutely necessary operations. It is always possible to give such a time T (modulo the uncertainties and limits imposed by a multitasking and multiuser operating system, of course).
To achieve that (examples):
Never do file I/O inside a locked section.
Never call a system call while a mutex is locked.
Avoid sorting a list while a mutex is locked (*).
Avoid doing a slow operation on each element of a list while a mutex is locked (*).
Avoid memory allocation/deallocation while a mutex is locked (*).
There are exceptions to these rules, but the general guideline is:
Make your code slightly less optimal (e.g. do some redundant copying inside the critical section) to make the critical section as short as possible. This is good multithreading programming.
(*) These are just examples of operations where it is tempting to lock the entire list, do the operations and then unlock the list. Instead, it is advisable to just take a local copy of the list and clear the original list while the mutex is locked, ideally by using the swap() operation offered by most STL containers, and then do the slow operation on the local copy outside of the critical section. This is not always possible but always worth considering. Sorting can have quadratic complexity in the worst case, depending on the algorithm, and usually needs random access to the entire list. It is useful to sort (a copy of) the list outside of the critical section and later check whether elements need to be added or removed. Memory allocations also have quite some complexity behind them, so massive memory allocations/deallocations should be avoided.
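The swap() technique from the footnote can be sketched as follows. WorkList and the function names are hypothetical. The mutex is held only for the swap, and the sort runs on the local copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared list of pending items.
class WorkList {
public:
    void add(int value) {
        std::lock_guard<std::mutex> guard(m_);
        items_.push_back(value);
    }

    // Moves all pending items into a local vector. The critical section
    // contains only the swap, which is cheap and does not depend on size.
    std::vector<int> take_all() {
        std::vector<int> local;
        {
            std::lock_guard<std::mutex> guard(m_);
            local.swap(items_);
        }
        return local;
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> guard(m_);
        return items_.size();
    }

private:
    std::mutex m_;
    std::vector<int> items_;
};

// The slow operation (sorting) runs outside the critical section.
std::vector<int> drain_sorted(WorkList& list) {
    std::vector<int> local = list.take_all();
    std::sort(local.begin(), local.end());
    return local;
}

// Adds three items, drains them, and checks the result and the emptied list.
bool drain_returns_sorted_items() {
    WorkList list;
    list.add(3);
    list.add(1);
    list.add(2);
    const std::vector<int> sorted = drain_sorted(list);
    const std::vector<int> expected{1, 2, 3};
    return sorted == expected && list.pending() == 0;
}
```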

You can't do that with only C++.
If you are using a Posix system, it can be done.
You'll have to trigger a SIGALRM signal that's only unmasked for the thread that'll time out. In the signal handler, you'll have to set a flag and use longjmp to return to the thread code.
In the thread code, on the setjmp position, you can only be called if the signal was triggered, thus you can throw the Timeout exception.
Please see this answer for how to do that.
Also, on Linux, it seems you can throw directly from the signal handler (so no longjmp/setjmp here).
BTW, if I were you, I would code the opposite. Think about it: You want to tell a thread "hey, you're taking too long, so let's throw away all the (long) work you've done so far so I can make progress".
Ideally, you should make your long-running thread more cooperative, doing something like "I've done A of an ABCD task, let's release the mutex so others can make progress on A. Then let's check if I can take it again to do B, and so on."
You probably want to be more fine-grained (have more mutexes on smaller objects, but make sure you're locking them in the same order) or use RW locks (so that other threads can use the objects if you're not modifying them), etc.

Such an approach cannot be enforced because the holder of the mutex needs the opportunity to clean up anything which is left in an invalid state part way through the transaction. This can take an unknown arbitrary amount of time.
The typical approach is to release the lock when doing long tasks, and re-acquire it as needed. You have to manage this yourself, as everyone will have a slightly different approach.
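One possible shape of this release-and-re-acquire pattern is sketched below. SharedTotal and add_in_chunks are hypothetical names, and chunk is assumed to be greater than zero. The long task is cut into chunks, and the mutex is held only while each partial result is published.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared state updated by a long-running task.
struct SharedTotal {
    std::mutex m;
    long total = 0;
};

// Adds all values to the shared total, one chunk at a time. The summing is
// done without the lock; the mutex is taken briefly per chunk and released
// again, so other threads can get in between chunks.
// Returns how many times the mutex was acquired. Assumes chunk > 0.
std::size_t add_in_chunks(SharedTotal& shared,
                          const std::vector<int>& values,
                          std::size_t chunk) {
    std::size_t acquisitions = 0;
    for (std::size_t begin = 0; begin < values.size(); begin += chunk) {
        const std::size_t end = std::min(values.size(), begin + chunk);
        long partial = 0;
        for (std::size_t i = begin; i < end; ++i) partial += values[i];

        std::lock_guard<std::mutex> guard(shared.m);  // short critical section
        shared.total += partial;
        ++acquisitions;
    }
    return acquisitions;
}

// Ten values in chunks of four: three short lock acquisitions, total of 10.
bool chunked_update_works() {
    SharedTotal shared;
    const std::vector<int> values(10, 1);
    const std::size_t acquisitions = add_in_chunks(shared, values, 4);
    return acquisitions == 3 && shared.total == 10;
}
```

Other threads may observe the total between chunks, so this only fits when each partial update leaves the shared state valid on its own.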
The only situation I know of where this sort of thing is accepted practice is at the kernel level, especially with respect to microcontrollers (which either have no kernel, or are all kernel, depending on who you ask). You can set an interrupt which modifies the call stack, so that when it is triggered it unwinds the particular operations you are interested in.

"Condition" variables can have timeouts. This allows you to wait until a thread voluntarily releases a resource (with notify_one() or notify_all()), but the wait itself will timeout after a specified fixed amount of time.
Examples in the Boost documentation for "conditions" might make this more clear.
If you want to force a release, you have to write the code which will force it though. This could be dangerous. The code written in C++ can be doing some pretty close-to-the-metal stuff. The resource could be accessing real hardware and it could be waiting on it to finish something. It may not be physically possible to end whatever the program is stuck on.
However, if it is possible, then you can handle it in the thread in which the wait() times out.
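A small sketch of a timed wait, using std::condition_variable from the standard library in place of the Boost conditions mentioned above (ResourceState and the function names are hypothetical):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical state of a resource that some other thread currently uses.
struct ResourceState {
    std::mutex m;
    std::condition_variable cv;
    bool in_use = true;
};

// Called by the owner when it voluntarily gives the resource up.
void release(ResourceState& s) {
    {
        std::lock_guard<std::mutex> guard(s.m);
        s.in_use = false;
    }
    s.cv.notify_all();
}

// Waits until the resource is released or the timeout expires.
// Returns true if it was released, false if the wait timed out.
bool wait_for_release(ResourceState& s, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(s.m);
    return s.cv.wait_for(lock, timeout, [&s] { return !s.in_use; });
}

// Nobody releases the resource, so the wait gives up after the timeout.
bool times_out_when_never_released() {
    ResourceState s;
    return !wait_for_release(s, std::chrono::milliseconds(20));
}

// Another thread releases the resource, so the wait succeeds.
bool wakes_when_released() {
    ResourceState s;
    std::thread owner([&s] { release(s); });
    const bool released = wait_for_release(s, std::chrono::seconds(5));
    owner.join();
    return released;
}
```

When wait_for_release() returns false, the waiting thread is the place to decide what to do about the stuck resource, as the answer describes.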

Do mutexes guarantee ordering of acquisition? Unlocking thread takes it again while others are still waiting

A coworker had an issue recently that boiled down to what we believe was the following sequence of events in a C++ application with two threads:
Thread A holds a mutex.
While thread A is holding the mutex, thread B attempts to lock it. Since it is held, thread B is suspended.
Thread A finishes the work that it was holding the mutex for, thus releasing the mutex.
Very shortly thereafter, thread A needs to touch a resource that is protected by the mutex, so it locks it again.
It appears that thread A is given the mutex again; thread B is still waiting, even though it "asked" for the lock first.
Does this sequence of events fit with the semantics of, say, C++11's std::mutex and/or pthreads? I can honestly say I've never thought about this aspect of mutexes before.
Are there any fairness guarantees to prevent starvation of other threads for too long, or any way to get such guarantees?

Known problem. C++ mutexes are a thin layer on top of OS-provided mutexes, and OS-provided mutexes are often not fair. They do not care about FIFO order.
The other side of the same coin is that threads are usually not pre-empted until they run out of their time slice. As a result, thread A in this scenario was likely to continue to be executed, and got the mutex right away because of that.

The guarantee of a std::mutex is to enable exclusive access to shared resources. Its sole purpose is to eliminate the race condition when multiple threads attempt to access shared resources.
The implementer of a mutex may choose to favor the current thread acquiring a mutex (over another thread) for performance reasons. Allowing the current thread to acquire the mutex and make forward progress without requiring a context switch is often a preferred implementation choice supported by profiling/measurements.
Alternatively, the mutex could be constructed to prefer another (blocked) thread for acquisition (perhaps chosen according to FIFO order). This likely requires a thread context switch (on the same or another processor core), increasing latency/overhead. NOTE: FIFO mutexes can behave in surprising ways. E.g. thread priorities must be considered in FIFO support, so acquisition won't be strictly FIFO unless all competing threads have the same priority.
Adding a FIFO requirement to a mutex's definition constrains implementers to provide suboptimal performance in nominal workloads. (see above)
Protecting a queue of callable objects (std::function) with a mutex would enable sequenced execution. Multiple threads can acquire the mutex, enqueue a callable object, and release the mutex. The callable objects can be executed by a single thread (or a pool of threads if synchrony is not required).
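A minimal sketch of such a queue is below. TaskQueue and the demo function are hypothetical names. Any thread may post, and a single thread executes the callables in the order in which they were enqueued.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical queue of callables protected by a mutex.
class TaskQueue {
public:
    // May be called from any thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> guard(m_);
        tasks_.push(std::move(task));
    }

    // Runs all queued tasks in FIFO order on the calling thread.
    // Each task is executed outside the lock. Returns the number of tasks run.
    std::size_t run_pending() {
        std::size_t count = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> guard(m_);
                if (tasks_.empty()) break;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            ++count;
        }
        return count;
    }

private:
    std::mutex m_;
    std::queue<std::function<void()>> tasks_;
};

// Two producer threads post 100 tasks each; one thread then runs them all.
int run_posted_tasks() {
    TaskQueue queue;
    int executed = 0;  // only touched by the thread calling run_pending()
    auto producer = [&] {
        for (int i = 0; i < 100; ++i) queue.post([&executed] { ++executed; });
    };
    std::thread p1(producer), p2(producer);
    p1.join();
    p2.join();
    queue.run_pending();
    return executed;
}
```

Since only one thread ever runs the tasks, the tasks themselves need no locking for the state they share with each other.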

•Thread A finishes the work that it was holding the mutex for, thus releasing the mutex.
•Very shortly thereafter, thread A needs to touch a resource that is protected by the mutex, so it locks it again
In the real world, when the program is running, no threading library or OS provides such a guarantee. Here "shortly thereafter" may mean a lot to the OS and the hardware. If you say 2 minutes, then thread B would definitely get it. If you say 200 ms or less, there is no promise of either A or B getting it.
The number of cores, the load on different processors/cores/threading units, contention, thread switching, kernel/user switches, pre-emption, priorities, deadlock detection schemes et al. will make a lot of difference. Just by seeing a green light from far away, you cannot guarantee that it will still be green when you reach it.
If you want that thread B must get the resource, you may use IPC mechanism to instruct the thread B to gain the resource.

You are inadvertently suggesting that threads should synchronise access to the synchronisation primitive. Mutexes are, as the name suggests, about Mutual Exclusion. They are not designed for control flow. If you want to signal a thread to run from another thread you need to use a synchronisation primitive designed for control flow i.e. a signal.

You can use a fair mutex to solve your task, i.e. a mutex that guarantees FIFO order for your operations. Unfortunately, the C++ standard library doesn't have a fair mutex.
Thankfully, there are open-source implementations, for example yamc (a header-only library).
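To illustrate what "fair" means here, below is a ticket-based sketch. It is not yamc's implementation, only one simple way to obtain FIFO hand-over. TicketMutex and tickets_issued() are hypothetical names, and the latter exists only so the demo can wait until a thread has queued up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fair mutex: every lock() call takes a ticket, and tickets
// are served strictly in the order in which they were handed out.
class TicketMutex {
public:
    void lock() {
        std::unique_lock<std::mutex> lk(m_);
        const unsigned long my_ticket = next_ticket_++;
        cv_.wait(lk, [&] { return my_ticket == now_serving_; });
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++now_serving_;
        }
        cv_.notify_all();  // the thread holding the next ticket proceeds
    }

    // Number of tickets handed out so far (used by the demo below).
    unsigned long tickets_issued() {
        std::lock_guard<std::mutex> lk(m_);
        return next_ticket_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned long next_ticket_ = 0;
    unsigned long now_serving_ = 0;
};

// The main thread holds the lock while two workers queue up one after the
// other. Returns true if they obtain the lock in their arrival order.
bool grants_in_arrival_order() {
    TicketMutex m;
    std::vector<int> order;
    auto worker = [&](int id) {
        std::lock_guard<TicketMutex> guard(m);
        order.push_back(id);
    };

    m.lock();  // takes ticket 0
    std::thread t1(worker, 1);
    while (m.tickets_issued() < 2) std::this_thread::yield();  // t1 is queued
    std::thread t2(worker, 2);
    while (m.tickets_issued() < 3) std::this_thread::yield();  // t2 is queued
    m.unlock();

    t1.join();
    t2.join();
    const std::vector<int> expected{1, 2};
    return order == expected;
}
```

In the scenario from the question, thread A's second lock() would receive a later ticket than thread B's pending one, so B would go first. The price is the extra context switch discussed in the other answers.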

The logic here is very simple - the thread is not preempted based on mutexes, because that would require a cost incurred for each mutex operation, which is definitely not what you want. The cost of grabbing a mutex is high enough without forcing the scheduler to look for other threads to run.
If you want to fix this you can always yield the current thread. You can use std::this_thread::yield() - http://en.cppreference.com/w/cpp/thread/yield - and that might offer the chance to thread B to take over the mutex. But before you do that, allow me to tell you that this is a very fragile way of doing things, and offers no guarantee. You could, alternatively, investigate the issue deeper:
Why is it a problem that the B thread is not started when A releases the resource? Your code should not depend on such logic.
Consider using alternative thread synchronization objects like barriers (boost::barrier or http://linux.die.net/man/3/pthread_barrier_wait ) instead, if you really need this sort of logic.
Investigate whether you really need to release the mutex from A at that point. I find the practice of rapidly locking and releasing a mutex more than once a code smell; it usually hurts performance terribly. See if you can group the extracted data into immutable structures which you can play around with.
Ambitious, but try to work without mutexes - use instead lock-free structures and a more functional approach, including using a lot of immutable structures. I often found quite a performance gain from updating my code to not use mutexes (and still work correctly from the mt point of view)

How do you know this:
"While thread A is holding the mutex, thread B attempts to lock it. Since it is held, thread B is suspended."
How do you know thread B is suspended? How do you know that it has not just finished the line of code before trying to grab the lock, but not yet grabbed the lock:
Thread B:
x = 17; // is the thread here?
// or here? ('between' lines of code)
mtx.lock(); // or suspended in here?
// how can you tell?
You can't tell. At least not in theory.
Thus the order of acquiring the lock is, to the abstract machine (i.e. the language), not definable.

Thread safe programming

I keep hearing about thread safe. What is that exactly and how and where can I learn to program thread safe code?
Also, assume I have 2 threads, one that writes to a structure and another one that reads from it. Is that dangerous in any way? Is there anything I should look for? I don't think it is a problem. Both threads will not (well, can't) be accessing the struct at the exact same time.
Also, can someone please tell me how in this example : https://stackoverflow.com/a/5125493/1248779 we are doing a better job in concurrency issues. I don't get it.

It's a very deep topic. At heart, threads are usually about making things go fast by using multiple cores at the same time, or about doing long operations in the background when you don't have a good way to interleave the operation with a 'primary' thread. The latter is very common in UI programming.
Your scenario is one of the classic trouble spots, and one of the first that people run into. It's very rare to have a struct where the members are truly independent. It's very common to want to modify multiple values in the structure to maintain consistency. Without any precautions, it is very possible to modify the first value, then have the other thread read the struct and operate on it before the second value has been written.
Simple example would be a 'point' struct for 2d graphics. You'd like to move the point from [2,2] to [5,6]. If you had a different thread drawing a line to that point you could end up drawing to [5,2] very easily.
This is the tip of the iceberg really. There are lots of great books, but learning this space usually goes something like this:
Uh oh, I just read from that thing in an inconsistent state.
Uh oh, I just modified that thing from 2 threads and now it's garbage.
Yay! I learned about locks
Whoa, I have a lot of locks and everything seems to just hang sometimes when I have lots of them locking in nested code.
Hrm. I need to stop doing this locking on the fly, I seem to be missing a lot of places; so I should encapsulate them in a data structure.
That data structure thing was great, but now I seem to be locking all the time and my code is just as slow as a single thread.
condition variables are weird
It's fast because I got clever with how I lock things. Hrm. Sometimes data corrupts.
Whoa.... InterlockedWhatDidYouSay?
Hey, look no lock, I do this thing called a spin lock.
Condition variables. Hrm... I see.
You know what, how about I just start thinking about how to operate on this stuff in completely independent ways, pipelining my operations, and having as few cross-thread dependencies as possible...
Obviously it's not all about condition variables. But there are many problems that can be solved with threading, and probably almost as many ways to do it, and even more ways to do it wrong.

Thread-safety is one aspect of a larger set of issues under the general heading of "Concurrent Programming". I'd suggest reading around that subject.
Your assumption that two threads cannot access the struct at the same time is not good. First: today we have multi-core machines, so two threads can be running at exactly the same time. Second: even on a single-core machine, the slices of time given to any other thread are unpredictable. You have to anticipate that at any arbitrary time the "other" thread might be processing. See my "window of opportunity" example below.
The concept of thread-safety is exactly to answer the question "is this dangerous in any way". The key question is whether it's possible for code running in one thread to get an inconsistent view of some data, that inconsistency happening because while it was running another thread was in the middle of changing data.
In your example, one thread is reading a structure and at the same time another is writing. Suppose that there are two related fields:
{ foreground: red; background: black }
and the writer is in the process of changing those
foreground = black;
<=== window of opportunity
background = red;
If the reader reads the values at just that window of opportunity then it sees a "nonsense" combination
{ foreground: black; background: black }
The essence of this pattern is that for a brief time, while we are making a change, the system becomes inconsistent and readers should not use the values. As soon as we finish our changes it becomes safe to read again.
Hence we use the CriticalSection APIs mentioned by Stefan to prevent a thread seeing an inconsistent state.
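As a rough sketch of that idea, the snippet below guards both fields with a single lock so a reader can never land in the window of opportunity. It uses std::mutex instead of the Win32 CriticalSection APIs, purely to keep the example portable; the names Colors, SharedColors and count_inconsistent_reads are made up for illustration.

```cpp
#include <mutex>
#include <string>
#include <thread>

// Hypothetical type holding the two related fields from the example.
struct Colors {
    std::string foreground;
    std::string background;
};

// Hypothetical wrapper: every access to the fields goes through one mutex.
class SharedColors {
public:
    void set(const std::string& fg, const std::string& bg) {
        std::lock_guard<std::mutex> lock(mutex_);
        colors_.foreground = fg;
        // The window of opportunity still exists here, but no reader can
        // get in while this thread holds the lock.
        colors_.background = bg;
    }

    Colors get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return colors_;  // copied while the lock is held
    }

private:
    mutable std::mutex mutex_;
    Colors colors_{"red", "black"};
};

// Runs one writer and one reader and returns how many snapshots the reader
// saw with foreground == background (the "nonsense" combination).
int count_inconsistent_reads(int iterations) {
    SharedColors shared;
    int inconsistent = 0;  // touched only by the reader thread until join()

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            if (i % 2 == 0) shared.set("black", "red");
            else            shared.set("red", "black");
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            const Colors c = shared.get();
            if (c.foreground == c.background) ++inconsistent;
        }
    });

    writer.join();
    reader.join();
    return inconsistent;
}
```

With the lock held across both assignments, count_inconsistent_reads should always return 0. Locking each assignment separately would not be enough, because the reader could still run between the two.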

What is that exactly?
Briefly, a program that may be executed in a concurrent context without errors related to concurrency.
If ThreadA and ThreadB read and/or write data without errors and use proper synchronization, then the program may be threadsafe. It's a design choice -- making an object threadsafe can be accomplished a number of ways, and more complex types may be threadsafe using combinations of these techniques.
And how and where can I learn to program thread-safe code?
boost/libs/thread/ would likely be a good introduction. The topic is quite complex.
The C++11 standard library provides implementations of locks, atomics and threads -- any well-written program that uses these would be a good read. The standard library's threading facilities were modeled after Boost's implementation.
Also, assume I have 2 threads, one that writes to a structure and another one that reads from it. Is that dangerous in any way? Is there anything I should look for?
Yes, it can be dangerous and/or may produce incorrect results. Just imagine that a thread may run out of its time slice at any point, and another thread could then read or modify that structure -- if you have not protected it, it may be in the middle of an update. A common solution is a lock, which can be used to prevent another thread from accessing shared resources during reads/writes.

When writing multithreaded C++ programs on WIN32 platforms, you need to protect shared objects so that only one thread can access them at any given time. You can use 5 system functions to achieve this. They are InitializeCriticalSection, EnterCriticalSection, TryEnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection.
Also, maybe these links can help:
how to make an application thread safe?
http://www.codeproject.com/Articles/1779/Making-your-C-code-thread-safe

Thread safety is a simple concept: is it "safe" to perform operation A on one thread whilst another thread is performing operation B, which may or may not be the same as operation A. This can be extended to cover many threads. In this context, "safe" means:
No undefined behaviour
All invariants of the data structures are guaranteed to be observed by the threads
The actual operations A and B are important. If two threads both read a plain int variable, then this is fine. However, if any thread may write to that variable, and there is no synchronization to ensure that the read and write cannot happen together, then you have a data race, which is undefined behaviour, and this is not thread safe.
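For the plain int case, one possible way to remove the data race is to make the variable a std::atomic<int>. The sketch below is only an illustration under that assumption (the function name count_with_atomic is hypothetical): several threads increment one counter, and no increment is lost.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared counter
// increments_per_thread times. Returns the final counter value.
int count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write: no data race, no lost updates.
                counter.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();

    return counter.load();
}
```

With a plain int in place of std::atomic<int> the same code would contain a data race, and the result would not be reliable.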
This applies equally to the scenario you asked about: unless you have taken special precautions, then it is not safe to have one thread read from a structure at the same time as another thread writes to it. If you can guarantee that the threads cannot access the data structure at the same time, through some form of synchronization such as a mutex, critical section, semaphore or event, then there is not a problem.
You can use things like mutexes and critical sections to prevent concurrent access to some data, so that the writing thread is the only thread accessing the data when it is writing, and the reading thread is the only thread accessing the data when it is reading, thus providing the guarantee I just mentioned. This therefore avoids the undefined behaviour mentioned above.
However, you still need to ensure that your code is safe in the wider context: if you need to modify more than one variable then you need to hold the lock on the mutex across the whole operation rather than for each individual access, otherwise you may find that the invariants of your data structure may not be observed by other threads.
It is also possible that a data structure may be thread safe for some operations but not others. For example, a single-producer single-consumer queue will be OK if one thread is pushing items on the queue and another is popping items off the queue, but will break if two threads are pushing items, or two threads are popping items.
In the example you reference, the point is that global variables are implicitly shared between all threads, and therefore all accesses must be protected by some form of synchronization (such as a mutex) if any thread can modify them. On the other hand, if you have a separate copy of the data for each thread, then that thread can modify its copy without worrying about concurrent access from any other thread, and no synchronization is required. Of course, you always need synchronization if two or more threads are going to operate on the same data.
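A minimal sketch of the "separate copy per thread" idea: each thread accumulates into its own local variable and writes the result to its own slot, so the working data needs no lock. The name parallel_sum and the chunking scheme are assumptions made for this example, and num_threads is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `values` using num_threads worker threads.
// Assumes num_threads >= 1.
long long parallel_sum(const std::vector<int>& values, unsigned num_threads) {
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> threads;
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&values, &partial, chunk, t] {
            const std::size_t begin = std::min(values.size(), t * chunk);
            const std::size_t end = std::min(values.size(), begin + chunk);
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) local += values[i];
            partial[t] = local;   // each thread writes only its own slot
        });
    }
    for (auto& th : threads) th.join();

    // All workers have finished, so reading the slots here is safe.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The shared input is only read, and join() provides the synchronization needed before the partial results are combined.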
My book, C++ Concurrency in Action covers what it means for things to be thread safe, how to design thread safe data structures, and the C++ synchronization primitives used for the purpose, such as std::mutex.

Thread safety means that a certain block of code is protected from being executed by more than one thread at a time, so that the data it manipulates always stays in a consistent state.
A common example is the producer-consumer problem, where one thread reads from a data structure while another thread writes to the same data structure: Detailed explanation

To answer the second part of the question: Imagine two threads both accessing std::vector<int> data:
// first thread
if (data.size() > 0)
{
    std::cout << data[0]; // fails if data.size() == 0
}

// second thread
if (rand() % 5 == 0)
{
    data.clear();
}
else
{
    data.push_back(1);
}
Run these threads in parallel and your program may crash, because std::cout << data[0]; might be executed directly after data.clear();. Even without a crash, the unsynchronized access is a data race and therefore undefined behaviour.
You need to know that at any point of your thread code, the thread might be interrupted, e.g. after checking that (data.size() > 0), and another thread could become active. Although the first thread looks correct in a single threaded app, it's not in a multi-threaded program.
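One way to repair this example, sketched under a few assumptions, is to hold a mutex across the whole check-then-use sequence in the reader and across each modification in the writer. The function name run_guarded_vector is hypothetical, and i % 5 stands in for rand() % 5 so the run is deterministic.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs the reader and writer from the example with a
// mutex and returns how many times the reader saw an unexpected element.
int run_guarded_vector(int iterations) {
    std::vector<int> data;
    std::mutex data_mutex;
    int bad_reads = 0;  // touched only by the reader thread until join()

    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            // The lock covers both the emptiness check and the access,
            // so clear() cannot run in between.
            std::lock_guard<std::mutex> lock(data_mutex);
            if (!data.empty() && data[0] != 1) ++bad_reads;
        }
    });
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(data_mutex);
            if (i % 5 == 0) data.clear();
            else            data.push_back(1);
        }
    });

    reader.join();
    writer.join();
    return bad_reads;
}
```

Locking only around data[0], and not around the check, would leave the same gap as the original code.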

Making a function thread safe or not?

Let's say I have two threads, A and B. I create Thread B from Thread A to access a function.
Thread A uses the function most of the time, and Thread B would rarely use it, on a certain command.
If I make the function thread-safe by wrapping it in a critical section, wouldn't that slow down Thread A's processing on every call, even if Thread B doesn't exist? Or should I do the work of Thread B in Thread A itself instead of creating it?

PROFILE FIRST. Odds are good that the weight of a mutex is not going to be significant--you can lock and unlock them millions of times per second. Code correctly--using whatever locking mechanism is least prone to bugs or failure on your target system--and then, only then, if you see you need to improve performance, consider alternative strategies.
If you are expecting very low contention for the resource (i.e. the function), then a full-on mutex type may be wasteful, and depending on the particular usage pattern for the resource, a spin lock may be a viable alternative.
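For reference, a very small spin lock can be sketched with std::atomic_flag. This is an illustrative version, not a tuned implementation: the names SpinLock and count_with_spinlock are made up, and yielding inside the loop is just one possible back-off choice.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock. It provides lock()/unlock(), so it can be
// used with std::lock_guard.
class SpinLock {
public:
    void lock() {
        // Busy-wait until this thread is the one that sets the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: several threads increment a plain int under the lock.
int count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock spin;
    int counter = 0;  // protected by `spin`
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<SpinLock> guard(spin);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

A spin lock like this burns CPU while waiting, so it only tends to pay off when the protected section is very short and contention is rare.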

The actual performance implication of the critical section is something you'll have to measure for yourself. If the code has a lot of work relative to the number of times the critical section is entered, then chances are it won't hurt you too much. If you're doing almost no work in the critical section (for example, just incrementing a number), then you probably will see a performance hit. I'd suggest just trying out the multithreaded approach with the critical section and seeing what happens. If performance is good enough, then you don't need to worry. If it ends up being too slow, then you have some numbers to back up why you'd want to just have thread A do the processing.
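A crude way to get a first number is to time an uncontended lock around a trivial operation. The sketch below uses std::mutex and std::chrono::steady_clock; the names LockTiming and time_uncontended_locks are hypothetical, and the timing it reports depends entirely on the machine, compiler and build flags.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical result type: the final counter and the elapsed wall time.
struct LockTiming {
    long long counter;
    double seconds;
};

// Locks and unlocks a mutex `iterations` times from a single thread,
// incrementing a counter inside the critical section.
LockTiming time_uncontended_locks(int iterations) {
    std::mutex m;
    long long counter = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        ++counter;  // almost no work, so lock overhead dominates
    }
    const auto stop = std::chrono::steady_clock::now();

    return {counter, std::chrono::duration<double>(stop - start).count()};
}
```

This only measures the uncontended case, which matches the scenario where thread B rarely runs. Measuring the real function under its real workload is still the number that matters.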
