does boost::thread::timed_join(0) acquire a lock?

I need to check if my boost::thread I've created is running from another thread. This SO post explains you can do this by calling:
boost::posix_time::seconds waitTime(0);
myBoostThread.timed_join(waitTime);
I can't have any critical sections in my client thread. Can I be sure that timed_join() with a zero time argument is lock-free?

Boost.Thread provides no guarantees about a lock-free timed_join(). However, the implementation, which is always subject to change:
Boost.Thread acquires a mutex for pthreads, then performs a timed wait on a condition variable.
Boost.Thread calls WaitForMultipleObjects on Windows. Its documentation indicates that, with a zero timeout, it will always return immediately. However, I do not know if the underlying OS implementation is lock-free.
For an alternative, consider using atomic operations. While Boost 1.52 does not currently provide a public atomic library, both Boost.Smart_Ptr and Boost.Interprocess have atomic integers within their detail namespace. However, neither of these guarantee lock-free implementations, and one of the configurations for Boost.Smart_Ptr will lock with pthread mutex. Thus, you may need to consult your compiler and system's documentation to identify a lock-free implementation.
Nevertheless, here is a small example using boost::detail::atomic_count:
#include <boost/chrono.hpp>
#include <iostream>
#include <boost/detail/atomic_count.hpp>
#include <boost/thread.hpp>
// Use RAII to perform cleanup.
struct count_guard
{
count_guard(boost::detail::atomic_count& count) : count_(count) {}
~count_guard() { --count_; }
boost::detail::atomic_count& count_;
};
void thread_main(boost::detail::atomic_count& count)
{
// Place the guard on the stack. When the thread exits through either normal
// means or the stack unwinding from an exception, the atomic count will be
// decremented.
count_guard decrement_on_exit(count);
boost::this_thread::sleep_for(boost::chrono::seconds(5));
}
int main()
{
boost::detail::atomic_count count(1);
boost::thread t(thread_main, boost::ref(count));
// Check the count to determine if the thread has exited.
while (0 != count)
{
std::cout << "Sleeping for 2 seconds." << std::endl;
boost::this_thread::sleep_for(boost::chrono::seconds(2));
}
}
In this case, the at_thread_exit() extension could be used as an alternative to using RAII.
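If C++11 is available, the same idea can be sketched with std::atomic<bool> in place of the Boost detail class. This is only a sketch under that assumption: the names finished_guard, worker, run_and_poll and flag_is_lock_free are made up for illustration, and whether the flag is actually lock-free depends on the platform, which is why the sketch queries is_lock_free() rather than assuming it.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical RAII helper: sets the flag when the thread function exits,
// whether normally or through stack unwinding.
struct finished_guard
{
    explicit finished_guard(std::atomic<bool>& flag) : flag_(flag) {}
    ~finished_guard() { flag_.store(true, std::memory_order_release); }
    std::atomic<bool>& flag_;
};

void worker(std::atomic<bool>& finished)
{
    finished_guard mark_on_exit(finished);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// Starts a worker, polls the flag without joining, and returns the number
// of polls that saw the thread still running.
int run_and_poll()
{
    std::atomic<bool> finished(false);
    std::thread t(worker, std::ref(finished));
    int polls = 0;
    while (!finished.load(std::memory_order_acquire))
    {
        ++polls;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    t.join(); // the flag is already set, so the thread is about to exit
    return polls;
}

// Reports whether std::atomic<bool> is lock-free on this platform.
bool flag_is_lock_free()
{
    std::atomic<bool> probe(false);
    return probe.is_lock_free();
}
```

The polling thread only ever performs an atomic load, so it never enters a critical section as long as flag_is_lock_free() reports true.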

No, there is no such guarantee.
Even if the boost implementation is completely lock free (I haven't checked), there is no guarantee that the underlying OS implementation is completely lock free.
That said, even if locks were used here, I would find it unlikely that they would cause any significant delay in the application, so I would not hesitate to use timed_join unless there is a hard real-time deadline to meet (which does not equate to UI responsiveness).

Related

C++ atomics: how to allow only a single thread to access a function?

I'd like to write a function that is accessible only by a single thread at a time. I don't need busy waits, a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:
std::atomic<bool> m_busy(false);
bool func()
{
if (m_busy.exchange(true) == true)
return false;
// ... do stuff ...
m_busy.exchange(false);
return true;
}
Is the logic for the atomic exchange correct?
Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.

Your atomic swap implementation might work. But trying to do thread-safe programming without a lock is almost always fraught with issues and is often harder to maintain.
Unless there's a performance improvement that's needed, std::mutex with the try_lock() method is all you need, e.g.:
std::mutex mtx;
bool func()
{
// making use of std::unique_lock so if the code throws an
// exception, the std::mutex will still get unlocked correctly...
std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
bool gotLock = lck.owns_lock();
if (gotLock)
{
// do stuff
}
return gotLock;
}

Your code looks correct to me, as long as you leave the critical section by falling out, not returning or throwing an exception.
You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire. (But does need to be an atomic RMW like exchange or compare_exchange_strong)
Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing it is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.
Your algorithm is exactly like a spinlock, but without a retry if the lock is already taken (i.e. just a try_lock). All the reasoning about the necessary memory order for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in @selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or similar, you're never actually blocking, i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.
Rolling your own with an atomic<bool> is probably fine; using std::mutex here gains you little, because you want it to do only this for try_lock and unlock. A std::mutex can certainly do that (with some extra function-call overhead), but some implementations might do something more, and you're not using any of the functionality beyond that. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire / release, it's easy to get that right yourself.
The usual performance reason to not roll your own locking is that mutex will be tuned for the OS and typical hardware, with stuff like exponential backoff, x86 pause instructions while spinning a few times, then fallback to a system call. And efficient wakeup via system calls like Linux futex. All of this is only beneficial to the blocking behaviour. .try_lock leaves that all unused, and if you never have any thread sleeping then unlock never has any other threads to notify.
There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this. This will make your function exception-safe, making sure to always unlock before exiting, if it got the lock.
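For comparison, here is a minimal sketch of the roll-your-own variant with the memory orders discussed above: an acquire exchange to take the flag and a plain release store to drop it. The wrapper class try_guard and the global busy_flag are hypothetical names introduced for this illustration; the wrapper exists only to get the same exception safety that std::unique_lock would give.

```cpp
#include <atomic>

std::atomic<bool> busy_flag(false); // hypothetical name for the shared flag

// Hypothetical RAII wrapper: tries to take the flag on construction and
// releases it on destruction only if it actually took it.
class try_guard
{
public:
    explicit try_guard(std::atomic<bool>& flag)
        : flag_(flag),
          // exchange returns the previous value: false means we got the flag
          owns_(!flag.exchange(true, std::memory_order_acquire)) {}
    ~try_guard()
    {
        if (owns_)
            flag_.store(false, std::memory_order_release);
    }
    try_guard(const try_guard&) = delete;
    try_guard& operator=(const try_guard&) = delete;
    bool owns() const { return owns_; }
private:
    std::atomic<bool>& flag_;
    bool owns_;
};

bool func()
{
    try_guard guard(busy_flag);
    if (!guard.owns())
        return false; // another thread is inside; reject immediately
    // ... do stuff ...
    return true; // the guard's destructor releases the flag
}
```

Because the release happens in a destructor, an early return or an exception inside the "do stuff" part no longer leaves the flag stuck at true.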

C/C++ Should I still use synchronization if I am sure that only one thread is handling the pointer/object at a time?

Suppose I have a queue of pointers to std::string and producer and consumer threads working on the queue. Let's say a producer appends to a string and puts the pointer in the queue. A consumer thread gets the pointer, appends another data to the string and puts the pointer to another queue.
1. Will the consumer thread read an updated data from the producer thread?
2. After the consumer updates the data and puts it in another queue, will consumers to that queue see the updates of the producer and the consumer thread from (1)?
EDIT: sample code
EDIT: Added complete example
#include <deque>
#include <thread>
#include <string>
#include <iostream>
#include <mutex>
#include <atomic>
class StringQueue {
public:
std::string* pop() {
std::unique_lock<std::mutex> lock(_mutex);
if (_queue.empty())
return NULL;
std::string* s = _queue.front();
_queue.pop_front();
return s;
}
void push(std::string* s) {
std::unique_lock<std::mutex> lock(_mutex);
_queue.push_back(s);
}
private:
std::deque<std::string*> _queue;
std::mutex _mutex;
};
int main(int argc, char** argv)
{
StringQueue job_queue;
StringQueue result_queue;
std::atomic<bool> run(true);
std::thread consumer([&job_queue, &result_queue, &run]{
while (run.load()) {
std::string* s = job_queue.pop();
if (s != nullptr)
s->append("BAR");
result_queue.push(s);
}
});
std::thread result_thread([&result_queue, &run]{
while (run.load()) {
std::string* s = result_queue.pop();
if (s != nullptr) {
std::cout << "Result: " << *s << std::endl;
delete s;
}
}
});
std::string input;
while (true)
{
std::cin >> input;
if (input == "STOP")
break;
std::string* s = new std::string(input);
job_queue.push(s);
}
run.store(false);
result_thread.join();
consumer.join();
}

There are many answered questions on Stack Overflow about the memory ordering model of C++ std::mutex. E.g:
Does std::mutex create a fence?
and:
Does `std::mutex` and `std::lock` guarantee memory synchronisation in inter-processor code?
When one unlocks a mutex, all memory writes done previously by the unlocking thread are guaranteed to be visible to any thread that locks the same mutex after it is locked. (In practice, locking and unlocking a std::mutex may result in a stronger barrier, e.g. not requiring synchronization on the same mutex to provide visibility, but it is not guaranteed and providing more is not desirable for performance reasons.)
In the above code, there are three threads and two mutexes. Call the threads "main", "consumer" and "result_thread". Call the two mutexes "job_queue_mutex" and "result_queue_mutex". We have two synchronization patterns:
main and consumer synchronize using job_queue_mutex
consumer and result_thread synchronize using result_queue_mutex
In both cases, all stores to memory by one thread and reads from that memory by another thread are separated by an unlock on a mutex by the thread doing the stores and a lock on the same mutex by the thread doing the reads. (One can prove this by listing all the stores and reads and mutex operations. I suggest doing this as an exercise.)
So yes, this is guaranteed correct. (The code above spins on its mutexes and the consumer thread pushes nullptrs to the result thread's queue, both of which are inefficient, but for the purposes of discussion here, it works.)
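The spinning and the null pushes mentioned above could be avoided by letting pop() block on a condition variable. The following is a sketch of that alternative, not the asker's code: BlockingStringQueue and run_pipeline are hypothetical names, and ownership is passed with std::unique_ptr as an extra assumption. The visibility argument is unchanged, since every hand-off is still an unlock by the writer followed by a lock of the same mutex by the reader.

```cpp
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class BlockingStringQueue
{
public:
    void push(std::unique_ptr<std::string> s)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(s));
        }
        cond_.notify_one(); // wake one waiting consumer, if any
    }

    // Blocks until an item is available instead of returning null.
    std::unique_ptr<std::string> pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        std::unique_ptr<std::string> s = std::move(queue_.front());
        queue_.pop_front();
        return s;
    }

private:
    std::deque<std::unique_ptr<std::string>> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Sends one string through producer -> consumer -> result and returns it.
std::string run_pipeline(const std::string& input)
{
    BlockingStringQueue jobs, results;
    std::thread consumer([&] {
        std::unique_ptr<std::string> s = jobs.pop();
        s->append("BAR"); // sees the producer's writes to the string
        results.push(std::move(s));
    });
    jobs.push(std::unique_ptr<std::string>(new std::string(input)));
    std::unique_ptr<std::string> out = results.pop();
    consumer.join();
    return *out;
}
```

A real pipeline would also need a shutdown signal for blocked consumers; that part is left out to keep the sketch short.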
I expect the real question is what happens when one is not using mutexes in strict patterns. That gets into lock-free programming which is quite complicated. There are a variety of memory barriers of varying strength. To use such tools, one must start by having a really solid understanding of the specification of the ordering primitives -- barriers, fences, and the like. Which storage locations do they apply to, what operations do they establish an ordering on, and between which threads is that ordering imposed. Even getting a really good working model for the concept of acquire/release semantics can be a bit tricky. Then one really has to sit down and do a combination of design and correctness proving on the algorithm. Finally the code has to be written to the specification one has decided on. One can then check the code using a variety of tools ranging from formal verification ones (e.g. spin http://spinroot.com/spin/whatispin.html) to execution based ones like clang's thread sanitizer.
Point being that getting lock free code correct requires significantly more rigor than most programming tasks. I often tell people you cannot substitute debugging for design in multithreaded code and this applies even more so to lock free mechanisms. (Many serious programmers consider lock free techniques to be so error prone as to be a terrible idea outside of incredibly narrow uses.)

Your code is written such that you are blocking on producer being complete before starting the consumer, and so on. Join specifically stops the current thread until the thread you're calling has completed its work.
So as your code reads, yes, it's thread safe.
Does it make sense? Not really. Generally the reason you have consumers/producers with a queue of work to do is you want to do some expensive operations while handling some kind of back pressure. This means the producers and consumers are working at the same time.
If that is your intent, then the answer is no: neither std::deque nor any other STL container is thread safe for use in this way. In your example you'd have to wrap locks around all deque accesses and make sure you remove an item from the queue completely before you unlock it. You've got a bug in your code currently where you do a front() instead of a pop_front(), which means the string is left in the work queue. This would lead to issues where more than one consumer could end up working on that string, which is bad news.

Is there a facility in boost to allow for write-biased locking?

If I have the following code:
#include <boost/date_time.hpp>
#include <boost/thread.hpp>
boost::shared_mutex g_sharedMutex;
void reader()
{
boost::shared_lock<boost::shared_mutex> lock(g_sharedMutex);
boost::this_thread::sleep(boost::posix_time::seconds(10));
}
void makeReaders()
{
while (1)
{
boost::thread ar(reader);
boost::this_thread::sleep(boost::posix_time::seconds(3));
}
}
boost::thread mr(makeReaders);
boost::this_thread::sleep(boost::posix_time::seconds(5));
boost::unique_lock<boost::shared_mutex> lock(g_sharedMutex);
...
the unique lock will never be acquired, because there are always going to be readers. I want a unique_lock that, when it starts waiting, prevents any new read locks from gaining access to the mutex (called a write-biased or write-preferred lock, based on my wiki searching). Is there a simple way to do this with boost? Or would I need to write my own?

Note that I won't comment on the win32 implementation because it's way more involved and I don't have the time to go through it in detail. That being said, its interface is the same as the pthread implementation, which means that the following answer should be equally valid.
The relevant pieces of the pthread implementation of boost::shared_mutex as of v1.51.0:
void lock_shared()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.exclusive || state.exclusive_waiting_blocked)
{
shared_cond.wait(lk);
}
++state.shared_count;
}
void lock()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.shared_count || state.exclusive)
{
state.exclusive_waiting_blocked=true;
exclusive_cond.wait(lk);
}
state.exclusive=true;
}
The while loop conditions are the most relevant part for you. For the lock_shared function (read lock), notice how the while loop will not terminate as long as there's a thread trying to acquire (state.exclusive_waiting_blocked) or already owns (state.exclusive) the lock. This essentially means that write locks have priority over read locks.
For the lock function (write lock), the while loop will not terminate as long as there's at least one thread that currently owns the read lock (state.shared_count) or another thread owns the write lock (state.exclusive). This essentially gives you the usual mutual exclusion guarantees.
As for deadlocks, well the read lock will always return as long as the write locks are guaranteed to be unlocked once they are acquired. As for the write lock, it's guaranteed to return as long as the read locks and the write locks are always guaranteed to be unlocked once acquired.
In case you're wondering, the state_change mutex is used to ensure that there's no concurrent calls to either of these functions. I'm not going to go through the unlock functions because they're a bit more involved. Feel free to look them over yourself, you have the source after all (boost/thread/pthread/shared_mutex.hpp) :)
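To make the write-preference idea concrete, here is a much simplified sketch built on the standard library. It is not Boost's implementation: the class name write_preferring_mutex is hypothetical, it counts waiting writers instead of using Boost's exclusive_waiting_blocked flag, and its unlock functions are a reduced guess at what is needed, not a copy of the Boost ones.

```cpp
#include <condition_variable>
#include <mutex>

class write_preferring_mutex
{
public:
    void lock_shared()
    {
        std::unique_lock<std::mutex> lk(state_change_);
        // New readers are held back while a writer owns or is waiting.
        shared_cond_.wait(lk, [this] { return !exclusive_ && waiting_writers_ == 0; });
        ++shared_count_;
    }
    bool try_lock_shared()
    {
        std::lock_guard<std::mutex> lk(state_change_);
        if (exclusive_ || waiting_writers_ != 0)
            return false;
        ++shared_count_;
        return true;
    }
    void unlock_shared()
    {
        std::lock_guard<std::mutex> lk(state_change_);
        if (--shared_count_ == 0)
            exclusive_cond_.notify_one(); // last reader lets a writer in
    }
    void lock()
    {
        std::unique_lock<std::mutex> lk(state_change_);
        ++waiting_writers_; // from now on, new readers are blocked
        exclusive_cond_.wait(lk, [this] { return shared_count_ == 0 && !exclusive_; });
        --waiting_writers_;
        exclusive_ = true;
    }
    void unlock()
    {
        std::lock_guard<std::mutex> lk(state_change_);
        exclusive_ = false;
        if (waiting_writers_ > 0)
            exclusive_cond_.notify_one(); // writers go first
        else
            shared_cond_.notify_all();    // otherwise release all readers
    }
private:
    std::mutex state_change_;
    std::condition_variable shared_cond_;
    std::condition_variable exclusive_cond_;
    int shared_count_ = 0;
    int waiting_writers_ = 0;
    bool exclusive_ = false;
};
```

With this shape, a continuous stream of readers cannot starve a writer: once lock() has incremented waiting_writers_, later lock_shared() calls wait until the writer has come and gone.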
All in all, this is pretty much a textbook implementation and it has been extensively tested in a wide range of scenarios (libs/thread/test/test_shared_mutex.cpp and massive use across the industry). I wouldn't worry too much as long as you use them idiomatically (no recursive locking and always lock using the RAII helpers). If you still don't trust the implementation, then you could write a randomized test that simulates whatever test case you're worried about and let it run overnight on hundreds of threads. That's usually a good way to tease out deadlocks.
Now why would you see that a read lock is acquired after a write lock is requested? Difficult to say without seeing the diagnostic code that you're using. Chances are that the read lock is acquired after your print statement (or whatever you're using) is completed and before state_change lock is acquired in the write thread.

Mutex example / tutorial? [closed]

I was trying to understand how mutexes work. Did a lot of Googling but it still left some doubts of how it works because I created my own program in which locking didn't work.
One absolutely non-intuitive syntax of the mutex is pthread_mutex_lock( &mutex1 );, where it looks like the mutex is being locked, when what I really want to lock is some other variable. Does this syntax mean that locking a mutex locks a region of code until the mutex is unlocked? Then how do threads know that the region is locked? [UPDATE: Threads know that the region is locked, by Memory Fencing ]. And isn't such a phenomenon supposed to be called critical section? [UPDATE: Critical section objects are available in Windows only, where the objects are faster than mutexes and are visible only to the thread which implements it. Otherwise, critical section just refers to the area of code protected by a mutex]
What's the simplest possible mutex example program and the simplest possible explanation on the logic of how it works?

Here goes my humble attempt to explain the concept to newbies around the world: (a color coded version on my blog too)
A lot of people run to a lone phone booth (they don't have mobile phones) to talk to their loved ones. The first person to catch the door-handle of the booth, is the one who is allowed to use the phone. He has to keep holding on to the handle of the door as long as he uses the phone, otherwise someone else will catch hold of the handle, throw him out and talk to his wife :) There's no queue system as such. When the person finishes his call, comes out of the booth and leaves the door handle, the next person to get hold of the door handle will be allowed to use the phone.
A thread is : Each person
The mutex is : The door handle
The lock is : The person's hand
The resource is : The phone
Any thread which has to execute some lines of code which should not be modified by other threads at the same time (using the phone to talk to his wife), has to first acquire a lock on a mutex (clutching the door handle of the booth). Only then will a thread be able to run those lines of code (making the phone call).
Once the thread has executed that code, it should release the lock on the mutex so that another thread can acquire a lock on the mutex (other people being able to access the phone booth).
[The concept of having a mutex is a bit absurd when considering real-world exclusive access, but in the programming world I guess there was no other way to let the other threads 'see' that a thread was already executing some lines of code. There are concepts of recursive mutexes etc, but this example was only meant to show you the basic concept. Hope the example gives you a clear picture of the concept.]
With C++11 threading:
#include <iostream>
#include <thread>
#include <mutex>
std::mutex m;//you can use std::lock_guard if you want to be exception safe
int i = 0;
void makeACallFromPhoneBooth()
{
m.lock();//man gets a hold of the phone booth door and locks it. The other men wait outside
//man happily talks to his wife from now....
std::cout << i << " Hello Wife" << std::endl;
i++;//no other thread can access variable i until m.unlock() is called
//...until now, with no interruption from other men
m.unlock();//man lets go of the door handle and unlocks the door
}
int main()
{
//This is the main crowd of people uninterested in making a phone call
//man1 leaves the crowd to go to the phone booth
std::thread man1(makeACallFromPhoneBooth);
//Although man2 appears to start second, there's a good chance he might
//reach the phone booth before man1
std::thread man2(makeACallFromPhoneBooth);
//And hey, man3 also joined the race to the booth
std::thread man3(makeACallFromPhoneBooth);
man1.join();//man1 finished his phone call and joins the crowd
man2.join();//man2 finished his phone call and joins the crowd
man3.join();//man3 finished his phone call and joins the crowd
return 0;
}
Compile and run using g++ -std=c++0x -pthread -o thread thread.cpp;./thread
Instead of explicitly using lock and unlock, you can use brackets as shown here, if you are using a scoped lock for the advantage it provides. Scoped locks have a slight performance overhead though.
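The scoped-lock variant mentioned above can be sketched as follows. The names booth_mutex, calls_made, makeACallWithScopedLock and callsAfterThreeMen are hypothetical and only mirror the phone-booth example; the point is that the braces delimit the lifetime of the std::lock_guard, so the unlock happens automatically, even if the protected code throws.

```cpp
#include <mutex>
#include <thread>

std::mutex booth_mutex; // the door handle
int calls_made = 0;     // the shared resource

void makeACallWithScopedLock()
{
    {
        // The lock is taken here...
        std::lock_guard<std::mutex> hold_door(booth_mutex);
        ++calls_made;
    } // ...and released here, when hold_door goes out of scope.

    // Code after the closing brace runs without holding the mutex.
}

// Lets three men make one call each and returns the total number of calls.
int callsAfterThreeMen()
{
    calls_made = 0; // no other thread is running yet
    std::thread man1(makeACallWithScopedLock);
    std::thread man2(makeACallWithScopedLock);
    std::thread man3(makeACallWithScopedLock);
    man1.join();
    man2.join();
    man3.join();
    return calls_made; // all threads have been joined, so this read is safe
}
```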

While a mutex may be used to solve other problems, the primary reason they exist is to provide mutual exclusion and thereby solve what is known as a race condition. When two (or more) threads or processes are attempting to access the same variable concurrently, we have potential for a race condition. Consider the following code
//somewhere long ago, we have i declared as int
void my_concurrently_called_function()
{
i++;
}
The internals of this function look so simple. It's only one statement. However, a typical pseudo-assembly language equivalent might be:
load i from memory into a register
add 1 to i
store i back into memory
Because the equivalent assembly-language instructions are all required to perform the increment operation on i, we say that incrementing i is a non-atomic operation. An atomic operation is one that can be completed on the hardware with a guarantee of not being interrupted once the instruction execution has begun. Incrementing i consists of a chain of 3 atomic instructions. In a concurrent system where several threads are calling the function, problems arise when a thread reads or writes at the wrong time. Imagine we have two threads running simultaneously and one calls the function immediately after the other. Let's also say that we have i initialized to 0. Also assume that we have plenty of registers and that the two threads are using completely different registers, so there will be no collisions. The actual timing of these events may be:
thread 1 load 0 into register from memory corresponding to i //register is currently 0
thread 1 add 1 to a register //register is now 1, but memory is still 0
thread 2 load 0 into register from memory corresponding to i
thread 2 add 1 to a register //register is now 1, but memory is still 0
thread 1 write register to memory //memory is now 1
thread 2 write register to memory //memory is now 1
What's happened is that we have two threads incrementing i concurrently, our function gets called twice, but the outcome is inconsistent with that fact. It looks like the function was only called once. This is because the atomicity is "broken" at the machine level, meaning threads can interrupt each other or work together at the wrong times.
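The interleaving above can be replayed deterministically in a single thread, which makes the lost update easy to see. The sketch below does that with two local variables standing in for the two threads' registers, and then shows one possible fix using std::atomic<int> (a different tool from the mutex discussed in this answer, used here only because it keeps the example short). All function and variable names are made up for the illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Replays the six steps of the timeline by hand; no real threads involved.
int simulate_lost_update()
{
    int memory_i = 0;    // the shared variable i
    int reg1 = memory_i; // thread 1 loads 0
    reg1 += 1;           // thread 1's register is 1, memory is still 0
    int reg2 = memory_i; // thread 2 loads 0
    reg2 += 1;           // thread 2's register is 1, memory is still 0
    memory_i = reg1;     // thread 1 stores 1
    memory_i = reg2;     // thread 2 stores 1, overwriting the same value
    return memory_i;     // two increments, but the result is 1
}

// Real threads, with the increment done as one indivisible operation.
int count_with_atomic(int threads, int increments_per_thread)
{
    std::atomic<int> i(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&i, increments_per_thread] {
            for (int n = 0; n < increments_per_thread; ++n)
                i.fetch_add(1); // atomic read-modify-write
        });
    }
    for (std::thread& th : pool)
        th.join();
    return i.load();
}
```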
We need a mechanism to solve this. We need to impose some ordering to the instructions above. One common mechanism is to block all threads except one. Pthread mutex uses this mechanism.
Any thread that has to execute lines of code which may unsafely modify values shared with other threads (using the phone to talk to his wife) should first be made to acquire a lock on a mutex. In this way, any thread that requires access to the shared data must pass through the mutex lock. Only then will a thread be able to execute the code. This section of code is called a critical section.
Once the thread has executed the critical section, it should release the lock on the mutex so that another thread can acquire a lock on the mutex.
The concept of having a mutex seems a bit odd when considering humans seeking exclusive access to real, physical objects but when programming, we must be intentional. Concurrent threads and processes don't have the social and cultural upbringing that we do, so we must force them to share data nicely.
So technically speaking, how does a mutex work? Doesn't it suffer from the same race conditions that we mentioned earlier? Isn't pthread_mutex_lock() a bit more complex than a simple increment of a variable?
Technically speaking, we need some hardware support to help us out. The hardware designers give us machine instructions that do more than one thing but are guaranteed to be atomic. A classic example of such an instruction is the test-and-set (TAS). When trying to acquire a lock on a resource, we might use the TAS to check whether a value in memory is 0. If it is not, that would be our signal that the resource is in use and we do nothing (or more accurately, we wait by some mechanism. A pthreads mutex will put us into a special queue in the operating system and will notify us when the resource becomes available. Dumber systems may require us to do a tight spin loop, testing the condition over and over). If the value in memory is 0, the TAS sets the location to something other than 0 without using any other instructions. It's like combining two assembly instructions into 1 to give us atomicity. Thus, testing and changing the value (if changing is appropriate) cannot be interrupted once it has begun. We can build mutexes on top of such an instruction.
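As an illustration of building a lock on top of test-and-set, here is a minimal spin lock using std::atomic_flag, whose test_and_set() member is the C++11 standard library's portable form of that instruction. This is the "tight spin loop" kind of lock, not how a pthreads mutex is implemented, and the names spin_lock and count_with_spin_lock are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class spin_lock
{
public:
    void lock()
    {
        // test_and_set returns the previous value: true means someone else
        // already holds the lock, so keep trying.
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain int from several threads, protected by the spin lock.
int count_with_spin_lock(int threads, int increments_per_thread)
{
    spin_lock lock;
    int counter = 0; // not atomic; only touched while holding the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&lock, &counter, increments_per_thread] {
            for (int n = 0; n < increments_per_thread; ++n)
            {
                lock.lock();
                ++counter; // the critical section
                lock.unlock();
            }
        });
    }
    for (std::thread& th : pool)
        th.join();
    return counter;
}
```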
Note: some sections may appear similar to an earlier answer. I accepted his invite to edit, he preferred the original way it was, so I'm keeping what I had which is infused with a little bit of his verbiage.

I stumbled upon this post recently and think that it needs an updated solution for the standard library's c++11 mutex (namely std::mutex).
I've pasted some code below (my first steps with a mutex - I learned concurrency on win32 with HANDLE, SetEvent, WaitForMultipleObjects etc).
Since it's my first attempt with std::mutex and friends, I'd love to see comments, suggestions and improvements!
#include <condition_variable>
#include <mutex>
#include <algorithm>
#include <thread>
#include <queue>
#include <chrono>
#include <iostream>
int main()
{
// these vars are shared among the following threads
std::queue<unsigned int> nNumbers;
std::mutex mtxQueue;
std::condition_variable cvQueue;
bool m_bQueueLocked = false;
std::mutex mtxQuit;
std::condition_variable cvQuit;
bool m_bQuit = false;
std::thread thrQuit(
[&]()
{
using namespace std;
this_thread::sleep_for(chrono::seconds(5));
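// set event by setting the bool variable to true (while holding the
// mutex, so the write does not race with the readers),
// then notifying via the condition variable
{
lock_guard<mutex> lock( mtxQuit );
m_bQuit = true;
}
cvQuit.notify_all();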
}
);
std::thread thrProducer(
[&]()
{
using namespace std;
int nNum = 13;
unique_lock<mutex> lock( mtxQuit );
while ( ! m_bQuit )
{
while( cvQuit.wait_for( lock, chrono::milliseconds(75) ) == cv_status::timeout )
{
nNum = nNum + 13 / 2;
unique_lock<mutex> qLock(mtxQueue);
cout << "Produced: " << nNum << "\n";
nNumbers.push( nNum );
}
}
}
);
std::thread thrConsumer(
[&]()
{
using namespace std;
unique_lock<mutex> lock(mtxQuit);
while( cvQuit.wait_for(lock, chrono::milliseconds(150)) == cv_status::timeout )
{
unique_lock<mutex> qLock(mtxQueue);
if( nNumbers.size() > 0 )
{
cout << "Consumed: " << nNumbers.front() << "\n";
nNumbers.pop();
}
}
}
);
thrQuit.join();
thrProducer.join();
thrConsumer.join();
return 0;
}

For those looking for the shortest mutex example:
#include <mutex>
int main() {
std::mutex m;
m.lock();
// do thread-safe stuff
m.unlock();
}

The function pthread_mutex_lock() either acquires the mutex for the calling thread or blocks the thread until the mutex can be acquired. The related pthread_mutex_unlock() releases the mutex.
Think of the mutex as a queue; every thread that attempts to acquire the mutex will be placed on the end of the queue. When a thread releases the mutex, the next thread in the queue comes off and is now running.
A critical section refers to a region of code where non-determinism is possible. Often this is because multiple threads are attempting to access a shared variable. The critical section is not safe until some sort of synchronization is in place. A mutex lock is one form of synchronization.

You are supposed to check the mutex variable before using the area protected by the mutex. So your pthread_mutex_lock() could (depending on implementation) wait until mutex1 is released or return a value indicating that the lock could not be obtained if someone else has already locked it.
Mutex is really just a simplified semaphore. If you read about them and understand them, you understand mutexes. There are several questions regarding mutexes and semaphores in SO. Difference between binary semaphore and mutex, When should we use mutex and when should we use semaphore and so on. The toilet example in the first link is about as good an example as one can think of. All code does is to check if the key is available and if it is, reserves it. Notice that you don't really reserve the toilet itself, but the key.

SEMAPHORE EXAMPLE ::
sem_t m;
sem_init(&m, 0, 1); // initialize semaphore to 1 so the first sem_wait() succeeds
sem_wait(&m);
// critical section here
sem_post(&m);
Reference : http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2008/Notes/threads-semaphores.txt

C++0x thread interruption

According to the C++0x final draft, there's no way to request a thread to terminate. That said, if required we need to implement a do-it-yourself solution.
On the other hand boost::thread provides a mechanism to interrupt a thread in a safe manner.
In your opinion, what's the best solution? Designing your own cooperative 'interruption mechanism' or going native?

All the language specification says is that the support isn't built into the language.
boost::thread::interrupt needs some support from the thread function, too:
When the interrupted thread next executes one of the specified interruption points (or if it is currently blocked whilst executing one)
i.e. when the thread function doesn't give the caller a chance to interrupt, you are still stuck.
I'm not sure what you mean by "going native" - there is no native support, unless you are bound to Boost.Thread.
Still, I'd use an explicit mechanism. You have to think about having enough interruption points anyway, why not make them explicit? The extra code is usually marginal in my experience, though you may need to change some waits from single-object to multiple-objects, which - depending on your library - may look uglier.
One could also pull the "don't use exceptions for control flow", but compared to messing around with threads, this is just a guideline.

Using the native handle to cancel a thread is a bad option in C++, as you need to destroy all the stack-allocated objects. This was the main reason they didn't include a cancel operation.
Boost.Thread provides an interrupt mechanism that needs to poll on any waiting primitive. As this can be expensive as a general mechanism, the standard has not included it.
You will need to implement it yourself. See my answer here to a similar question on how to do so. To complete the solution, an interruption should be thrown when interrupted is true, and the thread should catch this interruption and finish.

Here is my humble implementation of a thread canceller (for C++0x).
I hope it will be useful.
// Class cancellation_point
#include <chrono>
#include <condition_variable>
#include <mutex>
struct cancelled_error {};
class cancellation_point
{
public:
cancellation_point(): stop_(false) {}
void cancel() {
std::unique_lock<std::mutex> lock(mutex_);
stop_ = true;
cond_.notify_all();
}
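template <typename P>
void wait(const P& period) {
std::unique_lock<std::mutex> lock(mutex_);
// The predicate overload ignores spurious wakeups and returns true only
// if stop_ is set, either already on entry or before the period elapses.
if (cond_.wait_for(lock, period, [this] { return stop_; })) {
stop_ = false;
throw cancelled_error();
}
}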
private:
bool stop_;
std::mutex mutex_;
std::condition_variable cond_;
};
// Usage example
#include <chrono>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>
class ThreadExample
{
public:
void start() {
thread_ = std::unique_ptr<std::thread>(
new std::thread(std::bind(&ThreadExample::run, this)));
}
void stop() {
cpoint_.cancel();
thread_->join();
}
private:
void run() {
std::cout << "thread started\n";
try {
while (true) {
cpoint_.wait(std::chrono::seconds(1));
}
} catch (const cancelled_error&) {
std::cout << "thread cancelled\n";
}
}
std::unique_ptr<std::thread> thread_;
cancellation_point cpoint_;
};
int main() {
ThreadExample ex;
ex.start();
ex.stop();
return 0;
}

It is unsafe to terminate a thread preemptively because the state of the entire process becomes indeterminate after that point. The thread might have acquired a critical section prior to being terminated. That critical section will now never be released. The heap could become permanently locked, and so on.
The boost::thread::interrupt solution works by asking nicely. It will only interrupt a thread that is doing something interruptible, like waiting on a Boost.Thread condition variable, or a thread that does one of these things after interrupt is called. Even then, the thread isn't unceremoniously put through the meat grinder the way, say, Win32's TerminateThread function does it. Instead, it simply induces an exception, which, if you've been a well-behaved coder and used RAII everywhere, will clean up after itself and gracefully exit the thread.

Implementing a do-it-yourself solution makes the most sense, and it really should not be that hard to do. You will need a shared variable that you read and write with synchronization, indicating whether the thread is being asked to terminate, and your thread periodically reads this variable when it is in a state where it can safely be interrupted. When you want to interrupt a thread, you simply write to this variable (again with synchronization) and then join the thread. Assuming it cooperates appropriately, it should notice that the variable has been written and shut down, at which point the join call returns.
If you were to go native, you would not gain anything by it; you would simply throw out all the benefits of a standard and cross-platform OOP threading mechanism. In order for your code to be correct, the thread would need to shut down cooperatively, which implies the communication described above.
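The shared-variable approach described above can be sketched with the standard library alone. This is only a minimal illustration, assuming C++11: the class name StoppableWorker and its members are hypothetical, and a std::atomic<bool> stands in for the "synchronously read/written" variable.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not from any library): a worker that polls a
// shared stop flag at points where it is safe to stop.
class StoppableWorker {
public:
    StoppableWorker() : stop_requested_(false), iterations_(0) {}
    ~StoppableWorker() { stop(); }  // never destroy a joinable std::thread

    void start() {
        thread_ = std::thread([this] { run(); });
    }

    // Write the flag, then join: join() returns once the worker notices.
    void stop() {
        stop_requested_.store(true);
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    long iterations() const { return iterations_.load(); }

private:
    void run() {
        // The loop condition is the only interruption point in this sketch.
        while (!stop_requested_.load()) {
            ++iterations_;  // placeholder for one unit of real work
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_requested_;
    std::atomic<long> iterations_;
    std::thread thread_;
};
```

A caller would invoke start(), do its own work, and later call stop(). How quickly stop() returns depends on how often the worker checks the flag, which is the cost of keeping interruption points explicit.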

It's unsafe to terminate a thread, since you would have no control over the state of any data structures it was working on at that moment.
If you want to interrupt a running thread, you have to implement your own mechanism. IMHO if you need that, your design is not prepared for multiple threads.
If you just want to wait for a thread to finish, use join() or a future.
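As a rough sketch of the future-based option, assuming C++11 and a made-up workload named compute_answer, std::async with std::launch::async runs the task on another thread and returns a std::future. The is_ready helper is also hypothetical; it shows a zero-timeout check of whether the task has finished, without joining.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical workload used only for illustration.
int compute_answer() {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return 42;
}

// Returns true if the result is already available, without waiting.
// Only meaningful for a valid future whose task was started with
// std::launch::async; a deferred task never reports ready here.
template <typename T>
bool is_ready(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Start the task on another thread and block until its result arrives.
int run_and_wait() {
    std::future<int> result = std::async(std::launch::async, compute_answer);
    return result.get();  // also rethrows any exception the task threw
}
```

Whether the zero-timeout check takes a lock internally is up to the standard library implementation, so it should not be assumed to be lock-free.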

My implementation of threads uses the pimpl idiom, and in the Impl class I have one version for each OS I support and also one that uses boost, so I can decide which one to use when building the project.
I decided to make two classes: one is Thread, which has only the basic, OS-provided services; and the other is SafeThread, which inherits from Thread and has a method for collaborative interruption.
Thread has a terminate() method that does an intrusive termination. It is a virtual method which is overridden in SafeThread, where it signals an event object. There's a (static) yeld() method which the running thread should call from time to time; this method checks whether the event object is signaled and, if so, throws an exception that is caught at the caller of the thread entry point, thereby terminating the thread. When it does so, it signals a second event object so the caller of terminate() can know that the thread was safely stopped.
For cases in which there's a risk of deadlock, SafeThread::terminate() can accept a timeout parameter. If the timeout expires, it calls Thread::terminate(), thus killing the thread intrusively. This is a last resort for when you have something you can't control (like a third-party API) or for situations in which a deadlock does more damage than resource leaks and the like.
Hope this'll be useful for your decision and will give you a clear enough picture about my design choices. If not, I can post code fragments to clarify if you want.
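For readers who want something concrete, the cooperative half of this design might look roughly like the following. This is not the author's code: the names SafeThreadSketch, check_point and stop_requested_error are invented for illustration, a std::atomic<bool> and a std::promise replace the two event objects, and the intrusive fallback is left out because standard C++ has no way to kill a thread.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

struct stop_requested_error {};  // hypothetical exception type

// Hypothetical sketch of a cooperatively stoppable thread.
class SafeThreadSketch {
public:
    SafeThreadSketch() : stop_(false) {}
    // Blocks forever if the body never reaches a check point.
    ~SafeThreadSketch() { if (thread_.joinable()) thread_.join(); }

    // `body` is any callable taking SafeThreadSketch&.
    template <typename Body>
    void start(Body body) {
        std::promise<void> done;
        stopped_ = done.get_future();  // plays the role of the second event
        thread_ = std::thread(
            [this, body](std::promise<void> done_signal) mutable {
                try {
                    body(*this);
                } catch (const stop_requested_error&) {
                    // Cooperative exit: the stack has been unwound normally.
                }
                done_signal.set_value();
            },
            std::move(done));
    }

    // Called by the running thread wherever it is safe to stop.
    void check_point() const {
        if (stop_.load()) throw stop_requested_error();
    }

    // Request a stop and wait up to `timeout` for the acknowledgement.
    // Returns false on timeout; the design described above would fall
    // back to an intrusive, OS-specific kill at that point.
    template <typename Rep, typename Period>
    bool terminate(const std::chrono::duration<Rep, Period>& timeout) {
        if (!stopped_.valid()) return true;  // never started
        stop_.store(true);
        if (stopped_.wait_for(timeout) != std::future_status::ready) {
            return false;
        }
        thread_.join();
        return true;
    }

private:
    std::atomic<bool> stop_;
    std::future<void> stopped_;
    std::thread thread_;
};
```

The thread body is expected to call check_point() regularly, and the caller then uses something like terminate(std::chrono::seconds(5)) and inspects the result.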

I agree with this decision. For example, .NET allows aborting any worker thread, but I never use this feature and wouldn't recommend it to any professional programmer. I want to decide for myself when a worker thread may be interrupted and how that is done. It is different for hardware, I/O, UI and other threads. If a thread may be stopped at any point, this may cause undefined program behavior in resource management, transactions, etc.
