winapi: get callback when thread ends - c++

I have some C++ library code that creates some book-keeping data per thread that accesses it (based on thread id). I would like to clean up that data when a thread ends. Is there a way (if not portable, then using the Win API) to get notified when a thread ends?
// simplified example:
#include <mutex>
#include <thread>
#include <unordered_map>

std::mutex mutex_;
std::unordered_map<std::thread::id, int> thread_accesses_;

void LibFunction() {
    std::thread::id thread_id = std::this_thread::get_id();
    mutex_.lock();
    std::unordered_map<std::thread::id, int>::const_iterator it = thread_accesses_.find(thread_id);
    if (it == thread_accesses_.end()) {
        thread_accesses_[thread_id] = 0;
    } else {
        thread_accesses_[thread_id]++;
    }
    mutex_.unlock();
}

Thread-local storage is available both as a C++ standard feature and as a platform mechanism.
C++ has the thread_local keyword for declaring thread-local variables. The destructor of such a variable is called for each thread for which it was constructed. A thread-local variable is constructed for at least all threads that access it, and possibly for other threads as well.
Windows has thread-local storage as a system mechanism; thread_local is implemented on top of it.
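Applied to the question's book-keeping map, a thread_local object whose destructor removes the entry is enough. A minimal sketch (the ThreadCleanup helper is made up for illustration):

#include <mutex>
#include <thread>
#include <unordered_map>

std::mutex mutex_;
std::unordered_map<std::thread::id, int> thread_accesses_;

struct ThreadCleanup {
    ~ThreadCleanup() {
        // runs when the owning thread exits
        std::lock_guard<std::mutex> lock(mutex_);
        thread_accesses_.erase(std::this_thread::get_id());
    }
};

void LibFunction() {
    thread_local ThreadCleanup cleanup; // constructed on first call in each thread
    std::lock_guard<std::mutex> lock(mutex_);
    // value-initialized to 0 on first access, then incremented
    ++thread_accesses_[std::this_thread::get_id()];
}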
It is also possible to get thread-exit callbacks on Windows by other means:
by placing thread-local data and TLS callbacks in the module (the PE image)
by using DllMain callbacks (DLL_THREAD_DETACH)
by passing an FlsCallback to FlsAlloc; fiber-local storage is a superset of thread-local storage, and in the absence of fibers it behaves exactly like thread-local storage (see the sketch below)
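A minimal sketch of the FlsAlloc route (error handling, such as checking for FLS_OUT_OF_INDEXES, is omitted; the int payload is just an example):

#include <windows.h>

// Called by the system when a thread (or fiber) exits,
// for every FLS slot holding a non-NULL value.
void WINAPI OnThreadExit(PVOID data)
{
    delete static_cast<int*>(data); // clean up the per-thread data
}

DWORD g_slot = FlsAlloc(&OnThreadExit);

void LibFunction()
{
    if (FlsGetValue(g_slot) == NULL) {
        FlsSetValue(g_slot, new int(0)); // first access from this thread
    }
}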
If you cannot use thread_local but want something simple and portable, consider also boost::thread_specific_ptr.


c++ thread local counter implementation

I want to implement a high-performance counter in a multi-threaded process. Each thread has a thread-local counter named "t_counter" that counts queries (incremented by 1 per query), and in a "timer thread" there is a counter named "global_counter". What I want is that every second the global_counter gathers all the t_counter values and adds them to itself, but I don't know how to read each t_counter value from the "timer thread". Additionally: which section does a thread-local value occupy in main memory, .data, the heap, or somewhere else? How is the memory dynamically sized (there may be 10 threads or 100)? And does x86-64 use a segment register to access such values?
Starting with your second question, you can find all the specifications here.
Summarizing: thread-local variables are defined in .tdata / .tbss. Those are somewhat similar to .data, but accessing them works differently. These sections are replicated per thread, and the actual variable offset is computed at runtime.
A variable is identified by an offset into .tdata. On x86-64, the FS segment register is used to find the TCB (thread control block); using the data structures stored there, the runtime locates the thread-local storage block where the variable lives. Note that all allocations are done lazily where possible.
Now, regarding your first question: I am not aware of a way to simply enumerate the thread-local variables of another thread, and I doubt one is available.
However, a thread can take a pointer to its thread-local variable and pass it to another thread. So what you probably need is some registration mechanism.
Each new thread registers itself with some central store and unregisters on termination. Registration and deregistration are your responsibility.
Schematically, it would look like this:
#include <map>
#include <mutex>
#include <numeric>
#include <thread>

thread_local int counter = 0;
std::map<std::thread::id, int *> regs;
std::mutex regs_mutex; // or an RW lock, if reads dominate

// note: 'register' is a reserved word in C++, hence the longer names
void register_thread() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs[std::this_thread::get_id()] = &counter;
}

void unregister_thread() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs.erase(std::this_thread::get_id());
}

void thread_main() {
    register_thread();
    counter++;
    unregister_thread();
}

int get_sum() {
    std::lock_guard<std::mutex> lock(regs_mutex); // a read lock would suffice
    return std::accumulate(regs.begin(), regs.end(), 0,
        [](int previous, const auto& element)
        { return previous + *element.second; });
}

std::lock_guard example, explanation on why it works

I've reached a point in my project that requires communication between threads on resources that may very well be written to, so synchronization is a must. However, I don't really understand synchronization at anything other than the basic level.
Consider the last example in this link: http://www.bogotobogo.com/cplusplus/C11/7_C11_Thread_Sharing_Memory.php
#include <iostream>
#include <thread>
#include <list>
#include <algorithm>
#include <mutex>

using namespace std;

// a global variable
std::list<int> myList;

// a global instance of std::mutex to protect the global variable
std::mutex myMutex;

void addToList(int max, int interval)
{
    // the access to this function is mutually exclusive
    std::lock_guard<std::mutex> guard(myMutex);
    for (int i = 0; i < max; i++) {
        if ((i % interval) == 0) myList.push_back(i);
    }
}

void printList()
{
    // the access to this function is mutually exclusive
    std::lock_guard<std::mutex> guard(myMutex);
    for (auto itr = myList.begin(), end_itr = myList.end(); itr != end_itr; ++itr) {
        cout << *itr << ",";
    }
}

int main()
{
    int max = 100;
    std::thread t1(addToList, max, 1);
    std::thread t2(addToList, max, 10);
    std::thread t3(printList);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}
The example demonstrates how three threads, two writers and one reader, access a common resource (a list).
Two global functions are used: one used by the two writer threads, and one used by the reader thread. Both functions use a lock_guard to lock down the same resource, the list.
Now here is what I just can't wrap my head around: the reader uses a lock in a different scope than the two writer threads, yet still locks down the same resource. How can this work? My limited understanding of mutexes lends itself well to the writer function: there you have two threads using the exact same function. I can understand that; a check is made right as you are about to enter the protected area, and if someone else is already inside, you wait.
But when the scope is different? This would indicate that there is some sort of mechanism more powerful than the process itself, some sort of runtime environment blocking execution of the "late" thread. But I thought there were no such things in C++. So I am at a loss.
What exactly goes on under the hood here?
Let’s have a look at the relevant line:
std::lock_guard<std::mutex> guard(myMutex);
Notice that the lock_guard references the global mutex myMutex. That is, the same mutex for all three threads. What lock_guard does is essentially this:
Upon construction, it locks myMutex and keeps a reference to it.
Upon destruction (i.e. when the guard's scope is left), it unlocks myMutex.
The mutex is always the same one; it has nothing to do with the scope. The point of lock_guard is just to make locking and unlocking the mutex easier for you. For example, if you manually lock/unlock, but your function throws an exception somewhere in the middle, it will never reach the unlock statement. So, doing it the manual way, you have to make sure that the mutex is always unlocked. On the other hand, the lock_guard object gets destroyed automatically whenever the function is exited, regardless of how it is exited.
myMutex is global, which is what is used to protect myList. guard(myMutex) simply engages the lock, and the exit from the block causes its destruction, disengaging the lock. guard is just a convenient way to engage and disengage the lock.
With that out of the way: a mutex does not protect any data by itself. It just provides a way to protect data. It is the design pattern that protects data. So if I write my own function that modifies the list, as below, the mutex cannot protect it.
void addToListUnsafe(int max, int interval)
{
    for (int i = 0; i < max; i++) {
        if ((i % interval) == 0) myList.push_back(i);
    }
}
The lock only works if every piece of code that needs to access the data engages the lock before accessing it and disengages it when done. This design pattern of engaging and disengaging the lock before and after every access is what protects the data (myList in your case).
Now you might wonder: why use a mutex at all, and why not, say, a bool? Yes, you can, but you will have to make sure that the bool variable exhibits certain characteristics, including but not limited to the following:
It must not be cached (volatile) across multiple threads.
Reads and writes of it must be atomic operations.
Your lock must handle the situation where there are multiple execution pipelines (logical cores, etc.).
There are different synchronization mechanisms that provide "better locking" (across processes versus across threads, multiple processors versus a single processor, and so on) at the cost of slower performance, so you should always choose the locking mechanism that is just about enough for your situation.
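For illustration, here is roughly what such a lock looks like once those requirements are met, using std::atomic_flag (a minimal spinlock sketch; in real code, prefer std::mutex):

#include <atomic>

class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // atomic test-and-set: spin until the previous value was 'clear'
        while (flag_.test_and_set(std::memory_order_acquire)) { /* busy-wait */ }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};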
Just to add onto what others here have said...
There is an idea in C++ called Resource Acquisition Is Initialization (RAII), which is the idea of binding resources to the lifetime of objects:
Resource Acquisition Is Initialization or RAII, is a C++ programming technique which binds the life cycle of a resource that must be acquired before use (allocated heap memory, thread of execution, open socket, open file, locked mutex, disk space, database connection—anything that exists in limited supply) to the lifetime of an object.
C++ RAII Info
The use of a std::lock_guard<std::mutex> class follows the RAII idea.
Why is this useful?
Consider a case where you don't use a std::lock_guard:
std::mutex m; // global mutex

void oops() {
    m.lock();
    doSomething();
    m.unlock();
}
In this case, a global mutex is used and locked before the call to doSomething(). Once doSomething() is complete, the mutex is unlocked.
One problem here is what happens if there is an exception? Now you run the risk of never reaching the m.unlock() line, which releases the mutex to other threads.
So you need to cover the case where you run into an exception:
std::mutex m; // global mutex

void oops() {
    try {
        m.lock();
        doSomething();
        m.unlock();
    } catch(...) {
        m.unlock(); // now exception path is covered
        // throw ...
    }
}
This works but is ugly, verbose, and inconvenient.
Now let's write our own simple lock guard.
class lock_guard {
private:
    std::mutex& m;
public:
    lock_guard(std::mutex& m_) : m(m_) { m.lock(); } // lock on construction
    ~lock_guard() { m.unlock(); }                    // unlock on destruction
};
When the lock_guard object is destroyed, it will ensure that the mutex is unlocked.
Now we can use this lock_guard to handle the case from before in a better/cleaner way:
std::mutex m; // global mutex

void ok() {
    lock_guard lk(m); // our simple lock guard, protects against the exception case
    doSomething();
} // when the scope is exited our lock guard object is destroyed and the mutex unlocked
This is the same idea behind std::lock_guard.
Again this approach is used with many different types of resources which you can read more about by following the link on RAII.
This is precisely what a lock does. When a thread takes the lock, regardless of where in the code it does so, it must wait its turn if another thread holds the lock. When a thread releases a lock, regardless of where in the code it does so, another thread may acquire that lock.
Locks protect data, not code. They do it by ensuring all code that accesses the protected data does so while it holds the lock, excluding other threads from any code that might access that same data.

Setting all TLS (thread local storage) variables to a new, single value in C++

I have a class Foo with the following thread-specific static member:
__declspec(thread) static bool s_IsAllAboutThatBass;
In the implementation file it is initialized like so:
__declspec(thread) bool Foo::s_IsAllAboutThatBass = true;
So far so good. Now, any thread can flip this bool willy-nilly as it deems fit. Then the problem: at some point I want each thread to reset that bool to its initial true value.
How can I slam all instances of the TLS to true from a central thread?
I've thought of ways I could do this with synchronization primitives I know about, like critical sections, read/write sections, or events, but nothing fits the bill. In my real use cases I am unable to block any of the other threads for any significant length of time.
Any help is appreciated. Thank you!
Edit: Plan A
One idea is to use a generation token, or cookie, that is read by all threads and written by the central thread. Each thread can then keep, in TLS, the last generation it saw when grabbing s_IsAllAboutThatBass via some accessor. When the thread-local cookie differs from the shared cookie, we update the thread-local one and reset s_IsAllAboutThatBass to true.
Here is a lightweight implementation of "Plan A" with the C++11 standard atomic variable and the thread_local specifier. (If your compiler doesn't support them, please substitute vendor-specific facilities.)
#include <atomic>

struct Foo {
    static std::atomic<unsigned> s_TokenGeneration;
    static thread_local unsigned s_LocalToken;
    static thread_local bool s_LocalState;

    // for central thread
    void signalResetIsAllAboutThatBass() {
        ++s_TokenGeneration;
    }

    // accessor for other threads
    void setIsAllAboutThatBass(bool b) {
        unsigned currToken = s_TokenGeneration;
        s_LocalToken = currToken;
        s_LocalState = b;
    }

    bool getIsAllAboutThatBass() const {
        unsigned currToken = s_TokenGeneration;
        if (s_LocalToken < currToken) {
            // reset thread-local token & state
            s_LocalToken = currToken;
            s_LocalState = true;
        }
        return s_LocalState;
    }
};

std::atomic<unsigned> Foo::s_TokenGeneration;
thread_local unsigned Foo::s_LocalToken = 0u;
thread_local bool Foo::s_LocalState = true;
The simplest answer is: you can't. The reason it's called thread-local storage is that only its own thread can access it. Which, by definition, means that some other "central thread" can't get to it. That's what it's all about, by definition.
Now, depending on how your hardware and compiler platform implement TLS, there might be a trick around it, if your implementation of TLS works by mapping TLS variables to different virtual memory addresses. Typically, one CPU register is thread-specific; it is set to point to a different memory address in each thread, and all TLS variables are accessed as addresses relative to it.
If that is the case, you could, perhaps, derive some thread-safe mechanism by which each thread takes a pointer to its TLS variable and puts it into a non-TLS container that your "central thread" can get to.
And, of course, you must keep all of that in sync with your threads, and clean things up after each thread terminates.
You'll have to figure out whether this is the case on your platform with a trivial test: declare a TLS variable, then compare its address in two different threads. If it's different, you might be able to work around it in this fashion. Technically, this kind of pointer comparison is non-portable and implementation-defined, but by this time you're already far into implementation-specific behavior.
But if the addresses are the same, it means that your implementation uses virtual memory addressing to implement TLS. Only the executing thread has access to its TLS variable, period, and there is no practical means by which any "central thread" could look at other threads' TLS variables. It's enforced by your operating system kernel. The "central thread" must cooperate with each thread and make arrangements to access the thread's TLS variables using the typical means of inter-thread communication.
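The trivial test mentioned above could look like this (a sketch; what it prints is implementation-specific):

#include <cstdio>
#include <thread>

thread_local int tls_probe = 0;

int main() {
    std::printf("main thread:  %p\n", static_cast<void*>(&tls_probe));
    std::thread t([] {
        std::printf("other thread: %p\n", static_cast<void*>(&tls_probe));
    });
    t.join();
    return 0;
}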
The cookie approach would work fine, and you don't need to use a TLS slot to implement it, just a local variable inside your thread procedure. To handle the case where the cookie changes value between the time that the thread is created and the time that it starts running (there is a small delay), you would have to pass the current cookie value as an input parameter for the thread creation, then your thread procedure can initialize its local variable to that value before it starts checking the live cookie for changes.
#include <pthread.h>

// the thread-local flag from the question (shown here with
// C++11 thread_local rather than __declspec(thread))
thread_local bool s_IsAllAboutThatBass = true;

intptr_t g_cookie = 1;
pthread_rwlock_t g_lock;

void* thread_proc(void *arg)
{
    intptr_t cookie = (intptr_t)arg;
    while (keepRunningUntilSomeCondition)
    {
        pthread_rwlock_rdlock(&g_lock);
        if (cookie != g_cookie)
        {
            cookie = g_cookie;
            s_IsAllAboutThatBass = true;
        }
        pthread_rwlock_unlock(&g_lock);
        //...
    }
    pthread_exit(NULL);
}

void createThread()
{
    ...
    pthread_t thread;
    pthread_create(&thread, NULL, &thread_proc, (void*)g_cookie);
    ...
}

void signalThreads()
{
    pthread_rwlock_wrlock(&g_lock);
    ++g_cookie;
    pthread_rwlock_unlock(&g_lock);
}

int main()
{
    pthread_rwlock_init(&g_lock, NULL);
    // use createThread() and signalThreads() as needed...
    pthread_rwlock_destroy(&g_lock);
    return 0;
}

Thread safe container

Here is an exemplary container class, in pseudo code:
class Container
{
public:
    Container() {}
    ~Container() {}
    void add(data value) // 'new' is a reserved word, so the parameter is renamed
    {
        // addition of data
    }
    data get(size_t which)
    {
        // returning some data
    }
    void remove(size_t which)
    {
        // delete specified object
    }
private:
    data d;
};
How can this container be made thread safe? I heard about mutexes - where should these mutexes be placed? Should the mutex be static for the class, or maybe in global scope? What is a good library for this task in C++?
First of all, mutexes should not be static for a class as long as you are going to use more than one instance. There are many cases where you should or shouldn't use them, so without seeing your code it's hard to say. Just remember: they are used to synchronize access to shared data, so it's wise to place them inside methods that modify or rely on the object's state. In your case I would use one mutex to protect the whole object and lock it in all three methods. Like:
#include <mutex>

class Container
{
public:
    Container() {}
    ~Container() {}
    void add(data value)
    {
        std::lock_guard<std::mutex> lock(mutex);
        // addition of data
    }
    data get(size_t which)
    {
        std::lock_guard<std::mutex> lock(mutex);
        // getting copy of value
        // return that value
    }
    void remove(size_t which)
    {
        std::lock_guard<std::mutex> lock(mutex);
        // delete specified object
    }
private:
    data d;
    std::mutex mutex;
};
Intel Thread Building Blocks (TBB) provides a bunch of thread-safe container implementations for C++. It has been open-sourced; you can download it from: http://threadingbuildingblocks.org/ver.php?fid=174
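Usage is close to that of the standard containers. A minimal sketch, assuming a classic TBB installation (header paths can differ between TBB versions):

#include <tbb/concurrent_queue.h>

tbb::concurrent_queue<int> queue;

void producer() {
    queue.push(42); // safe to call concurrently from many threads
}

void consumer() {
    int value;
    if (queue.try_pop(value)) {
        // got an element without any external locking
    }
}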
First: sharing mutable state between threads is hard. You should be using a library that has been audited and debugged.
With that said, there are two different functional issues:
you want a container to provide safe atomic operations
you want a container to provide safe multiple operations
The idea of multiple operations is that multiple accesses to the same container must be executed successively, under the control of a single entity. They require the caller to "hold" the mutex for the duration of the transaction so that only it changes the state.
1. Atomic operations
This one appears simple:
add a mutex to the object
at the start of each method grab a mutex with a RAII lock
Unfortunately it's also plain wrong.
The issue is re-entrancy. It is likely that some methods will call other methods on the same object. If those once again attempt to grab the mutex, you get a deadlock.
It is possible to use re-entrant mutexes. They are a bit slower, but allow the same thread to lock a given mutex as many times as it wants. The number of unlocks must match the number of locks, so once again: RAII.
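In standard C++ the re-entrant flavor is std::recursive_mutex. A minimal sketch:

#include <mutex>

class C {
    std::recursive_mutex mutex_;
public:
    void foo() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        bar(); // re-locking the same mutex from the same thread is fine
    }
    void bar() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        // do something
    }
};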
Another approach is to use dispatching methods:
class C {
public:
    void foo() { Lock lock(_mutex); foo_impl(); }
private:
    // work-methods never take the mutex themselves
    void foo_impl() { /* do something */ }
    Mutex _mutex;
};
The public methods are simple forwarders to the private work-methods and do nothing but lock. One then just has to ensure that the private methods never take the mutex...
Of course there is a risk of accidentally calling a locking method from a work-method, in which case you deadlock. Read on to see how to avoid this ;)
2. Multiple operations
The only way to achieve this is to have the caller hold the mutex.
The general method is simple:
add a mutex to the container
provide a handle on this mutex
cross your fingers that the caller will never forget to hold the mutex while accessing the class
I personally prefer a much saner approach.
First, I create a "bundle of data", which simply represents the class data (+ a mutex), and then I provide a Proxy in charge of grabbing the mutex. The data is locked away so that only the proxy may access the state.
#include <mutex>

class ContainerData {
protected:
    friend class ContainerProxy;
    std::mutex _mutex;
    void foo();
    void bar();
private:
    // some data
};

class ContainerProxy {
public:
    ContainerProxy(ContainerData& data) : _data(data), _lock(data._mutex) {}
    void foo() { _data.foo(); }
    void bar() { foo(); _data.bar(); }
private:
    ContainerData& _data;
    std::lock_guard<std::mutex> _lock;
};
Note that it is perfectly safe for the Proxy to call its own methods. The mutex will be released automatically by the destructor.
The mutex can still be reentrant if multiple Proxies are desired. But really, when multiple proxies are involved, it generally turns into a mess. In debug mode, it's also possible to add a "check" that the mutex is not already held by this thread (and assert if it is).
3. Reminder
Using locks is error-prone. Deadlocks are a common cause of error and occur as soon as you have two mutexes (or one and re-entrancy). When possible, prefer using higher level alternatives.
Add a mutex as an instance variable of the class. Initialize it in the constructor, lock it at the very beginning of every method (including the destructor), and unlock it at the end of the method. Using a global mutex for all instances of the class (a static member or one in global scope) may be a performance penalty.
There is also a very nice collection of lock-free containers (including maps) by Max Khiszinsky:
LibCDS1 Concurrent Data Structures
Here is the documentation page:
http://libcds.sourceforge.net/doc/index.html
It can be kind of intimidating to get started, because it is fully generic and requires you to register a chosen garbage-collection strategy and initialize it. Of course, the threading library is configurable and you need to initialize that as well :)
See the following links for some getting started info:
initialization of CDS and the threading manager
http://sourceforge.net/projects/libcds/forums/forum/1034512/topic/4600301/
the unit tests (cd build && ./build.sh --debug-test for a debug build)
Here is a base template for main():
#include <cds/threading/model.h> // threading manager
#include <cds/gc/hzp/hzp.h>      // Hazard Pointer GC

int main()
{
    // Initialize CDS library
    cds::Initialize();

    // Initialize garbage collector(s) that you use
    cds::gc::hzp::GarbageCollector::Construct();

    // Attach main thread
    // Note: needed if the main thread accesses libcds containers
    cds::threading::Manager::attachThread();

    // Do some useful work
    ...

    // Finish main thread - detaches internal control structures
    cds::threading::Manager::detachThread();

    // Terminate GCs
    cds::gc::hzp::GarbageCollector::Destruct();

    // Terminate CDS library
    cds::Terminate();
}
Don't forget to attach any additional threads you are using:
#include <cds/threading/model.h>

int myThreadFunc(void *)
{
    // initialize libcds thread control structures
    cds::threading::Manager::attachThread();

    // Now you can work with GCs and libcds containers
    ....

    // Finish working thread
    cds::threading::Manager::detachThread();
}
1 (not to be confused with Google's compact data structures library)

Threading C++ lock

I need some synchronization mechanism for threads. I am wondering, of the two implementations below, which one is the better way?
class classA {
public:
    int sharedResourceA;
    pthread_mutex_t mutex1;
    void functionA();
    int nonSharedResources;
};

void classA::functionA() {
    pthread_mutex_lock(&mutex1);
    // use sharedResourceA ...
    pthread_mutex_unlock(&mutex1);
}

classA objA;

pthread_mutex_lock(&objA.mutex1); // lock because another thread can call objA.functionA
// use objA.sharedResourceA ...
pthread_mutex_unlock(&objA.mutex1);

// use objA.nonSharedResources without a lock, because it is not shared
OR should I not put the lock in classA, and instead create a lock at the application level? E.g.:
classA objA;
pthread_mutex_t mutex2;

pthread_mutex_lock(&mutex2); // lock because another thread can call objA.functionA
// use objA.sharedResourceA ...
pthread_mutex_unlock(&mutex2);

pthread_mutex_lock(&mutex2); // lock because another thread can call objA.functionA
objA.functionA();
pthread_mutex_unlock(&mutex2);

// use objA.nonSharedResources without a lock, because it is not shared
First: the idiomatic way of doing locks in C++ is to create a lock class that uses RAII.
Then you can go
Lock l(mutex1);
// do stuff under mutex1 lock;
// Lock is freed at end of scope
(I bet boost has a lock, we made our own)
Second (the scope question): if classA uses shared resources internally, then it should lock them internally. Otherwise:
how does a caller know to do it
how can you be sure they did it
what if you change the implementation
The application-level lock should be used when the caller is the one using the shared resources and is composing something larger that uses classA, funcX and file W. Note that classA may still have its own internal lock in this case.
If functionA uses some shared resources, it should ensure that it accesses them in the correct way, i.e. ensure thread safety. That is a vote for the first option you presented.
There are safer ways to use mutexes: see boost::recursive_mutex and boost::recursive_mutex::scoped_lock. Using these you can ensure that even if something in the critical section throws, your mutex will be unlocked. For example:
#include <boost/thread/recursive_mutex.hpp>

using namespace boost;

struct C
{
    void f()
    {
        // non-critical section
        //...

        // critical section
        {
            // acquire the mutex
            recursive_mutex::scoped_lock lock(mutex);
            // do whatever you want; it can throw if it needs to :)
        } // exiting the scope causes the mutex to be released

        // non-critical section again
    }
private:
    recursive_mutex mutex;
};
I would say the first one is better, because if you need to instantiate classA more than once, you would have to create just as many global locks with the second solution.
Doing it inside the class also respects object encapsulation and hides usage of the protected resource behind a method. And if the shared resource ever stops being shared, you only have the class methods to change, instead of refactoring each and every usage of the resource as you would with global locks.