C++: implement a thread-local counter

I want to implement a high-performance counter in a multi-threaded process. Each worker thread has a thread-local counter named "t_counter" that it increments once per query, and a "timer thread" holds a counter named "global_counter". Every second, the timer thread should read each thread's t_counter and add the values into global_counter, but I don't know how to read each t_counter from the timer thread. Additionally: which section of main memory do thread-local variables live in, .data, the heap, or somewhere else? How is the memory sized dynamically (there may be 10 threads or 100 threads)? And does x86-64 use a segment register to address such values?

Starting with your second question, you can find all the specifications here.
Summarizing, thread-local variables are placed in the .tdata / .tbss sections. These are somewhat similar to .data, but access works differently: the sections act as a per-thread initialization image, replicated for each thread, and the actual variable address is computed at runtime.
A variable is identified by an offset into .tdata. On x86_64, the FS segment register is used to find the TCB (thread control block); using the data structures stored there, the runtime locates the thread-local storage block in which the variable lives. Note that allocations are done lazily where possible.
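A quick way to observe this per-thread replication (a minimal sketch of my own, not part of the original answer) is to print the address of the same thread_local variable from two threads; each thread resolves it, via the FS-based TLS block described above, to its own copy:

#include <iostream>
#include <thread>

thread_local int tls_value = 0;

int main() {
    // Each thread prints the address of *its own* instance of tls_value.
    std::cout << "main thread:  " << static_cast<void*>(&tls_value) << '\n';
    std::thread t([] {
        std::cout << "other thread: " << static_cast<void*>(&tls_value) << '\n';
    });
    t.join();
}

The two addresses will normally differ, confirming that each thread gets its own instance rather than sharing a single slot in .data.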
Now, regarding your first question - I am not aware of a way to just list all the thread local variables from another thread, and I doubt it is available.
However, a thread can take a pointer to its thread-local variable and pass it to another thread. So what you probably need is some registration mechanism.
Each new thread will register itself with some central store, then unregister on termination. Registration and deregistration are your responsibility.
Schematically, it would look like this:
#include <map>
#include <mutex>
#include <numeric>
#include <thread>

// Strictly speaking, counter should be std::atomic<int> (with relaxed operations)
// so that the timer thread can read it without a data race.
thread_local int counter = 0;

std::mutex regs_mutex;                    // protects regs; an RW lock would also work
std::map<std::thread::id, int *> regs;

void register_thread() {                  // note: "register" itself is a C++ keyword
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs[std::this_thread::get_id()] = &counter;
}

void unregister_thread() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs.erase(std::this_thread::get_id());
}

void thread_main() {
    register_thread();
    counter++;
    unregister_thread();
}

int get_sum() {
    // Only reading here, so a shared/read lock would be enough
    std::lock_guard<std::mutex> lock(regs_mutex);
    return std::accumulate(regs.begin(), regs.end(), 0,
                           [](int previous, const auto& element)
                           { return previous + *element.second; });
}

Related

winapi: get callback when thread ends

I have some c++ library code that creates some book-keeping data per thread that accesses it (based on thread-id). I would like to cleanup that data when a thread ends. Is there a way (if not portable, then using win-api) to get notified when a thread ends?
// simplified example:
std::mutex mutex_;
std::unordered_map<std::thread::id, int> thread_accesses_;

void LibFunction() {
    std::thread::id thread_id = std::this_thread::get_id();
    mutex_.lock();
    std::unordered_map<std::thread::id, int>::const_iterator it = thread_accesses_.find(thread_id);
    if (it == thread_accesses_.end()) {
        thread_accesses_[thread_id] = 0;
    } else {
        thread_accesses_[thread_id]++;
    }
    mutex_.unlock();
}
Thread-local storage exists both as a C++ standard feature and as a platform mechanism.
C++ has the thread_local keyword for declaring a thread-local variable. The destructor of such a variable is called at thread exit for every thread in which it was constructed; it is constructed at least for all threads that access the variable, and possibly for other threads as well.
Windows also has thread-local storage as a system mechanism, and thread_local is implemented on top of it.
It is possible to get thread-exit callbacks on Windows by other means:
using TLS callbacks in the module (PE image)
using DllMain callbacks (DLL_THREAD_DETACH notifications)
passing an FlsCallback to FlsAlloc; fiber-local storage is a superset of thread-local storage, and in the absence of fibers it behaves exactly like thread-local storage
If you cannot use thread_local but want something simple and portable, consider also boost::thread_specific_ptr.
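As a portable illustration of the thread_local destructor approach mentioned above (a minimal sketch of my own; the ThreadExitNotifier helper is made up): a thread_local object constructed on a thread's first call runs its destructor when that thread exits, which is exactly the hook needed to erase the thread's entry from the shared map.

#include <iostream>
#include <mutex>
#include <thread>
#include <unordered_map>

std::mutex g_mutex;
std::unordered_map<std::thread::id, int> g_thread_accesses;

// Hypothetical helper: its destructor runs at thread exit and removes
// the calling thread's book-keeping entry.
struct ThreadExitNotifier {
    ~ThreadExitNotifier() {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_thread_accesses.erase(std::this_thread::get_id());
    }
};

void LibFunction() {
    // Constructed on the first call in each thread; destroyed at that thread's exit.
    thread_local ThreadExitNotifier notifier;
    std::lock_guard<std::mutex> lock(g_mutex);
    ++g_thread_accesses[std::this_thread::get_id()];
}

int main() {
    std::thread t(LibFunction);
    t.join();   // t's entry has already been erased by its ThreadExitNotifier
    std::lock_guard<std::mutex> lock(g_mutex);
    std::cout << "entries left: " << g_thread_accesses.size() << '\n';   // prints 0
}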

How does accessing another thread's stack variable work in C++?

For example, I have:
int main()
{
    int i = 0;
    std::thread t([&] {
        for (int c = 0; c < 100; ++c)
            ++i;
    });
    t.join();
    return 0;
}
The thread t changes the value of the variable i.
I thought that when the OS switches the current thread it must save the old thread's stack and copy in the new thread's stack.
How does the operating system give the thread the right access to i?
Is there any explanation of how this works at the operating-system level?
Would it be more efficient if I used something like this instead:
int main()
{
    int* i = new int(0);
    std::thread t([&] {
        for (int c = 0; c < 100; ++c)
            ++(*i);
    });
    t.join();
    delete i;
    return 0;
}
There are two separate things at play in your example code: capture of local variables to a lambda function and how threads and their stacks work.
Capture of local variables when a lambda function is created works the same way regardless of whether the lambda is in the same thread or a different thread. Basically references to the variables are passed to the lambda.
See How are C++11 lambdas represented and passed? for more details.
Threads, as commented by Margaret Bloom, share the address space of a process. This gives them access to read and modify the same memory (including e.g. global variables). While each thread has a different stack area allocated to it, the stacks all live in the address space of the process, so every thread can access the stack areas of the other threads. If a thread has a pointer or a reference to a variable on another thread's stack, it can read and modify that variable.
Adding these 2 things together makes your example code work.
The first version of your code is probably slightly more efficient because there is one less level of indirection.
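One caveat worth adding (my note, not part of the original answer): a pointer or reference into another thread's stack is only valid while the owning stack frame is alive, so the owning thread must keep that frame alive, and join (or otherwise synchronize), before the variable goes out of scope. A minimal sketch:

#include <cassert>
#include <thread>

void increment_many(int& target) {
    for (int c = 0; c < 100; ++c)
        ++target;
}

int main()
{
    int i = 0;                                    // lives on main's stack
    std::thread t(increment_many, std::ref(i));   // t holds a reference into main's stack
    t.join();                                     // must happen before i goes out of scope
    assert(i == 100);                             // safe: join() synchronizes with t's writes
    return 0;
}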

Access a pthread-shared std::map without a data race

My scenario is to have a main thread and tens of worker threads. Worker threads will process incoming messages from different ports.
What I want is for the main thread and the worker threads to share the same map: the worker threads save data into the map (each in a different bucket), and the main thread greps the map content periodically.
The code goes like this:
struct cStruct
{
    std::map<std::string, std::string> map1;
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
};

int main(){
    struct cStruct cStruct1;
    while (condition){
        pthread_t th;
        int th_rc = pthread_create(&th, NULL, &task, (void *) &cStruct1);
    }
}
void* task(void* arg){
    struct cStruct cs = * (struct cStruct*) arg;
    while (coming data){
        if (main_thread_work){
            pthread_cond_wait(&count_cond, &cs.mutex1)
        }
        pthread_mutex_lock(&cs.mutex1);
        // add a new bucket to the map
        cs.map1(thread_identifier) = processed_value;
        pthread_mutex_unlock(&cs.mutex1);
    }
}

void* main_thread_task(void* arg){
    sleep(sleep_time);
    main_thread_work = true;
    pthread_mutex_lock(&cs.mutex1);
    // main_thread reads the std::map
    main_thread_work = false;
    pthread_cond_broadcast(&count_cond, &cs.mutex1)
    pthread_mutex_unlock(&cs.mutex1);
}
My questions are:
For map size changes, I should use a lock to protect the map.
But for updates to an existing key, can I let different threads modify the map concurrently? (Assume no two threads will access the same bucket at the same time.)
For the main thread reading the map, I thought of using a condition wait to hold all the worker threads while the main thread is reading the map content, and then a pthread_cond_broadcast to wake them up. The problem is that if a worker thread is updating the map when the main thread starts to work, there will be a data race.
Please share some ideas to help me improve my design.
Edit 1:
Added main_thread_task().
The thing I want to avoid is a worker thread arriving at pthread_cond_wait "after" pthread_cond_broadcast, so that the logic goes wrong.
That is why I set main_thread_work back to false before the main thread broadcasts to the worker threads.
while (coming data){
    if (main_thread_work){
        pthread_cond_wait(&count_cond, &cs.mutex1)
    }
    pthread_mutex_lock(&cs.mutex1);
This clearly can't be right. You can't check main_thread_work unless you hold the lock that protects it. How can the call to pthread_cond_wait release a lock it doesn't hold?!
This should be something like:
void* task(void* arg){
    // Use a reference; copying the struct would copy the mutex, which is wrong
    struct cStruct& cs = *(struct cStruct*) arg;
    // Acquire the lock so we can check shared state
    pthread_mutex_lock(&cs.mutex1);
    // Wait until the shared state is what we need it to be
    while (main_thread_work)
        pthread_cond_wait(&count_cond, &cs.mutex1);
    // Do whatever it is we're supposed to do when the
    // shared state is in this state
    cs.map1[thread_identifier] = processed_value;
    // release the lock
    pthread_mutex_unlock(&cs.mutex1);
}
You should use the mutex on every access to the map (in your case), not only when adding a new 'bucket'. Even though std::map insertions do not invalidate existing iterators, letting T1 write a value through an iterator while T2 inserts a new bucket is still a data race on the map's internal structure, so as long as new buckets can be added concurrently, every read and write of the map must be protected.
Regarding pthread_cond_wait: it can do the job if the only thing the other threads do is modify the map. If they also perform other calculations or process non-shared data, it is better to use the same mutex just to protect access to the map, and let the other threads carry on with work that may, at that point, have nothing to do with the shared map.
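A minimal sketch of that suggestion (my illustration, not the answerer's code; it uses std::thread and std::mutex rather than raw pthreads for brevity, and the worker / snapshot_map names are made up): every map access, reads and writes alike, is protected by the same mutex, and the condition variable is dropped because the main thread only needs a consistent snapshot.

#include <chrono>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex map_mutex;                                      // one mutex guards every access to the map
std::map<std::string, std::string> shared_map;

void worker(int id) {
    for (int n = 0; n < 5; ++n) {
        std::string value = "value " + std::to_string(n);
        std::lock_guard<std::mutex> lock(map_mutex);       // lock for each write
        shared_map["worker-" + std::to_string(id)] = value;
    }
}

std::map<std::string, std::string> snapshot_map() {
    std::lock_guard<std::mutex> lock(map_mutex);           // lock for the read as well
    return shared_map;                                     // copy taken under the lock
}

int main() {
    std::vector<std::thread> workers;
    for (int id = 0; id < 4; ++id)
        workers.emplace_back(worker, id);

    // "Main thread greps the map periodically": take a snapshot and inspect it
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::map<std::string, std::string> snap = snapshot_map();
    std::cout << "entries so far: " << snap.size() << '\n';

    for (std::thread& t : workers)
        t.join();
}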

Setting all TLS (thread local storage) variables to a new, single value in C++

I have a class Foo with the following thread-specific static member:
__declspec(thread) static bool s_IsAllAboutThatBass;
In the implementation file it is initialized like so:
__declspec(thread) bool Foo::s_IsAllAboutThatBass = true;
So far so good. Now, any thread can flip this bool willy-nilly as it sees fit. Then the problem: at some point I want each thread to reset that bool to its initial true value.
How can I slam all instances of the TLS to true from a central thread?
I've thought of ways I could do this with synchronization primitives I know about, like critical sections, read/write sections, or events, but nothing fits the bill. In my real use cases I am unable to block any of the other threads for any significant length of time.
Any help is appreciated. Thank you!
Edit: Plan A
One idea is to use a generation token, or cookie, that is read by all threads and written to by the central thread. Each thread can then keep a TLS variable holding the last generation it observed when grabbing s_IsAllAboutThatBass via some accessor. When the thread-local cookie differs from the shared cookie, we update the thread-local one and reset s_IsAllAboutThatBass to true.
Here is a lightweight implementation of "Plan A" using the C++11 standard atomic variable and the thread_local specifier. (If your compiler doesn't support them, please substitute vendor-specific facilities.)
#include <atomic>

struct Foo {
    static std::atomic<unsigned> s_TokenGeneration;
    static thread_local unsigned s_LocalToken;
    static thread_local bool s_LocalState;

    // for central thread
    void signalResetIsAllAboutThatBass() {
        ++s_TokenGeneration;
    }

    // accessors for other threads
    void setIsAllAboutThatBass(bool b) {
        unsigned currToken = s_TokenGeneration;
        s_LocalToken = currToken;
        s_LocalState = b;
    }
    bool getIsAllAboutThatBass() const {
        unsigned currToken = s_TokenGeneration;
        if (s_LocalToken < currToken) {
            // reset thread-local token & state
            s_LocalToken = currToken;
            s_LocalState = true;
        }
        return s_LocalState;
    }
};

std::atomic<unsigned> Foo::s_TokenGeneration{0u};   // explicit zero-init for pre-C++20 atomics
thread_local unsigned Foo::s_LocalToken = 0u;
thread_local bool Foo::s_LocalState = true;
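A short usage sketch for the code above (my addition, not part of the original answer; the flag-based handshake is only there to make the output deterministic): a worker thread flips its own copy to false, the central thread bumps the generation, and the worker's next read lazily resets its copy to true.

#include <atomic>
#include <iostream>
#include <thread>

// Assumes the Foo definition above is visible here.
int main() {
    Foo foo;
    std::atomic<bool> worker_has_set{false};
    std::atomic<bool> reset_signalled{false};

    std::thread worker([&] {
        foo.setIsAllAboutThatBass(false);          // this thread's copy is now false
        worker_has_set.store(true);
        while (!reset_signalled.load())            // wait for the central thread
            std::this_thread::yield();
        // The next read observes the new generation and resets the copy to true.
        std::cout << std::boolalpha << foo.getIsAllAboutThatBass() << '\n';   // prints true
    });

    while (!worker_has_set.load())
        std::this_thread::yield();
    foo.signalResetIsAllAboutThatBass();           // central thread bumps the generation
    reset_signalled.store(true);
    worker.join();
}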
The simplest answer is: you can't. The reason it's called thread-local storage is that only its own thread can access it, which, by definition, means that some other "central thread" can't get to it.
Now, depending on how your hardware and compiler platform implement TLS, there might be a trick around it, if your implementation of TLS works by mapping TLS variables to different virtual memory addresses. Typically, what happens is that one CPU register is thread-specific: it is set to point to a different memory address in each thread, and all TLS variables are accessed at addresses relative to it.
If that is the case, you could, perhaps, derive some thread-safe mechanism by which each thread takes a pointer to its TLS variable, and puts it into a non-TLS container, that your "central thread" can get to.
And, of course, you must keep all of that in sync with your threads, and clean things up after each thread terminates.
You'll have to figure out whether this is the case on your platform with a trivial test: declare a TLS variable, then compare its pointer address in two different threads. If the addresses differ, you might be able to work around it in this fashion. Technically, this kind of pointer comparison is non-portable and implementation-defined, but by this time you're already far into implementation-specific behavior.
But if the addresses are the same, it means that your implementation uses virtual memory addressing to implement TLS. Only the executing thread has access to its TLS variable, period, and there is no practical means by which any "central thread" could look at other threads' TLS variables; it's enforced by your operating system kernel. The "central thread" must cooperate with each thread and make arrangements to access the thread's TLS variables using the usual means of inter-thread communication.
The cookie approach would work fine, and you don't need to use a TLS slot to implement it, just a local variable inside your thread procedure. To handle the case where the cookie changes value between the time the thread is created and the time it starts running (there is a small delay), you would have to pass the current cookie value as an input parameter for the thread creation; then your thread procedure can initialize its local variable to that value before it starts checking the live cookie for changes.
intptr_t g_cookie = 1;
pthread_rwlock_t g_lock;

void* thread_proc(void *arg)
{
    intptr_t cookie = (intptr_t)arg;
    while (keepRunningUntilSomeCondition)
    {
        pthread_rwlock_rdlock(&g_lock);
        if (cookie != g_cookie)
        {
            cookie = g_cookie;
            s_IsAllAboutThatBass = true;
        }
        pthread_rwlock_unlock(&g_lock);
        //...
    }
    pthread_exit(NULL);
}

void createThread()
{
    ...
    pthread_t thread;
    pthread_create(&thread, NULL, &thread_proc, (void*)g_cookie);
    ...
}

void signalThreads()
{
    pthread_rwlock_wrlock(&g_lock);
    ++g_cookie;
    pthread_rwlock_unlock(&g_lock);
}

int main()
{
    pthread_rwlock_init(&g_lock, NULL);
    // use createThread() and signalThreads() as needed...
    pthread_rwlock_destroy(&g_lock);
    return 0;
}

Accessing and modifying automatic variables on another thread's stack

I want to pass some data around threads but want to refrain from using global variables if I can manage it. The way I wrote my thread routine, the user passes in a separate function for each "phase" of a thread's life cycle. For instance, this would be a typical usage of spawning a thread:
void init_thread(void *arg) {
    graphics_init();
}

void process_msg_thread(message *msg, void *arg) {
    if (msg->ID == MESSAGE_DRAW) {
        graphics_draw();
    }
}

void cleanup_thread(void *arg) {
    graphics_cleanup();
}

int main () {
    threadCreator factory;
    factory.createThread(init_thread, 0, process_msg_thread, 0, cleanup_thread, 0);
    // even-indexed arguments are the args to be passed into their respective functions;
    // this is why each of those functions must have a fixed signature: so they can be
    // passed to the factory this way
}

// Behind the scenes: in the newly spawned thread, the first argument given to
// createThread() is called, then a message pumping loop which will call the third
// argument is entered. Upon receiving a special exit message via another function
// of threadCreator, the fifth argument is called.
The most straightforward way to do it is with globals. I'd like to avoid that, though, because it is bad programming practice and generates clutter.
A certain problem arises when I try to refine my example slightly:
void init_thread(void *arg) {
    GLuint tex_handle[50];       // suppose I've got 50 textures to deal with
    graphics_init(&tex_handle);  // fill up the array during graphics init, which loads my textures
}

void process_msg_thread(message *msg, void *arg) {
    if (msg->ID == MESSAGE_DRAW) {  // this message indicates which texture my thread was told to draw
        graphics_draw_this_texture(tex_handle[msg->texturehandleindex]);  // send back the handle so it knows what to draw
    }
}

void cleanup_thread(void *arg) {
    graphics_cleanup();
}
I am greatly simplifying the interaction with the graphics system here, but you get the point. In this example tex_handle is an automatic variable, and all its values are lost when init_thread completes, so it will not be available when process_msg_thread needs to reference it.
I can fix this by using globals, but that means I can't have (for instance) two of these threads running simultaneously, since they would trample on each other's texture handle list by using the same one.
I could use thread-local globals, but is that a good idea?
I came up with one last idea: I can allocate storage on the heap in my parent thread and send a pointer to it to the children to work with. I can free it when the parent thread exits, since I intend for it to clean up its child threads before it exits anyway. So, something like this:
void init_thread(void *arg) {
    GLuint *tex_handle = (GLuint*)arg;  // my storage space passed as arg
    graphics_init(tex_handle);
}

void process_msg_thread(message *msg, void *arg) {
    GLuint *tex_handle = (GLuint*)arg;  // same thing here
    if (msg->ID == MESSAGE_DRAW) {
        graphics_draw_this_texture(tex_handle[msg->texturehandleindex]);
    }
}

int main () {
    threadCreator factory;
    GLuint *tex_handle = new GLuint[50];
    factory.createThread(init_thread, tex_handle, process_msg_thread, tex_handle, cleanup_thread, 0);
    // do stuff, wait etc
    ...
    delete[] tex_handle;
}
This looks more or less safe because the values live on the heap: my main thread allocates the storage and then lets the children work with it as they wish. The children can use the storage freely, since the pointer is given to every function that needs access.
So this got me thinking why not just have it be an automatic variable:
int main () {
    threadCreator factory;
    GLuint tex_handle[50];
    factory.createThread(init_thread, &tex_handle, process_msg_thread, &tex_handle, cleanup_thread, 0);
    // do stuff, wait etc
    ...
}   // tex_handle automatically cleaned up at this point
This means the child threads directly access the parent's stack. I wonder if this is kosher.
I found this on the internets: http://software.intel.com/sites/products/documentation/hpc/inspectorxe/en-us/win/ug_docs/olh/common/Problem_Type__Potential_Privacy_Infringement.htm
It seems Intel Inspector XE detects this behavior. So maybe I shouldn't do it? Is it simply a warning about a potential privacy infringement, as the URL suggests, or are there other potential issues I am not aware of?
P.S. After thinking through all this, I realize that maybe this architecture of splitting a thread into a bunch of functions that get called independently wasn't such a great idea. My intention was to remove the complexity of requiring the user to code up a message-handling loop for each thread that gets spawned. I had anticipated possible problems, and if I had a generalized thread implementation that always checked for certain messages (like my custom one that tells the thread to terminate), then I could guarantee that some future user could not accidentally forget to check for that condition in each and every message loop of theirs.
The problem with my solution is that those individual functions are now separate and cannot communicate with each other. They can only do so via globals and thread-local globals. I guess thread-local globals may be my best option.
P.P.S. This got me thinking about RAII and how the concept of a thread, at least as I have ended up representing it, has a certain similarity to that of a resource. Maybe I could build an object that represents a thread more naturally than the traditional ways... somehow. I think I will go sleep on it.
Put your thread functions into a class. Then they can communicate using instance variables. This requires your thread factory to be changed, but it is the cleanest way to solve your problem.
Your idea of using automatic variables will work too, as long as you can guarantee that the function whose stack frame contains the data never returns before your child threads exit. This is not really easy to achieve; even after main() returns, child threads can still be running.
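A minimal sketch of the class-based suggestion (my illustration, not the answerer's code; GLuint, the graphics calls, and the message type are stand-ins for the ones in the question): the three phase functions become member functions, and the texture handles become an instance variable they all share, so no globals and no reaching into another thread's stack are needed.

#include <cstdint>

using GLuint = std::uint32_t;   // stand-in for the real GL type

struct message { int ID; int texturehandleindex; };
constexpr int MESSAGE_DRAW = 1;

// Stand-ins for the question's graphics calls.
void graphics_init(GLuint* handles) { for (int n = 0; n < 50; ++n) handles[n] = n; }
void graphics_draw_this_texture(GLuint) {}
void graphics_cleanup() {}

class GraphicsThread {
public:
    // Called once in the new thread before the message loop starts.
    void init() { graphics_init(tex_handle_); }

    // Called by the message pump for every incoming message.
    void process_msg(const message& msg) {
        if (msg.ID == MESSAGE_DRAW)
            graphics_draw_this_texture(tex_handle_[msg.texturehandleindex]);
    }

    // Called once when the thread is told to exit.
    void cleanup() { graphics_cleanup(); }

private:
    GLuint tex_handle_[50] = {};   // shared state for all three phases, one copy per thread object
};

The thread factory would then own one GraphicsThread instance per spawned thread and call init(), process_msg(), and cleanup() on it from inside that thread, instead of taking free functions plus void* arguments.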