How does accessing another thread's stack variable work in C++?

For example, I have:
#include <thread>

int main()
{
    int i = 0;
    std::thread t([&] {
        for (int c = 0; c < 100; ++c)
            ++i;
    });
    t.join();
    return 0;
}
The thread t changes the value of the variable i.
My understanding is that when the OS switches the current thread, it must save the old thread's stack and copy in the new thread's stack.
How does the operating system provide correct access to i?
Is there an explanation of how this works at the operating-system level?
Would it be more efficient to use something like:
#include <thread>

int main()
{
    int* i = new int(0);  // value-initialized; a plain `new int` leaves *i indeterminate
    std::thread t([&] {
        for (int c = 0; c < 100; ++c)
            ++(*i);
    });
    t.join();
    delete i;
    return 0;
}

There are two separate things at play in your example code: capture of local variables to a lambda function and how threads and their stacks work.
Capture of local variables when a lambda function is created works the same way regardless of whether the lambda runs in the same thread or a different one. With [&], the lambda stores references to the captured variables (in practice, typically their addresses), so it works on the original objects rather than on copies.
See How are C++11 lambdas represented and passed? for more details.
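To illustrate the capture mechanism, here is a hand-written class that behaves like the [&] lambda from the question. It is a sketch of what a compiler typically generates, not the exact layout any particular compiler uses. IncrementClosure and run_with_handwritten_closure are hypothetical names.

```cpp
#include <cassert>
#include <thread>

// Hand-written stand-in for the closure of `[&] { for (...) ++i; }`.
// The closure object holds a reference to the captured variable, not a copy.
struct IncrementClosure {
    int& i;  // captured by reference, like [&]
    void operator()() const {
        for (int c = 0; c < 100; ++c)
            ++i;  // modifies the caller's variable through the reference
    }
};

int run_with_handwritten_closure() {
    int i = 0;
    std::thread t(IncrementClosure{i});  // the closure (and its reference) is copied into the thread
    t.join();
    return i;  // the worker incremented this very variable
}
```

Copying the closure into the thread copies the reference member, which still refers to the same `i`. That is why the worker's increments are visible to the caller.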
Threads, as commented by Margaret Bloom, share the address space of a process. They can read and modify the same memory (including, e.g., global variables). Each thread has its own stack area, but all the stacks live in the process's address space, so every thread can access the stack areas of the other threads. A context switch does not copy stacks around: the OS saves and restores each thread's registers, including its stack pointer, and the stacks stay where they are in memory. So if a thread has a pointer or a reference to a variable on another thread's stack, it can read and modify that variable, as long as the owning thread keeps that stack frame alive.
Adding these 2 things together makes your example code work.
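The shared-address-space point can be made concrete with std::thread alone. In this sketch the worker receives a plain pointer to a local variable on the calling thread's stack and writes through it. count_on_other_thread is a hypothetical name.

```cpp
#include <cassert>
#include <thread>

// The worker writes through an ordinary pointer into the caller's stack frame.
int count_on_other_thread(int iterations) {
    int local = 0;    // lives on the calling thread's stack
    int* p = &local;  // this address is valid in every thread of the process
    std::thread worker([p, iterations] {
        for (int c = 0; c < iterations; ++c)
            ++*p;     // writes directly into the caller's stack frame
    });
    worker.join();    // orders the worker's writes before the read below
                      // and keeps `local` alive for as long as the worker uses it
    return local;
}
```

The join() does two jobs here. It makes the worker's writes visible to the caller, and it ensures that the stack frame holding `local` outlives the worker.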
The first version of your code is probably slightly more efficient, because there is one less level of indirection and no heap allocation.

Related

c++ thread local counter implement

I want to implement a high-performance counter in a multithreaded process. Each thread has a thread-local counter named "t_counter" that counts queries (incremented by 1 per query), and a "timer thread" has a counter named "global_counter". Every second, I want the timer thread to read each t_counter and add them to global_counter, but I don't know how to get each t_counter value from the "timer thread". Additionally, in which section of memory do thread-local values live: .data, the heap, or something else? How is their memory allocated dynamically (there may be 10 threads or 100)? And does x86-64 use a segment register to locate such values?
Starting with your second question, you can find all the specifications here.
In summary, thread-local variables are defined in .tdata / .tbss. These sections are similar to .data (and .bss), but they are accessed differently: they are replicated per thread, and the actual location of each variable is computed at runtime.
A variable is identified by an offset in .tdata. On x86_64, the FS segment register is used to find the TCB (thread control block). Using the data structures stored there, the runtime locates the thread-local storage block that holds the variable. Note that allocations are done lazily where possible.
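Here is a small sketch showing that each thread gets its own copy of a thread_local variable at its own address. On ELF platforms the initial image of those copies comes from .tdata/.tbss, as described above. tls_value and tls_is_per_thread are hypothetical names.

```cpp
#include <cassert>
#include <thread>

thread_local int tls_value = 0;  // one independent copy per thread

// True if a worker thread sees a different address for tls_value and
// its write leaves the calling thread's copy untouched.
bool tls_is_per_thread() {
    tls_value = 1;                    // the caller's copy
    const int* mine = &tls_value;
    bool different_address = false;
    std::thread t([mine, &different_address] {
        tls_value = 42;               // modifies the worker's own copy only
        different_address = (&tls_value != mine);  // compared while both copies are alive
    });
    t.join();
    return different_address && tls_value == 1;
}
```

The address comparison happens inside the worker. Once a thread exits, its thread-local copies are gone, so a pointer to them must not be used afterwards.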
Now, regarding your first question - I am not aware of a way to just list all the thread local variables from another thread, and I doubt it is available.
However, a thread can take a pointer to its thread-local variable and pass it to another thread. So what you probably need is some registration mechanism.
Each new thread registers itself with some central store, then unregisters on termination. Registration and deregistration are your responsibility.
Schematically, it would look like this:
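#include <atomic>
#include <map>
#include <mutex>
#include <numeric>
#include <thread>

thread_local std::atomic<int> counter{0};  // atomic, so the timer thread may read it while its owner increments it
std::map<std::thread::id, std::atomic<int>*> regs;
std::mutex regs_mutex;  // guards regs; a std::shared_mutex would allow concurrent readers

// `register` is a reserved keyword in C++, so these need other names.
void register_counter() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs[std::this_thread::get_id()] = &counter;
}
void unregister_counter() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    regs.erase(std::this_thread::get_id());  // must happen before the thread (and its counter) goes away
}
void thread_main() {
    register_counter();
    counter++;
    unregister_counter();
}
int get_sum() {
    std::lock_guard<std::mutex> lock(regs_mutex);
    return std::accumulate(regs.begin(), regs.end(), 0,
        [](int previous, const auto& element)
        { return previous + element.second->load(); });
}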

How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing, and I'm wondering how they can be reused instead of being created and destroyed on each iteration. Some other operations are done inside the loop that affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for (int i = 0; i < 100; i++)
{
    auto func = [&](int arg) {
        //do stuff involving variable a and arg
        foo3(myVec[a]);
    };
    thread t1(func, args1);
    thread t2(func, args2);
    t1.join();
    t2.join();
    a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.
You can use std::async as suggested in the comments.
What you're trying to do is also a very common use case for a thread pool. A simple header-only implementation that I commonly use is here.
To use this library, create the pool outside of the loop, with the number of threads set during construction. Then enqueue a function, which one of the pool's threads will pick up and execute. With this library you get back a std::future (much as with std::async), and that is what you'd wait on in your loop.
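Here is a minimal sketch of the same pool pattern using only the standard library (C++11 or later). It is not the library linked above, and ThreadPool and enqueue are hypothetical names. The threads are started once in the constructor and sleep on a condition variable until work is queued. enqueue wraps the callable in a std::packaged_task so that the caller gets a std::future back.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t k = 0; k < n; ++k)
            workers_.emplace_back([this] { worker_loop(); });  // created once, reused for every task
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers finish queued tasks, then exit
    }
    // Queue a callable; the returned future becomes ready once a worker has run it.
    template <class F>
    auto enqueue(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // std::function needs a copyable target, so the move-only task is held by shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run the task without holding the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

In the question's loop, the pool would be constructed before the loop. Each iteration would then enqueue the two calls and call get() on both futures, instead of creating and joining t1 and t2.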
Generally, you'd want to make access to any shared data thread-safe with mutexes (or other means; there are many ways to do this), but in very specific situations you won't need to.
In this case, as long as the vector does not grow (so it never needs to reallocate) and each item is either only read or modified by just one thread, you wouldn't need to worry about synchronization.
Still, it's a good habit to synchronize anyway: when other people eventually modify the code, they won't know your rules and may introduce bugs.

Creating std threads in C++ crashes the program

Whenever I execute the following piece of code using threads, the program has this error:
Debug Error!
Program: ... /path/to/.exe
abort() has been called
I want to create a thread that calls a member function. Here is the function I am using:
void ServerVote::createConnexionThreads()
{
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads.push_back(&(std::thread(&ServerVote::acceptConnection,*this, i)));
}
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads[i]->join();
}
}
I can provide additional code if required. When using the debugger, I find that the program crashes right after the first thread is created, after the thread is pushed back. ~thread() is then called, and the program crashes inside it. Here is the vector declaration:
std::vector<std::thread*> m_connexionThreads;
I am using Visual Studio 2015. The acceptConnection function has a while(true) inside it and is planned to be terminated later.
Edit:
Thank you for your answers, but I cannot compile when using a thread object instead of a pointer. So when I try to push into this vector:
std::vector<std::thread> m_connexionThreads;
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads.push_back((std::thread(&ServerVote::acceptConnection,*this, i)));
}
I get this error while compiling:
error C2280: 'std::thread::thread(const std::thread &)': attempting to reference a deleted function
You should not take the address of a temporary in any context. In fact, standard C++ does not allow this code. MSVC accepts it only as a non-standard extension, and any standard-conforming compiler would produce an error here.
Instead, you should use the thread object like this (see my edit below the code on why this is preferred):
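#include <thread>
#include <vector>
void acceptConnection(int);
void foo() {
    std::vector<std::thread> vec;
    for (int i = 0; i <= 50; ++i)
        vec.push_back(std::thread(acceptConnection, i));  // the temporary is moved into the vector
    // A std::thread that is still joinable when destroyed calls std::terminate(),
    // so join (or detach) every thread before vec goes out of scope.
    for (auto& t : vec)
        t.join();
}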
Why is this approach preferred over using an allocated pointer to the thread object? There are multiple benefits:
It is less typing - and even if nothing else, all things being equal (though they are not!) less typing wins over more typing.
Pointers require care. For instance, you shouldn't use a raw pointer as the vector's element type; you should use std::unique_ptr to ensure automatic memory cleanup, which makes the syntax even uglier!
Using dynamically allocated memory is a drag on performance. You are hit twice - first time when you allocate memory, second time when you free it. Why suffer this penalty?
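You are creating a temporary thread object, taking its address and pushing that address into the vector. The temporary is destroyed at the end of the full-expression, so you are left with a pointer to a destroyed object. Worse, a std::thread that is still joinable when it is destroyed calls std::terminate(), which is where the abort() comes from.
You should either use new to create the thread object on the heap (ideally owned by a std::unique_ptr) so that it outlives the statement, or not use pointers to thread objects at all.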
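Regarding the compile error in the question's edit: std::thread copies its arguments, so passing *this tries to copy the whole ServerVote object, including its std::vector<std::thread> member. This is most likely what produces C2280. Passing `this` (a pointer) avoids the copy. With *this, each thread would also have run on its own copy of the server. The sketch below assumes a heavily simplified, hypothetical ServerVote whose acceptConnection just counts calls instead of looping forever.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Simplified, hypothetical stand-in for the question's ServerVote class.
class ServerVote {
public:
    void createConnexionThreads(int count) {
        for (int i = 0; i < count; ++i) {
            // Pass `this`, not `*this`: the thread then calls the member
            // function on this object instead of trying to copy it.
            m_connexionThreads.emplace_back(&ServerVote::acceptConnection, this, i);
        }
        for (auto& t : m_connexionThreads)
            t.join();  // a joinable std::thread must not be destroyed
        m_connexionThreads.clear();
    }
    int handledConnections() const { return m_handled.load(); }

private:
    // Stand-in body: the real acceptConnection loops forever; this one only counts calls.
    void acceptConnection(int id) {
        (void)id;
        m_handled.fetch_add(1);
    }

    std::vector<std::thread> m_connexionThreads;
    std::atomic<int> m_handled{0};
};
```

emplace_back constructs each std::thread directly inside the vector, so no copy or temporary is involved at all.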

Do I need to use the volatile keyword if I declare a variable between mutex lock and unlock calls and return it?

Let's say I have the following function.
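#include <mutex>
#include <vector>

std::mutex mutex;
std::vector<int> someVector;  // shared with other threads (element type assumed)

int getNumber()
{
    mutex.lock();
    int size = someVector.size();
    mutex.unlock();
    return size;
}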
Is this a place to use the volatile keyword when declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program has, and it is assumed that only one thread (other than the modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit: If you need to do some work that relies on size being correct, you will need to wrap that whole task with a mutex.
You haven't mentioned what the type of the mutex variable is, but assuming it is a std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing many optimizations. lock() and unlock() act as acquire and release operations, so neither the compiler nor the CPU may move the size() query out of the locked region. So you don't need to worry about return value optimization or any other optimization causing the size() query to be performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
std::lock_guard<std::mutex> l(mutex); // lock is acquired
int size = someVector.size();
return size;
} // lock is released automatically when l goes out of scope
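As noted above, the caller must hold the lock for as long as it relies on the size. Here is a small sketch of a check-then-act operation done entirely under one std::lock_guard, assuming a shared std::vector<int>. vec_mutex, shared_vec and pop_if_not_empty are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

std::mutex vec_mutex;
std::vector<int> shared_vec;

// The emptiness check and the removal happen under the same lock, so no
// other thread can change the vector between them.
bool pop_if_not_empty(int& out) {
    std::lock_guard<std::mutex> lock(vec_mutex);
    if (shared_vec.empty())
        return false;
    out = shared_vec.back();
    shared_vec.pop_back();
    return true;
}  // lock released here, after the whole operation
```

Splitting this into a getNumber()-style size query followed by a separate, later pop would reintroduce the race: another thread could empty the vector in between.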
volatile is a keyword that tells the compiler that every read and write of the variable must actually be performed, exactly as written, instead of being optimized away or merged. Here is an example:
int example_function() {
    int a;
    volatile int b;
    a = 1; // can be dropped: nothing reads a before it is assigned again
    a = 2; // same here
    a = 3; // only this value matters; the compiler may simply use 3 directly
    b = 1; // b is actually written here, because b is volatile
    b = 2; // and again
    b = 3; // and again
    return a + b; // b is also actually read back
}
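What is the real use of this? I've seen it in delay functions (which keep the CPU busy for a while by making it count up to a number) and in systems where several threads might look at the same variable. In C++, however, volatile provides neither atomicity nor any ordering between threads, so unsynchronized concurrent access to a volatile variable is still a data race. For multi-threaded code, use std::atomic or a mutex instead. volatile isn't really a threading tool, and it is certainly not a silver bullet.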
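For the multi-threaded case, here is a sketch of what to use instead of volatile: std::atomic with release/acquire ordering. If `ready` were a volatile bool, the program would have a data race and no guarantee that the reader sees payload == 42. With std::atomic, the acquire load that observes true synchronizes with the release store, so the earlier write to payload is visible. publish_via_atomic is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes plain data from one thread to another through an atomic flag.
int publish_via_atomic() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    while (!ready.load(std::memory_order_acquire))     // 3. wait for the flag
        std::this_thread::yield();
    int seen = payload;  // guaranteed to be 42 once the acquire load has seen true
    producer.join();
    return seen;
}
```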

Avoiding multiple thread spawns in pthreads

I have an application that is parallelized using pthreads. The application calls an iterative routine, and within that routine it spawns threads (pthread_create and pthread_join) to parallelize the computation-intensive section. When I use an instrumentation tool like PIN to collect statistics, the tool reports statistics for many threads (number of threads x number of iterations). I believe this is because a new set of threads is spawned each time the routine is called.
How can I ensure that I create the threads only once and that all subsequent calls reuse the threads that were created first?
When I do the same with OpenMP and then collect the statistics, I see that the threads are created only once. Is that because of the OpenMP runtime?
EDIT:
I'm just giving a simplified version of the code.
int main()
{
    //some code
    do {
        compute_distance(objects, clusters, &delta); //routine with pthread
    } while (delta > threshold);
}
void compute_distance(double **objects, double *clusters, double *delta)
{
    //some code again
    //computation moved to a separate parallel routine..
    for (i = 0; i < nthreads; i++)
        pthread_create(&thread[i], &attr, parallel_compute_phase, (void*)&ip);
    for (i = 0; i < nthreads; i++)
        rc = pthread_join(thread[i], &status);
}
I hope this clearly explains the problem.
How do we save the thread id and test whether the thread was already created?
You can write a simple thread pool that creates the threads once and puts them to sleep. When a thread is needed, instead of calling pthread_create, you ask the thread pool subsystem to pick a thread and do the required work. This also gives you control over the number of threads.
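The thread pool idea above can be sketched with plain POSIX calls. The code is written in C++ to match the rest of this page, and WorkerTeam, team_start, team_run_phase and team_stop are hypothetical names. The worker threads are created once and sleep on a condition variable. Each call to team_run_phase wakes them all, runs one "phase" of work, and waits until every worker has finished it. Error handling for pthread_create is omitted for brevity.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

struct WorkerTeam;
struct Worker { WorkerTeam* team; int id; pthread_t handle; };

struct WorkerTeam {
    pthread_mutex_t mutex;
    pthread_cond_t wake;            // broadcast when a phase starts (or on shutdown)
    pthread_cond_t done;            // signalled when the last worker finishes a phase
    unsigned long generation = 0;   // incremented once per phase
    int pending = 0;                // workers still running the current phase
    bool quit = false;
    void (*work_fn)(int id, void* arg) = nullptr;
    void* work_arg = nullptr;
    std::vector<Worker> workers;    // sized once, so &workers[i] stays valid
};

static void* worker_main(void* p) {
    Worker* self = static_cast<Worker*>(p);
    WorkerTeam* team = self->team;
    unsigned long seen = 0;
    pthread_mutex_lock(&team->mutex);
    for (;;) {
        while (!team->quit && team->generation == seen)
            pthread_cond_wait(&team->wake, &team->mutex);  // sleep between phases
        if (team->quit) break;
        seen = team->generation;
        void (*fn)(int, void*) = team->work_fn;
        void* arg = team->work_arg;
        pthread_mutex_unlock(&team->mutex);
        fn(self->id, arg);                                 // the parallel work runs unlocked
        pthread_mutex_lock(&team->mutex);
        if (--team->pending == 0)
            pthread_cond_signal(&team->done);
    }
    pthread_mutex_unlock(&team->mutex);
    return nullptr;
}

void team_start(WorkerTeam* team, int nthreads) {
    pthread_mutex_init(&team->mutex, nullptr);
    pthread_cond_init(&team->wake, nullptr);
    pthread_cond_init(&team->done, nullptr);
    team->workers.resize(nthreads);
    for (int i = 0; i < nthreads; ++i) {
        team->workers[i].team = team;
        team->workers[i].id = i;
        pthread_create(&team->workers[i].handle, nullptr, worker_main, &team->workers[i]);
    }
}

// Replaces one pthread_create/pthread_join round: wake all workers, wait for all.
void team_run_phase(WorkerTeam* team, void (*fn)(int, void*), void* arg) {
    pthread_mutex_lock(&team->mutex);
    team->work_fn = fn;
    team->work_arg = arg;
    team->pending = static_cast<int>(team->workers.size());
    ++team->generation;
    pthread_cond_broadcast(&team->wake);
    while (team->pending > 0)
        pthread_cond_wait(&team->done, &team->mutex);
    pthread_mutex_unlock(&team->mutex);
}

void team_stop(WorkerTeam* team) {
    pthread_mutex_lock(&team->mutex);
    team->quit = true;
    pthread_cond_broadcast(&team->wake);
    pthread_mutex_unlock(&team->mutex);
    for (auto& w : team->workers)
        pthread_join(w.handle, nullptr);
    pthread_cond_destroy(&team->done);
    pthread_cond_destroy(&team->wake);
    pthread_mutex_destroy(&team->mutex);
}
```

In the question's program, team_start would be called once before the do/while loop and team_stop once after it. compute_distance would call team_run_phase where it currently loops over pthread_create and pthread_join.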
An easy thing you can do with minimal code changes is to write some wrappers for pthread_create and pthread_join. Basically, you can do something like:
#include <pthread.h>
#include <sched.h>

typedef struct {
    volatile int go;
    volatile int done;
    int started;          /* pthread_t is opaque, so track creation with a flag */
    pthread_t h;
    void* (*fn)(void*);
    void* args;
} pthread_w_t;

void* pthread_w_fn(void* args) {
    pthread_w_t* p = (pthread_w_t*)args;
    // just let the thread be killed at the end
    for (;;) {
        while (!p->go) { sched_yield(); } // pthread_yield() is non-standard; sched_yield() is POSIX
        p->go = 0; // don't want to go again until told to
        p->fn(p->args);
        p->done = 1;
    }
}

int pthread_create_w(pthread_w_t* th, pthread_attr_t* a,
                     void* (*fn)(void*), void* args) {
    th->fn = fn;      // update on every call, not only the first
    th->args = args;
    th->done = 0;     // make sure join won't return too soon
    if (!th->started) {
        th->go = 0;
        int rc = pthread_create(&th->h, a, pthread_w_fn, th);
        if (rc != 0) return rc;
        th->started = 1;
    }
    th->go = 1;       // and let the wrapper function start the real thread code
    return 0;
}

int pthread_join_w(pthread_w_t* th) {
    while (!th->done) { sched_yield(); }
    return 0;
}
Then you'll have to change your calls and pthread_ts, or create some #define macros to change pthread_create to pthread_create_w, etc. You'll also have to initialize your pthread_w_ts to zero.
Messing with those volatiles can be troublesome, though: volatile provides neither atomicity nor ordering between threads, so a robust version should use C11/C++11 atomics or a mutex and condition variable instead. You'll probably need to spend some time getting my rough outline to actually work properly.
To ensure something that several threads might try to do only happens once, use pthread_once(). To ensure something only happens once that might be done by a single thread, just use a bool (likely one in static storage).
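Here is a minimal sketch of pthread_once(), written in C++ to match the rest of this page. init_workers, ensure_workers_started and init_calls are hypothetical names. In the question's program, the initializer would be the place that creates the worker threads.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;  // counts how often the initializer actually ran

static void init_workers() {
    // In the real program, the worker threads would be created here.
    ++init_calls;
}

// Safe to call at the start of every compute_distance(), from any thread:
// pthread_once runs init_workers exactly once.
void ensure_workers_started() {
    pthread_once(&init_once, init_workers);
}
```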
Honestly, it would be far easier to answer your question for everyone if you would edit your question – not comment, since that destroys formatting – to contain the real code in question, including the OpenMP pragmas.