Confused about pthreads

First, I am new to pthreads, so if I completely misunderstood, please just let me know.
I searched for the proper method of returning values and came across "How to return a value from thread in C" and https://linuxprograms.wordpress.com/category/pipes/ .
So I can share locations controlled by the starting thread OR pipe information, but the last value can't be put on some stack? Why can't I return a value in the same way that a program does when called by a shell (such as bash)?
(From what I understand, it would be possible to have a regular return value if it was C++, but, according to something I read (perhaps at https://computing.llnl.gov/tutorials/pthreads/), POSIX isn't completely defined for C++, just C.)

Take a look at pthread_exit and pthread_join.
When you are done with your thread you can call pthread_exit(void *retval). From the man page:
The pthread_exit() function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join(3).
This call to pthread_exit will stop your thread and, as the man page says, store the return value where pthread_join can get to it and place it in its second argument: int pthread_join(pthread_t thread, void **retval);
When you call pthread_join(tid, &returnVal), where tid is a pthread_t and returnVal is a void *, returnVal will hold the pointer that was passed to pthread_exit. That pointer must refer to storage that outlives the thread, not to a local variable on the thread's own stack.
This allows you to pass data out of threads on their exit.
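A minimal sketch of the pthread_exit / pthread_join pairing described above. The names worker and run_worker are made up for illustration, and the result is placed on the heap on the assumption that the joining thread takes ownership of it:

```cpp
#include <pthread.h>

// Hypothetical worker: doubles its input and hands the result back
// through pthread_exit.
static void* worker(void* arg) {
    int input = *static_cast<int*>(arg);
    // Heap storage outlives the thread; a local variable would not.
    int* result = new int(input * 2);
    pthread_exit(result);  // same effect as `return result;` from the start routine
}

// Starts the worker, joins it, and returns the value it produced (-1 on error).
int run_worker(int input) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, &input) != 0) return -1;

    void* returnVal = nullptr;
    if (pthread_join(tid, &returnVal) != 0) return -1;

    // returnVal now holds the pointer that was given to pthread_exit.
    int* result = static_cast<int*>(returnVal);
    int value = *result;
    delete result;
    return value;
}
```

With this sketch, run_worker(21) yields 42. Build with -pthread on GCC or Clang.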

Each thread has its own stack and local environment within the parent process. Your main process creates one thread (the main thread) and your code runs under it. Any other threads you create get the same treatment: each gets a stack, a thread context, and thread-local storage (where applicable), and there is no common stack on which to return a value.
When you join a thread you started, you are actually waiting for it to finish executing. When the wait unblocks, nothing that lived on that thread's stack can be handed back, since by then the thread, its stack, and all of its environment within the process have typically been destroyed.
While threads are running, they can communicate with one another in the ways you mentioned, and they can also read and write common memory locations as long as you use a synchronization mechanism to serialize those accesses.
If you must have a return value from your thread, you might want to encapsulate the thread in a class and pass the class instance to the thread on start. Just before the thread exits, it can leave a "return value" in a member of this class, so you can examine it after the class's "run" or "start" method (the one that actually runs the thread) returns.
Hope this helps.
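One way the class-based approach might look. This is only a sketch: SumTask, run, and result are hypothetical names, and run() is assumed to block until the thread has finished:

```cpp
#include <pthread.h>

// Hypothetical wrapper: the thread leaves its "return value" in a member.
class SumTask {
public:
    explicit SumTask(int n) : n_(n), result_(0) {}

    // Starts the thread and blocks until it has finished.
    // Returns false if the thread could not be created or joined.
    bool run() {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, &SumTask::entry, this) != 0) return false;
        // pthread_join synchronizes with the thread's termination, so
        // reading result_ after it returns is safe without extra locking.
        return pthread_join(tid, nullptr) == 0;
    }

    long result() const { return result_; }

private:
    // pthread_create needs a plain function pointer, hence the static trampoline.
    static void* entry(void* self) {
        static_cast<SumTask*>(self)->compute();
        return nullptr;
    }

    // Sums 1..n_ and stores the total in the member.
    void compute() {
        long sum = 0;
        for (int i = 1; i <= n_; ++i) sum += i;
        result_ = sum;
    }

    int n_;
    long result_;
};
```

After SumTask task(100); task.run(); the value 5050 can be read from task.result().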

Related

Why can't call std::thread.join() more than once for a given thread?

I'm new to C++ multi-threaded programming, and I ran into some difficulty with the join() function while reading a book:
The act of calling join() also cleans up any storage associated with the thread, so the std::thread object is no longer associated with the now-finished thread; it isn’t associated with any thread. This means that you can call join() only once for a given thread; once you’ve called join(), the std::thread object is no longer joinable, and joinable() will return false.
What does "storage associated with the thread" specifically mean, and why is it cleaning up the storage associated with the thread when calling join()? Can anyone explain the principles behind this?

What does "storage associated with the thread" specifically mean, and why is it cleaning up the storage associated with the thread when calling join()? Can anyone explain the principles behind this?
When a thread terminates, the OS keeps its return state in an entry of a kernel data structure. If another thread calls join() on the terminated joinable thread, that entry is removed. But if you never call join() for a joinable thread, a zombie thread is created. In other words, a zombie thread is a thread that has already terminated but whose status entry still exists in the kernel.
Since each zombie thread consumes some system resources, if many zombie threads accumulate, it will eventually no longer be possible to create new threads.
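The behavior the book describes can be observed directly. A small sketch (the function names are made up for illustration):

```cpp
#include <system_error>
#include <thread>

// Returns true if the thread is joinable before join() and not afterwards.
bool join_releases_thread() {
    std::thread t([] { /* no work */ });
    bool before = t.joinable();  // the object still represents a thread
    t.join();                    // waits for the thread and releases it
    bool after = t.joinable();   // the object represents no thread now
    return before && !after;
}

// Returns true if a second join() on the same object is rejected.
bool second_join_throws() {
    std::thread t([] {});
    t.join();
    try {
        t.join();  // not joinable any more
    } catch (const std::system_error&) {
        return true;
    }
    return false;
}
```

Both functions are expected to return true: join() leaves the std::thread object not associated with any thread, and a second join() throws std::system_error.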

The thread storage has nothing to do with the thread stack, contrary to the other answer. When the thread terminates, all thread resources are freed except one entry in the OS kernel structures, which keeps the thread's exit status and possibly other data used by the OS. Joining just removes that data; the OS forgets that thread forever, and further access to the thread is undefined behavior that can, for example, lead to unpredictably joining a newly created thread.

What does "storage associated with the thread" specifically mean?
Mostly it means the thread's call stack. Probably a few megabytes, but std::thread doesn't make it easy to find out exactly how much space is allocated, nor does it provide any well-defined way to change it. See "How to set the stacksize with C++11 std::thread".
Edit:
My original answer (above) was written in haste, and it failed to answer the main question. What I was thinking about at the time was how the biggest chunk of memory that must be released when cleaning up a thread is the thread's stack.
Why can't call std::thread.join() more than once...?
Because that's how the C++ standard library was designed. In most operating systems, the system call that creates a new thread must be balanced by a system call that destroys it, and the designers of the C++ standard library decided that, for any thread that is not detached, then that second system call should happen within the one and only join() call.
There generally are two parts to the resources that must be released when a thread ends: the "user space" part (i.e., what the application controls) and the "kernel space" part.
The user-space part consists mostly of the thread's call stack. It's typically going to be a contiguous chunk of several megabytes in the process's virtual address space. How it gets allocated depends on the operating system. In Windows, the thread stack is allocated by the CreateThread(...) WinAPI call, and it gets freed when the thread itself calls ExitThread(...), which is always the last thing that any Windows thread does before it terminates. In Linux, user-space code (normally the threading library, on the application's behalf) is responsible for allocating the stack before it creates the new thread and for freeing the stack after it is no longer needed.
Besides the call stack, the user-space part also includes the std::thread object itself, which is allocated and freed in exactly the same way as any other C++ object.
The kernel-space part is relatively small: It's a record somewhere within the kernel's memory that holds the thread's state (e.g., "running," "runnable," "waiting for...," "dead,") and it holds the thread's context. The context is a snapshot of all of the CPU's registers that gets taken every time the thread is preempted, and then loaded back into the CPU registers when it's time to let the thread run again.
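std::thread itself exposes no stack-size control, but at the pthreads level the stack size can be queried and requested through thread attributes. A rough sketch, assuming a POSIX system (the helper names default_stack_size and run_with_stack are hypothetical):

```cpp
#include <pthread.h>
#include <cstddef>

// Returns the default stack size (in bytes) for new pthreads, or 0 on error.
std::size_t default_stack_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    std::size_t size = 0;
    int rc = pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return rc == 0 ? size : 0;
}

static void* noop(void*) { return nullptr; }

// Creates and joins a thread with an explicitly requested stack size.
bool run_with_stack(std::size_t bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    bool ok = pthread_attr_setstacksize(&attr, bytes) == 0;
    pthread_t tid;
    ok = ok && pthread_create(&tid, &attr, noop, nullptr) == 0;
    pthread_attr_destroy(&attr);  // the attribute object is not needed after creation
    // Joining is the point at which the thread's remaining resources are released.
    return ok && pthread_join(tid, nullptr) == 0;
}
```

The value reported by default_stack_size() is platform-dependent, so no particular number should be assumed.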

Will my pthread wait or will the main thread wait?

So I'm getting the hang of using C/C++ but I'm still a bit confused. I'm also trying to learn synchronization at the same time, so things aren't going perfectly.
So my potential problem here is:
I have a Node object, and Node has a method called run. Run creates a pthread and passes a pointer to a function called compute() as a parameter.
The compute function has one parameter, which is the Node that called run().
The compute function will then access a semaphore (sem_t) that is a field of the Node object passed as a parameter and will call sem_wait(Node.sem) on that semaphore.
If I do this, will the newly created thread that is running the compute function actually call sem_wait and do the defined behavior? Or will the process that originally created the Node call sem_wait?

The sem_wait call will execute in the thread in which it was called (as @Jason C points out in his comment). If the call is made inside compute(), as you describe, it runs in the newly created thread; if it were instead made in run() after the thread has been started, sem_wait would be executed in the first thread.
You seem to be thinking that because the Node object is used in both threads that somehow has an effect on which thread will execute a call. It doesn't. Threads share memory space so your Node object can be used in any thread within a process. That's when you start getting into thread safety issues.
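To make this concrete, here is a sketch that loosely follows the question's description. The Node layout and the waiter field are assumptions, and unnamed semaphores via sem_init are assumed to be available (as on Linux). It records which thread actually executed sem_wait:

```cpp
#include <pthread.h>
#include <semaphore.h>

// Hypothetical Node: owns a semaphore and records which thread
// actually performed the sem_wait.
struct Node {
    sem_t sem;
    pthread_t thread;
    pthread_t waiter;  // set by the thread that executes sem_wait

    static void* compute(void* arg) {
        Node* self = static_cast<Node*>(arg);
        self->waiter = pthread_self();
        sem_wait(&self->sem);  // blocks the thread running compute(), not the creator
        return nullptr;
    }

    // Initializes the semaphore to 0 and starts compute() in a new thread.
    bool run() {
        if (sem_init(&sem, 0, 0) != 0) return false;
        return pthread_create(&thread, nullptr, &Node::compute, this) == 0;
    }
};

// Returns true if sem_wait ran on the created thread rather than on the caller.
bool wait_runs_in_new_thread() {
    Node node;
    if (!node.run()) return false;
    sem_post(&node.sem);                 // the creating thread was never blocked
    pthread_join(node.thread, nullptr);  // after the join, node.waiter is safe to read
    bool in_new_thread = pthread_equal(node.waiter, node.thread) != 0 &&
                         pthread_equal(node.waiter, pthread_self()) == 0;
    sem_destroy(&node.sem);
    return in_new_thread;
}
```

Sharing the Node between threads does not change who executes the call: whichever thread runs the line containing sem_wait is the one that waits.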

What happens to std::async call if parent/main thread dies

If I am right, std::async uses a new thread and calls the method in it. I was wondering what happens if the main thread or the parent thread dies. Does the thread running the async method die as well?

There is no concept of a "parent" thread in C++; each thread is independent of the one that created it. However, the main thread is special: if it returns from main() or calls exit(), then the entire application is terminated even if other threads are still running. Once that happens, the program has undefined behaviour if the still-running threads access any global variables or automatic objects that were on the main thread's stack, or use any standard library objects or call any function not permitted in signal handlers.
In short, do not let other threads run after main completes if you expect sensible results.
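A minimal sketch of the safe pattern, assuming the work is launched with std::launch::async (slow_square and compute_before_exit are illustrative names): wait on the future before the launching code returns, so that nothing is still running when main completes.

```cpp
#include <future>

// Stand-in for some longer-running computation.
int slow_square(int x) { return x * x; }

// Launches the work asynchronously and waits for it before returning,
// so the worker never outlives the caller.
int compute_before_exit(int x) {
    std::future<int> f = std::async(std::launch::async, slow_square, x);
    return f.get();  // blocks until the task has finished
}
```

Calling compute_before_exit(7) from main() returns 49 only after the asynchronous task has completed.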

Keep track of pthread

I have started many threads. At a later time, I'd like to check whether these threads are still alive (i.e., not finished yet and not terminated unexpectedly).
What kind of information should I keep track of regarding the threads in the first place? Thread ID, process ID, etc.? How should I get these IDs?
When I need to check the liveness of these threads, what functions should I use? Will pthread_kill work here? pthread_kill takes an opaque type pthread_t as parameter, which I believe is typically an unsigned long. Is pthread_t different from a thread ID? I assume a thread ID would pick up an int as its value. In some tutorials on pthread, they assign an integer to a pthread as its ID. Shouldn't the thread get its ID from the operating system?

A thread's entire identity resides in its pthread_t.
Creating a thread returns its pthread_t ID to the creating thread.
Each thread can get its own ID with pthread_self().
You can compare thread IDs using the function: int pthread_equal(pthread_t, pthread_t)
So: Maintain a common data structure where you can store thread status as STARTED, RUNNING, FINISHED using the pthread_t IDs and pthread_equal comparison function to differentiate between the threads. The parent sets the value to STARTED when it starts the thread, the thread itself sets its own state to RUNNING, does its work, and sets itself to FINISHED when done. Use a mutex to make sure values are not changed while being read.
EDIT:
You can set up a sort of 'thread destructor' using pthread_cleanup_push:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cleanup_pop.html
i.e. register a routine to be called when the thread exits (either itself, or by cancellation externally). This routine can update the status.
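A sketch of the status table described above, combined with a cleanup handler. All names here (Status, Entry, g_table, start_tracked, and so on) are hypothetical. The creator holds the mutex while creating the thread so that the STARTED entry exists before the new thread can update it:

```cpp
#include <pthread.h>
#include <vector>

enum class Status { STARTED, RUNNING, FINISHED };

struct Entry {
    pthread_t id;
    Status status;
};

// Shared table of thread states, guarded by g_lock.
std::vector<Entry> g_table;
pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

void set_status(pthread_t id, Status s) {
    pthread_mutex_lock(&g_lock);
    for (Entry& e : g_table)
        if (pthread_equal(e.id, id)) e.status = s;
    pthread_mutex_unlock(&g_lock);
}

bool get_status(pthread_t id, Status& out) {
    bool found = false;
    pthread_mutex_lock(&g_lock);
    for (const Entry& e : g_table)
        if (pthread_equal(e.id, id)) { out = e.status; found = true; }
    pthread_mutex_unlock(&g_lock);
    return found;
}

// Cleanup handler: runs on normal exit (because of pop(1)) and on cancellation.
void mark_finished(void*) { set_status(pthread_self(), Status::FINISHED); }

void* tracked_worker(void*) {
    set_status(pthread_self(), Status::RUNNING);
    pthread_cleanup_push(mark_finished, nullptr);
    // ... the thread's real work would go here ...
    pthread_cleanup_pop(1);  // 1 = also execute the handler now
    return nullptr;
}

// Creates a tracked thread; the lock is held so the entry is in the
// table before the new thread tries to update it.
bool start_tracked(pthread_t& tid) {
    pthread_mutex_lock(&g_lock);
    bool ok = pthread_create(&tid, nullptr, tracked_worker, nullptr) == 0;
    if (ok) g_table.push_back({tid, Status::STARTED});
    pthread_mutex_unlock(&g_lock);
    return ok;
}
```

Note that a pthread_t value may be reused once a thread has been joined, so a real table would probably remove or recycle entries at that point.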

When you call pthread_create, the first argument is a pointer to a pthread_t, to which pthread_create will assign the thread ID of the newly created thread. If you want to get the thread ID of the current thread, use pthread_self(). This is the only identifying piece of information you need for the thread because all threads created this way share the same process ID.
The way you would check whether a thread is alive depends on what you need this information for. If you just want to wait until the thread has completed, you call pthread_join with the thread ID as the first argument and a pointer to a location for the return value of the thread function as the second argument. Unless you detach the threads you create by calling pthread_detach(pthread_self()) in the thread, you need to call pthread_join on them eventually so that they don't continue to hold on to their stack space.
If for some reason you want to do something while the thread is running, you could create a global variable for each thread, which the thread sets when it terminates, and check that variable from the main thread. In that case, you would probably want to detach the threads so that you don't also have to join them later.
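A sketch of the flag approach with a detached thread. Using std::atomic<bool> for the flag is an assumption on my part (a plain global would need other synchronization), and the names g_done, detached_worker, and wait_for_flag are made up:

```cpp
#include <pthread.h>
#include <atomic>
#include <chrono>
#include <thread>

// Flag that the worker sets just before it terminates.
std::atomic<bool> g_done{false};

void* detached_worker(void*) {
    pthread_detach(pthread_self());  // nobody has to join this thread later
    // ... the thread's real work would go here ...
    g_done.store(true);
    return nullptr;
}

// Starts the worker and polls the flag for up to roughly timeout_ms milliseconds.
// Returns true if the worker reported completion in time.
bool wait_for_flag(int timeout_ms) {
    g_done.store(false);
    pthread_t tid;
    if (pthread_create(&tid, nullptr, detached_worker, nullptr) != 0) return false;
    for (int waited = 0; waited < timeout_ms; ++waited) {
        if (g_done.load()) return true;  // the main thread could do other work here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return g_done.load();
}
```

The flag only reports that the thread reached its last statement; it cannot distinguish a thread that is still working from one that terminated unexpectedly before setting it.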

Do threads clean-up after themselves in Win32/MFC and POSIX?

I am working on a multithreaded program using C++ and Boost. I am using a helper thread to eagerly initialize a resource asynchronously. If I detach the thread and all references to the thread go out of scope, have I leaked any resources? Or does the thread clean up after itself (i.e. its stack and any other system resources needed for the thread itself)?
From what I can see in the docs (and what I recall from pthreads 8 years ago), there's no explicit "destroy thread" call that needs to be made.
I would like the thread to execute asynchronously, and when it comes time to use the resource, I will check if an error has occurred. The rough bit of code would look something like:
// Assume this won't get called so frequently that next_resource gets promoted
// before the thread finishes.
void PromoteResource() {
    current_resource_ptr = next_resource_ptr;
    next_resource_ptr.reset(new Resource());
    // Bind the member function to the new resource (assumes shared_ptr-like pointers).
    boost::thread t(boost::bind(&Resource::Initialize, next_resource_ptr));
    t.detach();  // boost::thread starts running on construction; it has no start() member
}
Of course--I understand that normal memory-handling problems still exist (forget to delete, bad exception handling, etc)... I just need confirmation that the thread itself isn't a "leak".
Edit: A point of clarification, I want to make sure this isn't technically a leak:
void Run() {
    boost::this_thread::sleep_for(boost::chrono::seconds(10));
}
void DoSomething(...) {
    boost::thread t(Run);  // starts running on construction; there is no run() member
} //thread detaches, will clean itself up--the thread itself isn't a 'leak'?
I'm fairly certain everything is cleaned up after 10 seconds-ish, but I want to be absolutely certain.

The thread's stack gets cleaned up when it exits, but not anything else. This means that anything it allocated on the heap or anywhere else (in pre-existing data structures, for example) will get left when it quits.
Additionally any OS-level objects (file handle, socket etc) will be left lying around (unless you're using a wrapper object which closes them in its destructor).
But programs which frequently create / destroy threads should probably mostly free everything that they allocate in the same thread as it's the only way of keeping the programmer sane.

If I'm not mistaken, on Windows XP all resources used by a process will be released when the process terminates, but that isn't true for threads.

Yes, the resources are automatically released upon thread termination. Having a background thread like this is perfectly normal and acceptable.
To clean up after a thread you must either join it or detach it (in which case you can no longer join it).
Here's a quote from the Boost thread docs that somewhat (but not exactly) explains that.
When the boost::thread object that represents a thread of execution is destroyed the thread becomes detached. Once a thread is detached, it will continue executing until the invocation of the function or callable object supplied on construction has completed, or the program is terminated. A thread can also be detached by explicitly invoking the detach() member function on the boost::thread object. In this case, the boost::thread object ceases to represent the now-detached thread, and instead represents Not-a-Thread.
In order to wait for a thread of execution to finish, the join() or timed_join() member functions of the boost::thread object must be used. join() will block the calling thread until the thread represented by the boost::thread object has completed. If the thread of execution represented by the boost::thread object has already completed, or the boost::thread object represents Not-a-Thread, then join() returns immediately. timed_join() is similar, except that a call to timed_join() will also return if the thread being waited for does not complete when the specified time has elapsed.
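For comparison, a sketch of the same idea using std::thread as a stand-in for boost::thread (an assumption: the two are similar but not identical, and unlike the Boost behavior quoted above, destroying a joinable std::thread calls std::terminate, so the explicit detach() is required). A std::promise lets the caller check later whether initialization succeeded; start_detached_init is a hypothetical name:

```cpp
#include <future>
#include <thread>
#include <utility>

// Starts a detached worker and returns a future through which the caller
// can later check whether the (hypothetical) initialization succeeded.
std::future<bool> start_detached_init() {
    std::promise<bool> done;
    std::future<bool> result = done.get_future();
    std::thread t([p = std::move(done)]() mutable {
        // ... hypothetical resource initialization would go here ...
        p.set_value(true);
    });
    t.detach();  // t no longer represents the thread; the thread cleans up after itself
    return result;
}
```

When the resource is needed, calling get() on the returned future blocks until the worker has reported its result.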

In Win32, as soon as the thread's main function, called ThreadProc in the documentation, finishes, the thread is cleaned up. Any resources allocated by you inside the ThreadProc you'll need to clean up explicitly, of course.
