How to map pthread_t to pid (on Linux) - c++

Is there a sane way to map a pthread_t value (as returned from pthread_create() or std::thread::native_handle()) to a pid (tid) in Linux? Before someone gets duplicate-happy, this is not about finding a thread's own pid (which can be done with gettid()).
The insane way would be to somehow compel a thread to call gettid() and pass along the result, but that's way too much trouble.
One of the possible applications I have in mind is to reconcile threads created within program (where pthread_t is available) with output provided by ps -T.

One (convoluted, non-portable, Linux-specific, lightly destructive) method of mapping pthread_t to tid without looking into struct pthread is as follows:
Use pthread_setname_np to set a thread name to something unique.
Iterate over subdirectories of /proc/self/task and read a line from a file named comm in each of those.
If the line equals the unique string just used, extract the tid from the last component of the subdirectory name. This is your answer.
The thread name is not used by the OS for anything, so it should be safe to change it. Nevertheless you probably want to set it back to the value it had originally (use pthread_getname_np to obtain it).
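The steps above could be sketched roughly as follows. This is only a sketch under stated assumptions: the helper name tid_from_pthread and the marker string "tidprobe" are made up for illustration, the marker is assumed not to collide with any other thread's name, and the target thread is assumed to stay alive (and not be renamed by anyone else) for the duration of the call.

```cpp
#include <dirent.h>     // opendir, readdir, closedir
#include <pthread.h>    // pthread_setname_np / pthread_getname_np (GNU extensions)
#include <sys/types.h>  // pid_t
#include <fstream>
#include <string>

// Hypothetical helper: returns the kernel tid of `thread`, or -1 on failure.
pid_t tid_from_pthread(pthread_t thread) {
    char saved[16] = {};  // Linux thread names hold at most 15 chars plus NUL
    if (pthread_getname_np(thread, saved, sizeof saved) != 0) return -1;

    // Assumption: no other thread in this process is named like this.
    const std::string marker = "tidprobe";
    if (pthread_setname_np(thread, marker.c_str()) != 0) return -1;

    pid_t result = -1;
    if (DIR* dir = opendir("/proc/self/task")) {
        while (dirent* entry = readdir(dir)) {
            if (entry->d_name[0] == '.') continue;  // skip "." and ".."
            std::ifstream comm(std::string("/proc/self/task/") +
                               entry->d_name + "/comm");
            std::string name;
            if (std::getline(comm, name) && name == marker) {
                // The directory name is the tid.
                result = static_cast<pid_t>(std::stol(entry->d_name));
                break;
            }
        }
        closedir(dir);
    }

    pthread_setname_np(thread, saved);  // put the original name back
    return result;
}
```

Calling tid_from_pthread(pthread_self()) from the main thread should give the same number as the process id, which is a quick way to sanity-check it.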

A somewhat hacky way of doing this on Linux would be with the process_vm_readv system call. If you know the pthread_t, use it to copy the memory over from the remote process to a local buffer, then cast the pointer to a struct pthread and retrieve the value of the tid field.

Pthreads are POSIX Threads.
In glibc on Linux, pthread_t is a typedef for unsigned long.
Its value is actually a pointer to an internal struct pthread, cast to that integer type, as mentioned above.
struct pthread is deliberately not exposed.
Threading is highly dependent on the underlying OS. Threads are not implemented the same way on all Unix-flavored OSes.
gettid is not a POSIX function either; it is Linux-specific.
You can have a look at the glibc/nptl source code for the Linux-specific implementation of struct pthread.
See https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/descr.h;h=fdeb397eab94730a5dab3181abcdae815ed6914e;hb=48a8f8328122ab8d06b7333cb87be46feeaf7cca
But I believe what you are looking for is getpid. It returns the process ID, not the thread ID.

Related

Where is the pid stored?

I have the following question in an assignment:
In every one second a process calls the following function:
#include <ctime>
#include <string>
#include <unistd.h>
using namespace std;
string create_file_name(time_t timestamp) {
    pid_t pid = getpid();
    string s = "results-" + to_string(pid) + to_string(timestamp);
    return s;
}
The question is: where does the kernel store the process's PID?
There are five possible answers:
user's stack \ kernel's stack \ heap \ PCB \ runqueue
Now generally, I know that the PID is stored inside the PCB but in this case, should it also be stored inside the user's stack? (since it's a local variable).
The question seems to have only one answer, so I am quite confused.
As the getpid(2) man page says:
From glibc version 2.3.4 up to and including version 2.24, the glibc
wrapper function for getpid() cached PIDs, with the goal of avoiding
additional system calls when a process calls getpid() repeatedly.
Normally this caching was invisible, but its correct operation relied
on support in the wrapper functions for fork(2), vfork(2), and
clone(2): if an application bypassed the glibc wrappers for these
system calls by using syscall(2), then a call to getpid() in the
child would return the wrong value (to be precise: it would return
the PID of the parent process). In addition, there were cases where
getpid() could return the wrong value even when invoking clone(2) via
the glibc wrapper function. (For a discussion of one such case, see
BUGS in clone(2).) Furthermore, the complexity of the caching code
had been the source of a few bugs within glibc over the years.
Because of the aforementioned problems, since glibc version 2.25, the
PID cache is removed: calls to getpid() always invoke the actual
system call, rather than returning a cached value.
On Alpha, instead of a pair of getpid() and getppid() system calls, a
single getxpid() system call is provided, which returns a pair of PID
and parent PID. The glibc getpid() and getppid() wrapper functions
transparently deal with this. See syscall(2) for details regarding
register mapping.
It depends on the glibc version you use. Some versions of glibc maintain a cache of the PID, while others invoke the system call on every call to getpid(). If you want to know how the system call works, I suggest you look at the kernel code.
You can find the getpid() function at this link (you can change the kernel version and navigate the source code to reconstruct how the getpid() syscall works).

Is it safe/defined to assume the value of a Windows pseudo handle?

I am writing a piece of C++ code for Windows that needs to query the timings for the process and each of its individual threads.
To do the necessary system calls, I need the handles for the process and each of its threads. I am using the GetCurrentProcess and GetCurrentThread functions, which both return a pseudo handle. Upon further inspection, I noticed that the pseudo handles for all threads share the same value.
After a brief search on the internet, I found the following article that reports the same values for the process and thread pseudo handles as I got: https://weseetips.wordpress.com/2008/03/26/getcurrentthread-returns-pseudo-handle-not-the-real-handle/
My question: is it safe and/or defined to call GetCurrentThread once from one thread and use the returned pseudo handle in all other threads to let them refer to themselves?
Using the current implementation, this works as expected. I am just wondering if this behavior is guaranteed. In other words, will it work on any Windows platform that offers the GetCurrentThread function, and would changing the behavior be considered a breaking change?
The documentation for the GetCurrentThread function states (emphasis mine):
A pseudo handle is a special constant that is interpreted as the
current thread handle. The calling thread can use this handle to
specify itself whenever a thread handle is required. [...]
Which makes me believe this special pseudo handle is just an alias for "the current thread" and can therefore be shared among all threads to let them refer to themselves. On the other hand, the documentation also says that the return value can be used by the calling thread, hence my confusion!
Yes, this is safe. These pseudo handles are well-known values, documented in wdm.h (from the Windows WDK):
#define NtCurrentProcess() ( (HANDLE)(LONG_PTR) -1 )
#define ZwCurrentProcess() NtCurrentProcess()
#define NtCurrentThread() ( (HANDLE)(LONG_PTR) -2 )
#define ZwCurrentThread() NtCurrentThread()
#define NtCurrentSession() ( (HANDLE)(LONG_PTR) -3 )
#define ZwCurrentSession() NtCurrentSession()
"and can therefore be shared among all threads to let them refer to themselves."
Of course not, it cannot be "shared among all threads": only the current process or thread can use it to refer to itself.
When a kernel-mode API receives a thread or process handle as an input parameter, it needs to convert the handle to an object pointer (ETHREAD or EPROCESS). To do this, it first checks for these constant values; if the handle matches one, the pointer to the current thread or process object is used. Otherwise the handle is treated as an index into the process handle table.
Yep. From that same page:
The function cannot be used by one thread to create a handle that can be used by other threads to refer to the first thread. The handle is always interpreted as referring to the thread that is using it. A thread can create a "real" handle to itself that can be used by other threads, or inherited by other processes, by specifying the pseudo handle as the source handle in a call to the DuplicateHandle function.

Like '__LINE__', is there any standard macro in C/C++ which prints thread name or ID?

I am using some functions across multiple threads in my application, written in C++ in the QNX IDE. Sometimes, while analyzing the logs, it is difficult to find which thread called a function. Though I can use gettid or pthread_getname_np in logs, I am still looking for a standard macro like __LINE__.
Even if it is not supported by QNX, I would like to know if any other OS/Compiler/standard has it.
The line number of a line is known at compile time, even by the preprocessor, so the preprocessor can substitute __LINE__ with the actual line number.
But the thread-id is only known at run-time, and furthermore it will be different for different executions of the same statement. So it cannot possibly be the value of a macro.
You need to use a run-time call like the ones you mention in order to discover the thread id. You may well need a platform-specific mechanism, since neither POSIX nor C defines a portable way to obtain a printable thread id. As of C++11, you can use std::this_thread::get_id() to obtain a unique, printable thread id.
In C++11, check out std::this_thread::get_id.
It has a method of generating human-readable strings, if you need them. There is no standard macro for this behavior, as it's rather run-time dependent.
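As an illustration of combining the compile-time macros with a run-time thread id, here is a minimal sketch. The names format_log and LOG are invented for this example and are not part of any standard or library.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: builds "[thread <id>] file:line message".
// The thread id is looked up at run time, on whichever thread calls this.
std::string format_log(const char* file, int line, const std::string& msg) {
    std::ostringstream out;
    out << "[thread " << std::this_thread::get_id() << "] "
        << file << ':' << line << ' ' << msg;
    return out.str();
}

// __FILE__ and __LINE__ are substituted by the preprocessor at the call
// site; only the thread id requires a function call.
#define LOG(msg) (std::cerr << format_log(__FILE__, __LINE__, (msg)) << '\n')
```

A call such as LOG("starting worker") then prints the calling thread's id next to the file and line. The textual form of the id is implementation-defined, so it will not necessarily match what gettid reports.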
No, because unlike __LINE__ thread IDs are not statically determined at compile time. Moreover threads are not part of the C or C++ languages, so the compiler is not aware of them - you'd have to make a call to determine the thread ID in any case (though C++ 11 supports threads through the standard library).
If you really must, this can always be faked with a macro:
#define __THREAD__ gettid()
or in C++ 11:
#define __THREAD__ std::this_thread::get_id()
Or whatever system dependent means of obtaining a thread or process ID is provided by the target - it is not entirely portable, although C++11 holds out the best possibility for portability once support is ubiquitous.
But doing so hides the overhead of the function call and makes it look like a literal constant; I'm not sure I'd advocate misleading code. Moreover, identifiers beginning with a double underscore are reserved for the implementation, so the name is further misleading.
If you are satisfied with an identifying handle for a thread, then the same value that is returned in the first parameter to pthread_create() can also be obtained by the running thread by calling pthread_self(). This was pointed out in a comment by Chrono Kitsune.
If you are only creating a static set of threads and you want to associate each with an ordinal value, you could use a static counter and a thread local variable. At the creation of each thread, atomically read and increment the counter, and set the thread local variable to the read value.
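A minimal sketch of the counter idea, assuming C++11 or later. The function names are made up for illustration, and unlike the description above the ordinal is assigned lazily, the first time a thread asks for it, rather than at thread creation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: returns a small per-thread ordinal (0, 1, 2, ...).
// Each thread gets its value on its first call and keeps it afterwards.
int thread_ordinal() {
    static std::atomic<int> next{0};
    thread_local const int mine = next.fetch_add(1, std::memory_order_relaxed);
    return mine;
}

// Usage sketch: start a thread and report the ordinal it was given.
int ordinal_seen_by_new_thread() {
    int seen = -1;
    std::thread worker([&seen] { seen = thread_ordinal(); });
    worker.join();
    return seen;
}
```

Ordinals are never reused, so this suits a bounded set of threads better than a program that keeps creating and destroying them.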

How does pthread_mutex_unlock distinguish threads?

Only the owner of a mutex can unlock it. But how does the mutex distinguish the thread that locked it?
Do threads have any distinctive features in Linux?
You can look at the implementation source code for details (the pthread implementation from the GNU libc Git repository can be browsed here), but they have different IDs that are used internally. You can see this at the application level using pthread_self(). It returns a pthread_t value that is unique on a per-thread basis within a given process. There is no guarantee of uniqueness when you compare pthread_t values from different processes.
The actual type that pthread_t corresponds to is implementation-defined, however; it could be an arithmetic (e.g. integral) type, or it could be a structure. Therefore, you can't really do much with them in a portable way, other than compare them for equality using pthread_equal().
They are differentiated using the thread id.
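The ownership check can be observed from application code with an error-checking mutex. The sketch below uses made-up function names; note that for a default mutex, unlocking from a thread that does not own it is undefined behavior, so PTHREAD_MUTEX_ERRORCHECK is used here to get a reported error instead.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdint>

// Thread body: tries to unlock a mutex this thread never locked and hands
// the return code of pthread_mutex_unlock() back through the exit value.
static void* try_unlock(void* arg) {
    int rc = pthread_mutex_unlock(static_cast<pthread_mutex_t*>(arg));
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(rc));
}

// Hypothetical demo: locks an error-checking mutex on the calling thread,
// lets a second thread attempt the unlock, and returns what that thread got.
int unlock_from_other_thread() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);  // the calling thread is now the owner
    pthread_t other;
    pthread_create(&other, nullptr, try_unlock, &mutex);
    void* rc = nullptr;
    pthread_join(other, &rc);

    pthread_mutex_unlock(&mutex);  // the owner is allowed to unlock
    pthread_mutex_destroy(&mutex);
    return static_cast<int>(reinterpret_cast<std::intptr_t>(rc));
}
```

With an error-checking mutex the second thread's unlock attempt is expected to fail with EPERM, because the mutex records which thread locked it.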

How can I tell if pthread_self is the main (first) thread in the process?

Background: I'm working on a logging library that is used by many programs.
I'm assigning a human-readable name for each thread, the main thread should get "main", but I'd like to be able to detect that state from within the library without requiring code at the beginning of each main() function.
Also note: The library code will not always be entered first from the main thread.
This is kinda doable, depending on the platform you're on, but absolutely not in any portable and generic way...
Mac OS X seems to be the only one with a direct and documented approach, according to their pthread.h file:
/* returns non-zero if the current thread is the main thread */
int pthread_main_np(void);
I also found that FreeBSD has a pthread_np.h header that defines pthread_main_np(), so this should work on FreeBSD too (8.1 at least), and OpenBSD (4.8 at least) has pthread_main_np() defined in pthread.h too. Note that _np explicitly stands for non-portable!
Otherwise, the only more "general" approach that comes to mind is comparing the PID of the process to the TID of the current thread, if they match, that thread is main.
This does not necessarily work on all platforms, it depends on if you can actually get a TID at all (you can't in OpenBSD for example), and if you do, if it has any relation to the PID at all or if the threading subsystem has its own accounting that doesn't necessarily relate.
I also found that some platforms give back constant values as TID for the main thread, so you can just check for those.
A brief summary of platforms I've checked:
Linux: possible here, syscall(SYS_gettid) == getpid() is what you want
FreeBSD: not possible here, thr_self() seems random and without relation to getpid()
OpenBSD: not possible here, there is no way to get a TID
NetBSD: possible here, _lwp_self() always returns 1 for the main thread
Solaris: possible here, pthread_self() always returns 1 for the main thread
So basically you should be able to do it directly on Mac OS X, FreeBSD and OpenBSD.
You can use the TID == PID approach on Linux.
You can use the TID == 1 approach on NetBSD and Solaris.
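For the Linux case, the TID == PID comparison can be wrapped in a small function. This is a Linux-only sketch and the name is_main_thread is chosen just for this example.

```cpp
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // syscall, getpid

// Linux-only: true when the calling thread is the initial thread of the
// process, because that thread's tid equals the process id.
bool is_main_thread() {
    return syscall(SYS_gettid) == getpid();
}
```

A logging library could call this on first entry from any thread and pick the name "main" when it returns true.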
I hope this helps, have a good day!
Call pthread_self() from main() and record the result. Compare future calls to pthread_self() to your stored value to know if you're on the main thread.
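A minimal sketch of that approach, with made-up names. It assumes remember_main_thread() is called once from main() before any other thread is started, which is exactly the requirement the question hoped to avoid.

```cpp
#include <pthread.h>

namespace {
pthread_t g_main_thread;    // valid only once g_main_known is true
bool g_main_known = false;
}  // namespace

// Call once from main(), before other threads exist.
void remember_main_thread() {
    g_main_thread = pthread_self();
    g_main_known = true;
}

// True if the caller is the thread recorded above. pthread_t values are
// opaque, so they are compared with pthread_equal() rather than ==.
bool on_main_thread() {
    return g_main_known && pthread_equal(pthread_self(), g_main_thread) != 0;
}
```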
You can utilize some kind of shared name resource that allows threads that spawn to register a name (perhaps a map of thread id to name). Your logging system can then place a call into a method that gets the name via the thread ID in a thread-safe manner.
When the thread dies, have it remove its name from the mapping to avoid leaking memory.
This method should allow all threads to be named, not just main.
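One way such a registry might look, as a sketch only: the class name ThreadNames, its member names, and the "unnamed" fallback are all invented for this example.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical registry mapping thread ids to human-readable names.
class ThreadNames {
public:
    // Registers (or replaces) the calling thread's name.
    void register_current(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        names_[std::this_thread::get_id()] = name;
    }

    // Removes the calling thread's entry; call before the thread exits.
    void unregister_current() {
        std::lock_guard<std::mutex> lock(mutex_);
        names_.erase(std::this_thread::get_id());
    }

    // Returns the calling thread's name, or "unnamed" if none was set.
    std::string current() const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = names_.find(std::this_thread::get_id());
        return it == names_.end() ? std::string("unnamed") : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::thread::id, std::string> names_;
};
```

The logger would call current() when formatting a message, and each thread would call register_current() on start and unregister_current() just before it finishes.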