pthread_key_t and pthread_once_t? - c++

Starting with pthreads, I cannot understand what the business is with pthread_key_t and pthread_once_t.
Would someone explain in simple terms, with examples if possible?
Thanks

pthread_key_t is for creating thread-local storage: each thread gets its own copy of a data variable, instead of all threads sharing a global (or function-static, class-static) variable. The TLS is indexed by a key. See pthread_getspecific et al for more details.
pthread_once_t is a control for executing a function only once with pthread_once. Suppose you have to call an initialization routine, but you must only call that routine once. Furthermore, the point at which you must call it is after you've already started up multiple threads. One way to do this would be to use pthread_once(), which guarantees that your routine will only be called once, no matter how many threads try to call it at once, so long as you use the same control variable in each call. It's often easier to use pthread_once() than it is to use other alternatives.
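Since the question asks for examples, here is a minimal sketch of that pattern (the init_routine and worker names are made up for illustration):

#include <pthread.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;

static void init_routine(void)
{
    // One-time initialisation goes here; pthread_once guarantees this
    // body runs exactly once, no matter how many threads race to it.
}

void worker(void)
{
    pthread_once(&once_control, init_routine);
    // By this point init_routine has completed, in every thread.
}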

No, it can't be explained in layman's terms. Laymen cannot successfully program with pthreads in C++. It takes a specialist known as a "computer programmer" :-)
pthread_once_t is a little bit of storage which pthread_once must access in order to ensure that it does what it says on the tin. Each once control allows an init routine to be called once, and once only, no matter how many times it is called from how many threads, possibly concurrently. Normally you use a different once control for each object you're planning to initialise on demand in a thread-safe way. You can think of it in effect as an integer which is accessed atomically as a flag indicating whether a thread has been selected to do the init. But since pthread_once is blocking, there's allowed to be a bit more to it than that if the implementation can cram in a synchronisation primitive too. (The only time I ever implemented pthread_once, I couldn't, so the once control took any of three states: start, initialising, finished. But then I couldn't change the kernel. Unusual situation.)
pthread_key_t is like an index for accessing thread-local storage. You can think of each thread as having a map from keys to values. When you add a new entry to TLS, pthread_key_create chooses a key for it and writes that key into the location you specify. You then use that key from any thread, whenever you want to set or retrieve the value of that TLS item for the current thread. The reason TLS gives you a key instead of letting you choose one is so that unrelated libraries can use TLS without having to co-operate to avoid both using the same key and trashing each other's TLS data. The pthread library might for example keep a global counter, and assign key 0 for the first time pthread_key_create is called, 1 for the second, and so on.
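Putting the two together, a minimal sketch of on-demand TLS (function names are made up; error handling omitted):

#include <pthread.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    // Runs once across all threads: allocates a single TLS slot.
    pthread_key_create(&key, NULL);  // NULL: no per-thread destructor
}

void set_my_value(void *value)
{
    pthread_once(&key_once, make_key);
    pthread_setspecific(key, value);   // stored for this thread only
}

void *get_my_value(void)
{
    pthread_once(&key_once, make_key);
    return pthread_getspecific(key);   // this thread's value, or NULL
}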

Wow, the other answers here are way too verbose.
pthread_once_t stores state for pthread_once(). Calling pthread_once(&s, fn) calls fn and sets the value pointed to by s to record the fact that it has been executed. All subsequent calls to pthread_once() with the same control are no-ops. The name should become obvious now.
pthread_once_t should be initialized to PTHREAD_ONCE_INIT.

Related

Can you really wait on Condition Variable with WaitFor...Object(s)?

I'm trying to implement some sort of waiting on many CONDITION_VARIABLEs.
The answers here (and many more places over the internet) imply that WaitForMultipleObjects and the like are valid options when dealing with the Windows API, but it appears that this is not the case.
First of all, nowhere in the MSDN documentation is it written that a Windows condition variable is a valid argument for the WaitFor... functions.
Second, the WaitFor... functions only accept a HANDLE argument, which is basically a handle to a kernel object, but PCONDITION_VARIABLE is not really a HANDLE.
Finally, trying to use a condition variable (both as a PCONDITION_VARIABLE and via the undocumented CONDITION_VARIABLE::Ptr) makes the functions return error code 6 (invalid handle).
for example:
CONDITION_VARIABLE cv;
InitializeConditionVariable(&cv);
auto res = WaitForSingleObject(cv.Ptr, INFINITE); // returns immediately
if (res != WAIT_OBJECT_0) {
    auto ec = GetLastError();
    std::cout << ec << "\n";
}
So, can you really wait on a condition variable, or is it just an urban legend?
I don't think so and it doesn't make any sense.
First of all, the WaitForXxx functions operate (mostly) on dispatcher objects - a subset of kernel objects, including timers, events, mutexes, semaphores, threads and processes (and a few internal object types like KGATEs and KQUEUEs, but not access tokens or file mapping objects), that have a DISPATCHER_HEADER. They certainly won't work on user-mode constructs that the kernel is unaware of.
Second, note that when you sleep ("wait") on a condition variable you have to specify whether it is a critical-section-based condition variable or an SRWL-based condition variable by using the correct function - either SleepConditionVariableCS or SleepConditionVariableSRW. So again, Windows (not only the kernel) has no idea what kind of condition variable you're passing it, but it needs this information to operate correctly. Since you don't provide this information to WaitForXxx, it follows that they cannot be used with condition variables.
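For contrast, the supported way to wait on a Windows condition variable looks roughly like this (a minimal sketch; the ready flag and the waiter/signaller split are invented for the example):

#include <windows.h>

CRITICAL_SECTION cs;
CONDITION_VARIABLE cv;
bool ready = false;  // predicate guarded by cs

void init()
{
    InitializeCriticalSection(&cs);
    InitializeConditionVariable(&cv);
}

void waiter()
{
    EnterCriticalSection(&cs);
    while (!ready) {
        // Atomically releases cs while sleeping; reacquires it on wake.
        SleepConditionVariableCS(&cv, &cs, INFINITE);
    }
    // ... consume the state guarded by cs ...
    LeaveCriticalSection(&cs);
}

void signaller()
{
    EnterCriticalSection(&cs);
    ready = true;
    LeaveCriticalSection(&cs);
    WakeConditionVariable(&cv);
}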
The simple answer to your question is no. You cannot use the WaitForXxx functions with the condition variables provided by the Windows synchronization APIs. From the linked documentation:
Condition variables are synchronization primitives that enable threads to wait until a particular condition occurs. Condition variables are user-mode objects that cannot be shared across processes.
The WaitForXxx functions accept parameters of the generic HANDLE type, which represents a handle to a kernel object. Condition variables are user-mode objects, not kernel objects, so you cannot use them with these functions, which work only with kernel objects.
Moreover, the documentation for these functions is pretty explicit about which types of objects they can wait on, and condition variables are not on that list. For instance, WaitForMultipleObjects says:
The WaitForMultipleObjects function can specify handles of any of the following object types in the lpHandles array:
Change notification
Console input
Event
Memory resource notification
Mutex
Process
Semaphore
Thread
Waitable timer
They all have the same list, so no confusion there.
Technically speaking (and we're diving into undocumented implementation details here, so you shouldn't rely on this as gospel), the Win32 WaitForSingleObject and WaitForMultipleObjects functions are built upon the KeWaitForSingleObject and KeWaitForMultipleObjects functions provided by the kernel subsystem. You can divide the objects supported by the kernel into three basic categories: dispatcher objects, I/O objects/data structures, and everything else. The first category, dispatcher objects, are the lowest level objects and they are all represented using the same DISPATCHER_HEADER data structure in their bodies. Dispatcher objects are the only types of objects that are "waitable". It is this DISPATCHER_HEADER structure that makes an object waitable, by definition. If the object is represented using this data structure, then it can be passed to the kernel synchronization functions. Thus, the same rules would apply to the Win32 functions.
This entire question seems to be based around a single statement that Managu makes in his answer: "Windows has WaitForMultipleObjects as aJ posted, which could be a solution if you're willing to restrict your code to Windows synchronization primitives." Perhaps he doesn't consider condition variables (as they are implemented by Windows) to be synchronization primitives, or perhaps he is just wrong. aJ's answer, to which he refers, is pretty clear about stating that WaitForMultipleObjects is used "to wait for multiple kernel objects," and we have already established that condition variables are not kernel objects. Either way, I don't see any evidence for an "urban legend" that you can do this.
Obviously you cannot use the WaitForXxx family of functions with boost::condition_variable, or std::condition_variable, or anything else. I'm sure you already knew that, but your question has confused some people because it links to a question that refers to the Boost implementation.
It is not especially clear to me why you would need to wait on multiple condition variables simultaneously. I guess you could write your own implementation of condition variables, based on the classic Win32 synchronization primitives, such as mutexes, which you can then wait on with WaitForMultipleObjects. You can probably find examples of such code online, since condition variables did not become part of the operating system until Vista. For example, this article discusses strategies for implementing condition variables in Windows as they are defined by the POSIX Pthreads specification. You could also look into using Event Objects.
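As a rough illustration of the event-based route (a sketch, not a drop-in replacement for condition variables; the two-event setup is invented for the example):

#include <windows.h>

HANDLE events[2];

void init()
{
    // Auto-reset events standing in for two "conditions".
    events[0] = CreateEvent(NULL, FALSE, FALSE, NULL);
    events[1] = CreateEvent(NULL, FALSE, FALSE, NULL);
}

void waiter()
{
    // Blocks until either event is signalled (bWaitAll = FALSE).
    DWORD res = WaitForMultipleObjects(2, events, FALSE, INFINITE);
    if (res <= WAIT_OBJECT_0 + 1) {
        DWORD which = res - WAIT_OBJECT_0;  // index of the signalled event
        // ... handle condition `which` ...
    }
}

void signaller()
{
    SetEvent(events[0]);  // wake a waiter for condition 0
}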

Is A Member Function Thread Safe?

I have, in a Server object, multiple threads that are doing the same task. Those threads are initialized with a Server::* routine.
In this routine there is an infinite loop with some processing.
I was wondering if it is thread safe to use the same method for multiple threads. I have no worries about the fields of the class: if I want to read or write them I will use a mutex. But what about the routine itself?
Since a function is an address, will those threads be running in the same memory zone?
Do I need to create a method with the same code for every thread?
P.S.: I start the threads with std::thread(&Server::Task, this)
There is no problem with two threads running the same function at the same time (whether it's a member function or not).
In terms of instructions, it's similar to if you had two threads reading the same field at the same time - that's fine, they both get the same value. It's when you have one writing and one reading, or two writing, that you can start to have race conditions.
In C++ every thread is allocated its own call stack. This means that all local variables which exist only in the scope of a given thread's call stack belong to that thread alone. However, in the case of shared data or resources, such as a global data structure or a database, it is possible for different threads to access these at the same time. One solution to this synchronization problem is to use std::mutex, which you are already doing.
While the function itself lives at a single address in memory, you aren't writing to it from multiple locations: the function's code is immutable, and local variables scoped inside that function are allocated on each thread's own stack.
If your writes are protected and the fetches don't pull stale data you're as safe as you could possibly need on most architectures and implementations out there.
Behind the scenes, int Server::Task(std::string arg) is very similar to int Server__Task(Server* this, std::string arg). Just like multiple threads can execute the same function, multiple threads can also execute the same member function - even with the same arguments.
A mutex ensures that no conflicting changes are made, and that each thread sees every prior change. But since code does not change, you don't need a mutex for it, just like you don't need a mutex for string literals.
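To make that concrete, a minimal sketch (the counter field and the thread count are invented for the example):

#include <mutex>
#include <thread>
#include <vector>

class Server {
public:
    // The same member function runs concurrently on several threads.
    void Task() {
        for (int i = 0; i < 1000; ++i) {
            int local = i;  // lives on this thread's own stack
            std::lock_guard<std::mutex> lock(m_);  // shared field: mutex needed
            counter_ += local;
        }
    }

    void run() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i)
            threads.emplace_back(&Server::Task, this);  // same function, same object
        for (auto& t : threads)
            t.join();
    }

private:
    std::mutex m_;
    long counter_ = 0;
};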

Locking in function hierarchies

I am currently running into some design problems regarding concurrent programming in C++
and I was wondering if you could help me out:
Assume that some function func operates on some object obj. It is necessary during these operations to hold a lock (which might be a member variable of obj). Now assume that
func calls a subfunction func_2 while it holds the lock. Now func_2 operates on an object which is already locked. However, what if I also want to call func_2 from somewhere else without holding the lock? Should func_2 lock obj or should it not? I see 3 possibilities:
1. I could pass a bool to func_2 indicating whether or not locking is required. This seems to introduce a lot of boilerplate code though.
2. I could use a recursive lock and just always lock obj in func_2. Recursive locks seem to be problematic though, see here.
3. I could assume that every caller of func_2 holds the lock already. I would have to document this and perhaps enforce it (at least in debug mode). Is it reasonable to have functions make assumptions regarding which locks are / are not held by the calling thread? More generally, how do I decide from a design perspective whether a function should lock obj and which functions should assume that it is already locked? (Obviously a function which assumes that certain locks are held can only call functions which make at least equally strong assumptions, but apart from that?)
My question is the following: Which one of these approaches is used in practice and why?
Thanks in advance
hfhc2
1. Passing an indicator whether to lock or not:
You give the lock choice to the caller. This is error prone:
the caller might not make the right choice
the caller needs to know implementation details about your object, thus breaking the principle of encapsulation
the caller needs access to the mutex
if you have several objects, you eventually create the conditions for deadlocks
2. Recursive lock:
You already highlighted the issue.
3. Pass locking responsibility to caller:
Among the different alternatives that you propose, this seems the most consistent. In contrast to 1, you don't give the caller a choice; you pass complete responsibility for locking. It's part of the contract for using func_2.
You could even assert that a lock is set on the object, to prevent mistakes (although the check would be limited, because you would not necessarily be in a position to verify which thread owns the lock).
4. Reconsider your design:
If you need to ensure in func_2 that the object is locked, it means that you have a critical section therein that you must protect. Chances are that both functions need to lock because they perform some lower-level operations on obj and need to prevent data races on an unstable state of the object.
I'd strongly advise looking at whether it would be feasible to extract these lower-level routines from both func and func_2, and encapsulate them in simpler primitive functions on obj. This approach could also lead to locking over shorter sequences, thus increasing the opportunity for real concurrency.
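One common way to make option 3 explicit in C++ is to have func_2 take a reference to the held lock, so the contract is visible in the signature (a sketch; the class and names are invented):

#include <cassert>
#include <mutex>

class Obj {
public:
    void func() {
        std::unique_lock<std::mutex> lock(m_);
        // ... operate on shared state ...
        func_2(lock);  // the proof of locking travels with the call
    }

    // Contract: the caller must already hold m_ via `lock`.
    void func_2(std::unique_lock<std::mutex>& lock) {
        assert(lock.owns_lock());  // cheap sanity check in debug builds
        // ... operate on shared state, no extra locking needed ...
    }

    // Entry point for callers that do not hold the lock.
    void func_2_standalone() {
        std::unique_lock<std::mutex> lock(m_);
        func_2(lock);
    }

private:
    std::mutex m_;
};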
Ok, just as another follow-up. I recently read the API documentation of glib, in particular the section about message-passing queues. I found that most functions operating on these queues come in two variants, named function and function_unlocked. The idea is that if a programmer wants to execute a single operation, like popping from the queue, this can be done using g_async_queue_pop(). The function automatically takes care of the locking/unlocking of the queue. However, if the programmer wants to, for instance, pop two elements without interruption, the following sequence may be used:
GAsyncQueue *queue = g_async_queue_new();
// ...
g_async_queue_lock(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_unlock(queue);
This resembles my third approach. It is also the case that assumptions are made regarding the state of certain locks; the API requires them and requires them to be documented.

Race condition: One thread creates static object, another thread uses it before it is finished initializing. How to handle?

I have several places in my code where a function static object is created once, and then used (copied) any time that function is called. One of these functions can be called from any thread. The function doesn't access any shared state other than this static object.
When thread 1 calls the function for the first time, the object is created and initialized. However, (by a stroke of luck) I have a repeatable case where the program switches to thread 2 and calls the same function before initialization is finished. The object is assigned, and used, with bad data!
I'm not sure how to handle this. I'm using critical sections in the initialization code, but that's not even the problem. This object is being used before being initialized in the first place.
I tried making this thread local using __declspec(thread), but that doesn't work for objects, apparently.
I could just surround the whole thing with a critical section, and maybe that's the best solution, but I'm concerned about problems like this cropping up in other parts of the code- it'd be nice to have a general solution.
If you are on Windows you could use the InitOnceExecuteOnce API. More details can be found in this Raymond Chen post. Also look at the more generic std::call_once.
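A minimal sketch with std::call_once (the get_config function and its Config type are invented for the example):

#include <mutex>

struct Config { int value = 0; };

Config& get_config() {
    static std::once_flag flag;
    static Config cfg;
    // call_once blocks any other caller until the initializer finishes,
    // so no thread can observe a partially initialized cfg.
    std::call_once(flag, [] { cfg.value = 42; /* expensive init here */ });
    return cfg;
}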
Couldn't you use a semaphore on the object, setting the semaphore to 1 when the object is created, and then decrementing it to zero when the object is initialized (and ready for use)?
Just need to keep an eye out for resource starvation though.

What is the most efficient implementation of a java like object monitor in C++?

In Java each object has a synchronisation monitor, so I guess the implementation is pretty condensed in terms of memory usage and hopefully fast as well.
When porting this to C++, what would be the best implementation for it? I think that there must be something better than pthread_mutex_init, or is the object overhead in Java really so high?
Edit: I just checked that pthread_mutex_t on Linux i386 is 24 bytes large. That's huge if I have to reserve this space for each object.
In a sense it's worse than pthread_mutex_init, actually. Because of Java's wait/notify you kind of need a paired mutex and condition variable to implement a monitor.
In practice, when implementing a JVM you hunt down and apply every single platform-specific optimisation in the book, and then invent some new ones, to make monitors as fast as possible. If you can't do a really fiendish job of that, you definitely aren't up to optimising garbage collection ;-)
One observation is that not every object needs to have its own monitor. An object which isn't currently synchronised doesn't need one. So the JVM can create a pool of monitors, and each object could just have a pointer field, which is filled in when a thread actually wants to synchronise on the object (with a platform-specific atomic compare and swap operation, for instance). So the cost of monitor initialisation doesn't have to add to the cost of object creation. Assuming the memory is pre-cleared, object creation can be: decrement a pointer (plus some kind of bounds check, with a predicted-false branch to the code that runs gc and so on); fill in the type; call the most derived constructor. I think you can arrange for the constructor of Object to do nothing, but obviously a lot depends on the implementation.
In practice, the average Java application isn't synchronising on very many objects at any one time, so monitor pools are potentially a huge optimisation in time and memory.
The Sun Hotspot JVM implements thin locks using compare and swap. If an object is locked, then the waiting thread waits on the monitor of the thread which locked the object. This means you only need one heavy lock per thread.
I'm not sure how Java does it, but .NET doesn't keep the mutex (or analog - the structure that holds it is called "syncblk" there) directly in the object. Rather, it has a global table of syncblks, and object references its syncblk by index in that table. Furthermore, objects don't get a syncblk as soon as they're created - instead, it's created on demand on the first lock.
I assume (note, I do not know how it actually does that!) that it uses atomic compare-and-exchange to associate the object and its syncblk in a thread-safe way:
Check the hidden syncblk_index field of our object for 0. If it's not 0, lock it and proceed, otherwise...
Create a new syncblk in global table, get the index for it (global locks are acquired/released here as needed).
Compare-and-exchange to write it into object itself.
If previous value was 0 (assume that 0 is not a valid index, and is the initial value for the hidden syncblk_index field of our objects), our syncblk creation was not contested. Lock on it and proceed.
If previous value was not 0, then someone else had already created a syncblk and associated it with the object while we were creating ours, and we have the index of that syncblk now. Dispose the one we've just created, and lock on the one that we've obtained.
Thus the overhead per-object is 4 bytes (assuming 32-bit indices into syncblk table) in best case, but larger for objects which actually have been locked. If you only rarely lock on your objects, then this scheme looks like a good way to cut down on resource usage. But if you need to lock on most or all your objects eventually, storing a mutex directly within the object might be faster.
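Translated into C++ terms, that lazy association could look roughly like this (a sketch using std::atomic; the fixed-size table and 1-based index scheme are simplified stand-ins for the real .NET machinery, with no overflow or recycling handling):

#include <array>
#include <atomic>
#include <cstdint>
#include <mutex>

constexpr std::size_t kTableSize = 1024;
std::array<std::mutex, kTableSize> sync_table;  // global "syncblk" table
std::atomic<std::uint32_t> next_index{1};       // 0 means "no syncblk yet"

struct Object {
    std::atomic<std::uint32_t> syncblk_index{0};
};

std::mutex& monitor_for(Object& obj) {
    std::uint32_t idx = obj.syncblk_index.load(std::memory_order_acquire);
    if (idx == 0) {
        // Grab a fresh slot as our candidate syncblk.
        std::uint32_t candidate = next_index.fetch_add(1);
        std::uint32_t expected = 0;
        if (obj.syncblk_index.compare_exchange_strong(expected, candidate)) {
            idx = candidate;  // we won the race to associate a syncblk
        } else {
            idx = expected;   // another thread won; use its syncblk
        }
    }
    return sync_table[idx - 1];
}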
Surely you don't need such a monitor for every object!
When porting from Java to C++, it strikes me as a bad idea to just copy everything blindly. The best structure for Java is not the same as the best for C++, not least because Java has garbage collection and C++ doesn't.
Add a monitor to only those objects that really need it. If only some instances of a type need synchronization then it's not that hard to create a wrapper class that contains the mutex (and possibly condition variable) necessary for synchronization. As others have already said, an alternative is to use a pool of synchronization objects with some means of choosing one for each object, such as using a hash of the object address to index the array.
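The hash-of-address variant of that pool might look like this (a sketch; the pool size is arbitrary, and unrelated objects hashing to the same slot will share a mutex, which is safe but can add contention):

#include <cstdint>
#include <mutex>

constexpr std::size_t kPoolSize = 64;
std::mutex lock_pool[kPoolSize];

// Pick a mutex for any object based on its address.
std::mutex& lock_for(const void* obj) {
    auto h = reinterpret_cast<std::uintptr_t>(obj);
    h >>= 4;  // drop low bits, usually zero due to alignment
    return lock_pool[h % kPoolSize];
}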
I'd use the boost thread library or the new C++0x standard thread library for portability rather than relying on platform specifics at each turn. Boost.Thread supports Linux, MacOSX, win32, Solaris, HP-UX and others. My implementation of the C++0x thread library currently only supports Windows and Linux, but other implementations will become available in due course.