Can I set the stack pointer in LLVM? - llvm

I'm working on a small C++-like language which I'll be compiling to LLVM. One of the things I want to implement is cooperative multitasking; there will be a "yield" operator which will hopefully switch the stack pointer and program counter to the next "thread" in my program.
Is it possible to do this in LLVM? Can I set the stack pointer register? If not, is there anything similar I can do?
Edit: LLVM coroutines (http://llvm.org/docs/Coroutines.html) sound promising, though https://internals.rust-lang.org/t/llvm-coroutines-to-bring-awarness/3708/12 brings up some questions regarding stackful or stackless coroutines. I wonder, can they be used to implement a general yield-like operator?
Edit 2: In C++, Boost has something called a "context" which can implement stackful coroutines. I'm still trying to figure out how they do it, though. Does anyone know?

Assuming you have the GCD (Grand Central Dispatch) library available, you can implement cooperative multitasking with a semaphore (dispatch_semaphore_t). The semaphore count is set up so that exactly one of your threads can run at a time. The yield() function signals the semaphore and then immediately waits on it: the signal wakes up another thread, and the wait stops the thread that called yield.
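A minimal sketch of that idea in portable C++, assuming a hand-rolled Semaphore class as a stand-in for dispatch_semaphore_t (Semaphore, run_token, coop_yield and run_demo are hypothetical names, not part of GCD). Note that nothing here guarantees fairness: the thread that yields may win the token back immediately.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for dispatch_semaphore_t: a plain counting semaphore.
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void signal() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

Semaphore run_token(1);           // count 1: only one thread may run at a time
std::atomic<int> running{0};      // threads currently holding the token
std::atomic<int> max_running{0};  // highest value of `running` ever observed

void note_running() {
    int now = ++running;
    int seen = max_running.load();
    while (now > seen && !max_running.compare_exchange_weak(seen, now)) {}
}

// yield: hand the token to whoever is waiting, then queue up for it again.
void coop_yield() {
    --running;
    run_token.signal();
    run_token.wait();
    note_running();
}

void worker(int steps, std::atomic<int>& total) {
    run_token.wait();             // block until it is this thread's turn
    note_running();
    for (int i = 0; i < steps; ++i) {
        ++total;                  // one unit of "work"
        coop_yield();
    }
    --running;
    run_token.signal();           // give up the token for good
}

// Runs `threads` cooperative workers and returns the total units of work done.
int run_demo(int threads, int steps) {
    std::atomic<int> total{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) pool.emplace_back(worker, steps, std::ref(total));
    for (auto& t : pool) t.join();
    return total.load();
}
```

This still uses one OS thread per cooperative "thread"; it does not switch the stack pointer the way the question asks, it only serializes execution.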

Related

Implementing asynchronous delays

What would be a smart way to implement something like the following?
// Plain C function for example purposes.
void sleep_async(delay_t delay, void (* callback)(void *), void * data);
That is, a means of asynchronously executing a callback after a delay. POSIX, for example, has a few functions that do something like this, but they are mostly for asynchronous I/O (see this for what I mean). What interests me about those functions is how they are executed "as if" on a new thread, according to that manual page, where an implementation may choose to spawn "a single thread...to receive all notifications". I am aware that some may nonetheless choose to spawn a whole thread for each of them, and that stuff like this may require support from the OS itself, so this is just an example.
I already have a couple of ways I could implement this (e.g. priority queue of events sorted by wake time on a timer loop, with no need to start a thread at all), but I am wondering whether there already exist smart[er] or [more] complete implementations of what I want to accomplish. For example, maybe implementations of Task.Delay() from C# (and coroutines like it in other language environments) do something smart to minimize the amount of thread spawning needed for asynchronous delays.
Why am I looking for something like this? As implied by the title, I'm looking for something asynchronous. The above signature is just a simple C example to illustrate roughly what POSIX does. I am implementing some C++20 coroutines for use with co_await and friends, with thread pools and whatnot. Scheduling anything that would end up synchronously waiting on something is probably a bad idea, as it would prevent otherwise free threads from doing any work. Spawning [and potentially immediately detaching] a new thread just to add in an asynchronous delay doesn't seem like a very smart idea, either. My timer loop idea could be okay, but that implies needing a predefined timer granularity, and overhead from the priority queue.
Edit
I neglected to mention any real set of target platforms, as a commenter mentioned. I don't expect to target anything outside the "usual" desktop platforms, so the quirks of embedded development are ignored. The way I plan to use asynchronous delays themselves this way does not necessarily require threading support (everything could just be on a timer loop), but threading will nonetheless be required and used in accord (namely thread pools on which coroutines would be scheduled).
The simple but inefficient way would be to spawn a thread, have it sleep for delay, and then call the callback. This can be done in just a few lines using std::async():
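// Capture by value so the lambda does not refer to locals that may already be gone.
auto delayed_call = std::async(std::launch::async, [delay, callback, data] {
    std::this_thread::sleep_for(delay);
    callback(data);
});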
As mentioned by Thomas Matthews, this requires support for threads. While it's fine for a one-off call, it's not efficient if you have many such delayed calls. Having a priority queue and an event loop or a dedicated thread to handle events in this queue, as you already mentioned, is probably the most efficient way to do it. If you are looking for a library that implements this, then have a look at boost::asio.
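A minimal sketch of the priority-queue approach with one dedicated thread, assuming std::function callbacks instead of the C-style signature from the question (TimerQueue and demo_order are hypothetical names). The thread sleeps with wait_until until the earliest deadline, so no fixed timer granularity is needed; events still pending at destruction are simply dropped.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class TimerQueue {
public:
    using Clock = std::chrono::steady_clock;

    TimerQueue() : worker_([this] { run(); }) {}
    ~TimerQueue() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    // Schedule `callback` to run on the timer thread after `delay`.
    void sleep_async(Clock::duration delay, std::function<void()> callback) {
        {
            std::lock_guard<std::mutex> lock(m_);
            events_.push(Event{Clock::now() + delay, std::move(callback)});
        }
        cv_.notify_one();  // the new event may be the earliest one
    }

private:
    struct Event {
        Clock::time_point when;
        std::function<void()> callback;
        bool operator>(const Event& other) const { return when > other.when; }
    };

    void run() {
        std::unique_lock<std::mutex> lock(m_);
        while (!stop_) {
            if (events_.empty()) {
                cv_.wait(lock);                       // nothing scheduled yet
            } else if (events_.top().when <= Clock::now()) {
                auto callback = events_.top().callback;
                events_.pop();
                lock.unlock();                        // never call out while locked
                callback();
                lock.lock();
            } else {
                auto when = events_.top().when;
                cv_.wait_until(lock, when);           // sleep until the next deadline
            }
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    // Min-heap on wake time: the earliest event is always on top.
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Usage: the 10 ms callback runs before the 60 ms one, regardless of insertion order.
std::vector<int> demo_order() {
    std::vector<int> order;
    {
        TimerQueue timers;
        timers.sleep_async(std::chrono::milliseconds(60), [&order] { order.push_back(2); });
        timers.sleep_async(std::chrono::milliseconds(10), [&order] { order.push_back(1); });
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }  // destructor joins the timer thread
    return order;
}
```

Callbacks run on the timer thread itself, so a slow callback delays every later one; in a real design they would probably be handed off to a thread pool.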
As for using C++20 coroutines, I do not think that this will make something like your sleep_async() any easier. However, an event loop could be implemented on top of it.
A smart way? You mean really, really smart? That would be my own implementation, of course. You know about POSIX timers, you probably know about linux timers and the various hacks involving std::thread. But, more seriously, what you require sounds mostly to the tune of something like libeio, or libuv - both of these provide callbacks. It depends on what you can afford in binary size and whether you like the particular abstractions a library offers. The 2 libraries seem to be evolved versions of libevent and libev, libevent being the progenitor of them all.
Creating a std::thread instance involves allocating a whole stack for the new thread, at the very least, which is by no means cheap.

Is it possible to use fork in modern C++?

Traditional C++ was very straightforward and only a library intended to create threads (like pthread) gave rise to other threads.
Modern C++ is much closer to Java with many functions being thread based, with thread pools ready to run asynchronous jobs, etc. It's much more likely that some library, including the standard library, uses threads to compute asynchronously some function, or sets up the infrastructure to do so even if it isn't used.
In that context, is it ever safe to use functions with global impact like fork?
The answer to this question, like almost everything else in C++, is "it depends".
If we assume there are other threads in the program, and those threads are synchronizing with each other, calling fork is dangerous. This is because fork does not wait for all threads to be at a synchronization point (i.e. mutex release) before forking the process. In the forked process, only the thread that called fork will be present; the others will simply not exist, possibly having been cut off in the middle of a critical section. This means any memory shared with other threads, that wasn't a std::atomic<int> or similar, is in an undefined state.
If your forked process reads from this memory, or indeed expects the other threads to be running, it is likely not going to work reliably. However, most uses of fork actually have effectively no preconditions on program state. That is because the most common thing to do is to immediately call execv or similar to spawn a subprocess. In this case your entire process is kinda "replaced" by some new process, and all memory from your old process is discarded.
tl;dr - Calling fork may not be safe in multithreaded programs. Sometimes it is safe, for example if no threads have been spawned yet, or if execv is called immediately. If you are using fork for something else, consider using a thread instead.
See the fork man page and this helpful blog post for the nitty-gritty.
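The "fork, then immediately exec" pattern described above can be sketched like this (run_and_wait is a hypothetical helper name). The argv array is built before fork() so that the child does nothing but call execv.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <vector>

// Spawns args[0] with the given arguments and waits for it.
// Returns the child's exit status, or -1 if spawning or waiting failed.
int run_and_wait(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // Prepare everything in the parent: the child should not allocate.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: replace the process image right away.
        execv(argv[0], argv.data());
        _exit(127);  // only reached if execv failed
    }

    // Parent: wait for the child, retrying if interrupted by a signal.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait({"/bin/sh", "-c", "exit 7"}) should return 7 on a system where /bin/sh exists.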
To add to peteigel's answer, my advice is - if you want to fork, do it very early, before any other threads than the main thread are started.
In general, anything you can do in C, you can do in C++, since C++, especially on Linux with clang or gcc extensions, is pretty darn close to a perfect superset of C. Of course, when there are good portable APIs in std C++, use them. The canonical example is preferring std::thread over pthreads C API.
One caveat is pthread_cancel, which must be avoided in C++ because it interacts badly with exceptions. See e.g. pthread cancel harmful on C++.
Here is another link that explains the problem:
pthread_cancel while in destructor
In general, C++ cleanup handling is easier and more elegant than in C, since RAII is part and parcel of C++ culture, and C does not have destructors.

learning threads on linux

Linux is a new platform to me. I've coded on Windows in c++ for a number of years and have become comfortable with multithreading on that platform.
Along comes C++11 at a time when I need to learn c++ on the linux platform.
Linux appears to use pthreads for the most part - okay, there's also boost::thread, and Qt has its own threads too. But with C++11 comes std::thread, a whole new (cross-platform and C++ standard) way to do threads.
So I guess I'll have to learn pthreads and std::threads. Ultimately, std::thread seems more important, but there's a lot of legacy code out there, so I'll have to know both.
For thread synchronization on windows, I would use WaitForMultipleObjects to wait for a number of tasks to complete before continuing with further work.
Does a similar synchronization mechanism exist for pthreads? std::threads?
I've had a look at pthread_join, and it seems to have the facility to only wait on one thread at a time. Am I missing another pthread call maybe?
std::thread is boost::thread accepted into C++11 with some extras. My understanding is that if boost::thread gets replaced in code with std::thread it should still compile and work.
boost::thread is based on pthreads design, providing thin C++ wrappers over thread, mutex and condition variables. Thread cancellation though was left outside the scope of C++11, since there was no agreement how it should work in C++.
So, by learning pthreads you also learn std::thread concepts. std::thread adds mostly syntax sugar and convenience functions on top of pthreads C API.
With regard to WaitForMultipleObjects(), neither pthreads nor std::thread provides anything similar to its bWaitAll=FALSE mode. However, it's routinely simulated using pipes and select() on UNIX, or the more modern eventfd() and epoll() on Linux. bWaitAll=TRUE mode can be simulated by waiting on all tasks in turn, since it doesn't proceed until all objects are ready anyway.
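A rough sketch of the pipe-based simulation of bWaitAll=FALSE, using poll() rather than select() (wait_for_any and first_finisher_demo are hypothetical names). Each worker writes one byte to its own pipe when it finishes, and the caller polls all read ends at once.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task on its own thread and returns the index of the first one
// to finish, or -1 on error. All threads are joined before returning.
int wait_for_any(const std::vector<std::function<void()>>& tasks) {
    const std::size_t n = tasks.size();
    if (n == 0) return -1;

    std::vector<pollfd> fds(n);      // read ends, watched by poll()
    std::vector<int> write_ends(n);  // write ends, one per worker
    for (std::size_t i = 0; i < n; ++i) {
        int ends[2];
        if (pipe(ends) != 0) {
            for (std::size_t j = 0; j < i; ++j) { close(fds[j].fd); close(write_ends[j]); }
            return -1;
        }
        fds[i].fd = ends[0];
        fds[i].events = POLLIN;
        fds[i].revents = 0;
        write_ends[i] = ends[1];
    }

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&tasks, &write_ends, i] {
            tasks[i]();
            const char done = 1;     // one byte signals completion
            ssize_t written = write(write_ends[i], &done, 1);
            (void)written;
        });
    }

    // Block until at least one pipe becomes readable.
    int ready;
    do {
        ready = poll(fds.data(), static_cast<nfds_t>(n), -1);
    } while (ready < 0 && errno == EINTR);

    int first = -1;
    for (std::size_t i = 0; ready > 0 && i < n; ++i) {
        if (fds[i].revents & POLLIN) { first = static_cast<int>(i); break; }
    }

    for (auto& w : workers) w.join();
    for (std::size_t i = 0; i < n; ++i) { close(fds[i].fd); close(write_ends[i]); }
    return first;
}

// Usage: the 20 ms task (index 1) should be reported as the first finisher.
int first_finisher_demo() {
    using std::chrono::milliseconds;
    auto sleeper = [](int ms) {
        return [ms] { std::this_thread::sleep_for(milliseconds(ms)); };
    };
    return wait_for_any({sleeper(300), sleeper(20), sleeper(200)});
}
```

If several tasks finish before poll() returns, this sketch reports the lowest index among them, which is also how WaitForMultipleObjects behaves.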
No, neither pthreads nor C++11 has direct equivalent of WaitForMultipleObjects (i.e. wait for any waitable "handle" type.) pthread_join can only be used to join threads, and only a single, specific thread.
The closest equivalent on POSIX platforms is to wait for multiple file descriptors using system calls such as select(), poll() or the Linux-specific epoll(), but they require you to have a file descriptor to wait on, which is fine for I/O events but requires extra work from you to use them to wait for mutexes, condition variables or other synchronisation objects. There are more general event libraries built on top of those system calls, e.g. libevent, libev and Boost ASIO, which support waiting for timers as well as I/O, but still not thread completion, mutex locks etc. with a single function like WaitForMultipleObjects.
The alternatives you do have for pthreads and C++11 threads are to wait on different synchronisation types separately. You can wait for timers, wait for threads to complete, wait for mutexes, wait on condition variables, wait for asynchronous results to be ready (std::async in C++11, no direct equivalent in pthreads) ... but there's no call that will allow you to wait on a heterogeneous set of those types all at once.
I could give you a really fancy answer but alas, this is where I learned them and it is a good introduction:
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
You use pthread_mutex_t for synchronization, and pthread_join probably handles the "wait for multiple tasks" problem. It works exactly as you would expect.
Based on this, you must call pthread_join for every thread you have created, or use mutexes if there is a need to synchronize your threads.
Regarding WaitForMultipleObjects, waiting for all threads to reach a point is generally called a barrier sync. Boost has an implementation called barrier. It uses condition variables to implement it; in POSIX that's a pthread_cond_t.
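A minimal sketch of a barrier built on a condition variable, using the standard-library types rather than Boost or pthread_cond_t (Barrier and all_arrived_before_release are hypothetical names; this is not Boost's implementation). It only covers the "wait until everyone has arrived" case.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
public:
    explicit Barrier(std::size_t count) : threshold_(count) {}

    // Blocks until `count` threads have called wait(), then releases them all.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: start a new generation and wake everyone.
            ++generation_;
            waiting_ = 0;
            cv_.notify_all();
        } else {
            // The generation counter guards against spurious wakeups and reuse.
            cv_.wait(lock, [this, gen] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Usage: no thread gets past the barrier until all n threads have arrived.
bool all_arrived_before_release(int n) {
    Barrier barrier(static_cast<std::size_t>(n));
    std::atomic<int> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            ++arrived;
            barrier.wait();
            if (arrived.load() != n) ok = false;
        });
    }
    for (auto& t : threads) t.join();
    return ok.load();
}
```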
Here is an answer I left recently explaining barrier sync.

Using asynchronous method vs thread wait

I have 2 versions of a function which are available in a C++ library which do the same task. One is a synchronous function, and another is of asynchronous type which allows a callback function to be registered.
Which of the below strategies is preferable for giving a better memory and performance optimization?
Call the synchronous function in a worker thread, and use mutex synchronization to wait until I get the result
Do not create a thread, but call the asynchronous version and get the result in callback
I am aware that worker thread creation in option 1 will cause more overhead. I want to know about the overhead caused by thread synchronization objects, and how it compares to the overhead caused by an asynchronous call. Does the asynchronous version of a function internally spin off a thread and use a synchronization object, or does it use some other technique, like talking directly to the kernel?
"Profile, don't speculate." (DJB)
The answer to this question depends on too many things, and there is no general answer. The role of the developer is to be able to make these decisions. If you don't know, try the options and measure. In many cases, the difference won't matter and non-performance concerns will dominate.
"Premature optimisation is the root of all evil, say 97% of the time" (DEK)
Update in response to the question edit:
C++ libraries, in general, don't get to use magic to avoid synchronisation primitives. The asynchronous vs. synchronous interfaces are likely to be wrappers around things you would do anyway. Processing must happen in a context, and if completion is to be signalled to another context, a synchronisation primitive will be necessary to do that.
Of course, there might be other considerations. If your C++ library is talking to some piece of hardware that can do processing, things might be different. But you haven't told us about anything like that.
The answer to this question depends on context you haven't given us, including information about the library interface and the structure of your code.
Use the asynchronous function, because it will probably do what you would otherwise do manually with the synchronous one, but in a less error-prone way.
Asynchronous: Will create a thread, do work, when done -> call callback
Synchronous: create an event to wait for, create a thread for the work, wait for the event; on the thread, call the sync version, transfer the result, signal the event.
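The two flows above might look roughly like this. compute_sync is a hypothetical stand-in for the library's synchronous function, and compute_async is only a guess at what an asynchronous version could do internally; the real library may work quite differently. std::promise and std::future play the role of the "event" plus the result transfer.

```cpp
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for the library's synchronous function.
int compute_sync(int x) { return x * 2; }

// Option 1: call the synchronous version on a worker thread and wait for it.
int call_sync_on_worker(int x) {
    std::promise<int> done;                     // the "event" and the result slot
    std::future<int> result = done.get_future();
    std::thread worker([&done, x] {
        done.set_value(compute_sync(x));        // transfer result, signal event
    });
    int value = result.get();                   // wait for the event
    worker.join();
    return value;
}

// Option 2 (assumed internals): the async version starts a thread itself and
// invokes the callback on that thread when the work is done.
std::thread compute_async(int x, std::function<void(int)> callback) {
    return std::thread([x, callback] { callback(compute_sync(x)); });
}

// Usage of option 2: the callback hands the result back to the caller.
int call_async_and_collect(int x) {
    std::promise<int> done;
    std::thread t = compute_async(x, [&done](int v) { done.set_value(v); });
    int value = done.get_future().get();
    t.join();
    return value;
}
```

Under these assumptions both options pay for one thread and one synchronisation point, which is why measuring is the only way to tell them apart.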
You might consider that threads each have their own environment so they use more memory than a non threaded solution when all other things are equal.
Depending on your threading library there can also be significant overhead to starting and stopping threads.
If you need interprocess synchronization there can also be a lot of pain debugging threaded code.
If you're comfortable writing non threaded code (i.e. you won't burn a lot of time writing and debugging it) then that might be the best choice.

passing variable to a thread after it already started

I am a newbie in C++ and Boost.
As part of my master's thesis, I wrote a program which simulates a statistical model. During the computation, I use boost::thread to process my "center of mass vector", to save some computation time. So far so good.
Now, I would like to take each result from the boost::thread (one element at a time) and pass it to a running thread, which is going to perform recursive regression.
My questions:
How can I pass my newly computed element to the existing thread?
How can I "wake up" the thread when I pass the new element?
I would be happy if someone could point me to an existing example.
The simplest possible way is to use std::queue, boost::mutex and boost::condition_variable. Wrap every access to the queue in the mutex, and after pushing to the queue call notify_one() on the condition variable. In the consumer thread, wait on the condition variable until a result is ready, then process it.
A proven way to control a thread from another thread is to send messages via a combination of a queue and a condition variable. Unfortunately, boost::thread doesn't provide a standard solution, and there are a couple of tricky things when implementing one (possible deadlocks, behaviour when the queue is full, use of polymorphic messages...)
You should use a mutex and/or semaphore to synchronize your threads and lock the variable to achieve thread-safe communication. Just note that all threads in your process share the same memory, so you can access the same data, but you have to do it in a thread-safe way.
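A minimal sketch of that queue, written with std::mutex and std::condition_variable, the standard-library counterparts of the Boost types (ResultQueue and sum_on_consumer_thread are hypothetical names). The close() call is an addition so that the consumer thread knows when to stop.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class ResultQueue {
public:
    // Producer side: add an element and wake the consumer.
    void push(T value) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(value)); }
        cv_.notify_one();
    }

    // Tell the consumer that no more elements will arrive.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }

    // Consumer side: blocks until an element is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Usage: the main thread produces elements one at a time; an already-running
// consumer thread wakes up for each of them (here it just adds them up).
double sum_on_consumer_thread(const std::vector<double>& elements) {
    ResultQueue<double> queue;
    double sum = 0.0;
    std::thread consumer([&] {
        double x = 0.0;
        while (queue.pop(x)) sum += x;
    });
    for (double e : elements) queue.push(e);
    queue.close();
    consumer.join();
    return sum;
}
```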
I'm not sure if boost library implements any threading primitives, but here is a good tutorial about multi-threading programming using POSIX threads - http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html