Clojure core.async put! versus go block

I've read this great article about core.async here:
http://www.core-async.info/reference/primitives
I'm struggling to understand the internal mechanics of put! and go. I understand that:
put! is asynchronous and can accept a callback. That works well in simple scenarios, but you can end up in callback hell.
go fixes the callback hell and lets you write asynchronous code in a synchronous style.
go runs on a lightweight thread pool and uses parking to enable concurrency.
go uses a finite state machine.
I don't understand:
How does put! achieve asynchrony? Does it also use a thread pool?
Does put! also use parking?
What is the role of the finite state machine in the go block? Is it what enables parking?
Should I always prefer put! over go because it is cheaper? If so, does that mean put! achieves the exact same concurrency benefits as go, and that go is only needed when I want to reason about complex asynchronous code?
Thanks a lot for shedding light on these mysteries.

If you want to understand how core.async channels work, there's no better source than Rich Hickey's EuroClojure 2014 presentation: Implementation details of core.async channels.
As for your specific questions:
If put! is not accepted immediately, it places a pending put (the value to be put on the channel + the put! callback) on a queue internal to the channel. Note that an exception will be thrown if there is no room in the queue (max capacity is currently fixed at 1024).
The callback will be called on a pooled thread if (1) the put is not immediately accepted or (2) an explicit false is passed in as a final argument to the put! call (this argument is called on-caller?, see put!'s docstring for details).
"Parking", in the context of go blocks, refers to suspending execution of a go block's state machine by recording certain details of its current state and saving them inside a channel, or possibly several channels, so that it can be restarted later. (Note that this arrangement means that if all the channels holding references to a suspended go block are GC'd, the go block itself can also be GC'd.) In other contexts it similarly refers to putting a thread of control in a state of suspended animation. put! is just a function (well, it's backed by a protocol method, but then that is just a protocol method), so the concept doesn't apply.
Yes. It basically steps through the code in the go block, possibly suspending it when control reaches certain "custom terminals" (<!, >!, alt!).
Not necessarily; you should first consider the risk of overflowing the internal queue (see point 1 above).
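To make the pending-put behavior in point 1 concrete, here is a minimal sketch (in Go rather than Clojure, with hypothetical names like asyncChan and put) of the two paths put! can take: immediate acceptance, with the callback invoked right away, versus queueing the callback on a bounded pending list that errors when full:

```go
package main

import (
	"errors"
	"fmt"
)

// maxPending mirrors core.async's fixed pending-put limit of 1024.
const maxPending = 1024

// asyncChan models the relevant slice of a core.async channel:
// a bounded buffer plus a queue of pending puts with callbacks.
type asyncChan struct {
	buf     []int
	bufCap  int
	pending []func() // callbacks waiting for room in the buffer
}

// put models put!: if the buffer has room, accept the value and run
// the callback immediately; otherwise queue the callback, erroring
// when the pending queue is full (core.async throws in this case).
func (c *asyncChan) put(v int, onDone func()) error {
	if len(c.buf) < c.bufCap {
		c.buf = append(c.buf, v)
		onDone() // accepted immediately: callback runs right away
		return nil
	}
	if len(c.pending) >= maxPending {
		return errors.New("no more than 1024 pending puts allowed")
	}
	c.pending = append(c.pending, onDone)
	return nil
}

func main() {
	ch := &asyncChan{bufCap: 1}
	ch.put(1, func() { fmt.Println("accepted immediately") })
	ch.put(2, func() { fmt.Println("deferred until a take makes room") })
	fmt.Println("pending puts:", len(ch.pending))
}
```

The real channel implementation also drains pending puts whenever a take makes room, which is omitted here for brevity.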


Cancelling arbitrary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to use boost::this_thread::interruption_point, but I can wrap them in lambdas/other constructs if that helps. I feel like this is a rock-and-a-hard-place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the existing 2 answers were based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case; usually the state is one or two atomic<built-in-type> values). Usually the internal state is queried from an external db (an architecture similar to cookie + web-server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses which resources)
all accesses should be indirect
modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, and cancel incomplete changes.
None of these properties are cheap; the properties of threads are the exact opposite of these.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; after cancellation, such modifications would be left partial, ineffective, or incoherent if the thread is stopped in the middle of an operation.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non-cooperative threads is not doable even with very large-scale changes to thread synchronization objects, not even with a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that; but then they wouldn't be called threads!

How are golang select statements implemented?

In particular, I have some blocking queues in C++, and I want to wait until any one of them has some item I can pop.
The only mechanism I can think of is to spawn a separate thread for each queue that pops from its input queue and feeds into a master queue that the original thread can wait on.
It seems kind of resource heavy to spawn N new threads and then kill them all every time I want to pop from a group of queues.
Does Golang implement some more elegant mechanism that I might be able to implement in my own C++ code?
I wouldn't necessarily say that Go's select implementation is elegant, but I think it's beautiful in its own way and it's fairly optimized.
it special-handles selects with a single non-default case
it permutes the order in which cases are evaluated in order to avoid deterministic starvation
it does an optimistic first pass over the cases looking for one that's already satisfied
it enqueues on the internal sender/receiver queues of each channel, using internal mechanisms known only to the runtime
it uses sudogs which are like lightweight goroutine references (there can be many sudogs for the same goroutine) that allow quick jumping into the goroutine stack
it uses the scheduler's gopark mechanism to block itself which allows efficient unparking on signal
when signalled and unparked, it immediately goes into the triggered case handler function by manipulating the select goroutine's program counter
There's no single overarching groundbreaking idea in the implementation, but you would really appreciate how each step was carefully tinkered with so that it's fast, efficient, and well integrated with the concept of channels. Because of that, it's not very easy to reimplement Go's select statement in another language, unless you at least have the chan construct first.
You can take a look at the reimplementations available in other languages, where the idea was redone with various degrees of similarity and effectiveness. If I had to reimplement select from scratch in another language, I would probably first try a single shared semaphore and, in case that didn't work, switch to a cruder, sleep-a-little-then-check-in-random-order strategy.
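A small runnable example of the starvation-avoidance point above: when several cases are ready at once, select chooses among them pseudo-randomly, so neither channel is systematically preferred:

```go
package main

import "fmt"

// pickReady mirrors select's optimistic first pass: with both
// channels already holding a value, select chooses pseudo-randomly
// between the ready cases, which prevents deterministic starvation.
func pickReady() string {
	a := make(chan string, 1)
	b := make(chan string, 1)
	a <- "a"
	b <- "b"
	select {
	case v := <-a:
		return v
	case v := <-b:
		return v
	}
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pickReady()]++
	}
	// Over many runs both cases get taken; neither starves.
	fmt.Println(counts["a"] > 0 && counts["b"] > 0)
}
```

If select instead always evaluated cases in source order, the second channel would never be read in this loop; the runtime's case permutation is what rules that out.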
Golang's select statement is inspired by the C select function (see the GNU libc documentation), which is used to wait for I/O on a set of file descriptors. If your queues communicate using a socket or a pipe, you may be able to use it.

What are the tradeoffs of the ways to do work in a clojure core.async go-loop?

As I write more core.async code, a very common pattern that emerges is a go-loop that alts over a sequence of channels and does some work in response to a message, e.g.:
(go-loop [state {}]
  (let [[value task] (alts! tasks)]
    ...work...
    (recur state)))
I don't feel like I understand the tradeoffs of the various ways I can actually do the work though, so I thought I'd try to explore them here.
Inline or by calling a function: this blocks the loop from continuing until the work is complete. Since it's in a go block, one wouldn't want to do I/O or locking operations.
>! a message to a channel monitored by a worker: if the channel is full, this blocks the loop by parking until the channel has capacity. This allows the thread to do other work and allows back pressure.
>!! a message: if the channel is full, this blocks by sleeping the thread running the go loop. This is probably undesirable because go threads are a strictly finite resource.
>! a message within another go block: this will succeed nearly immediately unless there are no go threads available. Conversely, if the channel is full and is being consumed slowly, this could starve the system of go threads in short order.
>!! a message with a thread block: similar to the go block, but consuming system threads instead of go threads, so the upper bound is probably higher
put! a message: it's unclear what the tradeoffs are
call the work function in a future: gives the work to a thread from the clojure agent pool to do, allows the go loop to continue. If the input rate exceeds the output rate, this grows the agent pool queue without bound.
Is this summary correct and comprehensive?
If the work to be done is entirely CPU-bound, then I would probably do it inline in the go block, unless it's an operation that may take a long time and I want the go block to continue responding to other messages.
In general, any work which doesn't block, sleep, or do I/O can be safely put in a go block without having a major impact on the throughput of the system.
You can use >! to submit work to a worker or pool of workers. I would almost never use >!! in a go block because it can block one of the finite number of threads allocated to running go blocks.
When you need to do I/O or a potentially long-running computation, use a thread instead of a go. This is very similar to future — it creates a real thread — but it returns a channel like go.
put! is a lower-level operation generally used at the "boundaries" of core.async to connect it to conventional callback-based interfaces. There's rarely any reason to use put! inside a go.
core.async can support fine-grained control over how threads are created. I demonstrated a few possibilities in a blog post, Parallel Processing with core.async.
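As a runnable illustration of the backpressure point (sketched in Go, whose buffered channels behave much like core.async's fixed buffers): a small buffer between producer and workers makes the producer block, the analogue of parking on >!, once too much work is in flight:

```go
package main

import (
	"fmt"
	"sync"
)

// startWorkers drains a bounded task channel; the bound is the
// backpressure point. Producers block on send (like a go block
// parking on >!) once cap(tasks) items are waiting.
func startWorkers(n int, tasks <-chan int, results chan<- int) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				results <- t * t // stand-in for real work
			}
		}()
	}
	return &wg
}

func main() {
	tasks := make(chan int, 4) // small buffer = backpressure
	results := make(chan int, 16)
	wg := startWorkers(2, tasks, results)
	for i := 1; i <= 8; i++ {
		tasks <- i // blocks when the buffer is full
	}
	close(tasks)
	wg.Wait()
	close(results)
	sum := 0
	for r := range results {
		sum += r
	}
	fmt.Println("sum of squares:", sum) // 1²+...+8² = 204
}
```

The unbounded-future option from the question list corresponds to replacing the bounded tasks channel with an ever-growing queue, which is exactly how the input rate comes to exceed the output rate without anything pushing back.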

Periodically call a C function without manually creating a thread

I have implemented a WebSocket handler in C++ and I need to send ping messages once in a while. However, I don't want to start one thread per socket/one global poll thread which only calls the ping function but instead use some OS functionality to call my timer function. On Windows, there is SetTimer but that requires a working message loop (which I don't have.) On Linux there is timer_create, which looks better.
Is there some portable, low-overhead method to get a function called periodically, ideally with some custom context? I.e. something like settimer (const int millisecond, const void* context, void (*callback)(const void*))?
[Edit] Just to make this a bit clearer: I don't want to have to manage additional threads. On Windows, I guess using CreateThreadpoolTimer on the system thread pool will do the trick, but I'm curious to hear if there is a simpler solution and how to port this over to Linux.
If you are intending to go cross-platform, I would suggest you use a cross platform event library like libevent.
libev is newer, however currently has weak Win32 support.
If you use sockets, you can use select to wait for socket events with a timeout, and in this loop compute the elapsed time and call the callback at the appropriate time.
If you are looking for a timer that will not require an additional thread, let you do your work transparently and then call the timer function at the appropriate time in the same thread by pre-emptively interrupting your application, then there is no such portable thing.
The first reason is that it's downright dangerous. That's like writing a multi-threaded application with absolutely no synchronization. The second reason is that it is extremely difficult to have good semantics in multi-threaded applications. Which thread should execute the timer callback?
If you're writing a web-socket handler, you are probably already writing a select()-based loop. If so, then you can just use select() with a short timeout and check the different connections for which you need to ping each peer.
Whenever you have asynchronous events, you should have an event loop. This doesn't need to be some system default one, like Windows' message loop. You can create your own. But you should be using it.
The whole point of event-based programming is that you decouple your code into well-defined functional fragments that handle these asynchronous events. Without an event loop, you are condemning yourself to interleaving code that gets input and produces output based on poorly defined "states" that are just fragments of procedural code.
Without a well-defined separation of states using an event-based design, code quickly becomes unmanageable. Because code pauses inside procedures to do input tasks, you have lifetimes of objects that will not span entire procedure scopes, and you will begin to write if (nullptr == xx) in various places that access objects created or destroyed based on events. Dispatch becomes combinatorially complex because you have different events expected at each input point and no abstraction.
However, simply by using an event loop and dispatching to state machines, you decrease the handling complexity to basic management of handlers (O(n) handlers versus O(mn) branch statements, with n types of events and m states). You decouple handling but still allow functionality to vary by state. But now these states are well defined, using state classes, and new states can be added if the requirements of the product change.
I'm just saying, stop trying to avoid an event loop. It's a software pattern for very important reasons, all of which have to do with producing professional, reusable, scalable code. Use Boost.ASIO or some other framework for cross platform capabilities. Don't get in the habit of doing it wrong just because you think it will be less of an effort. In the end, even if it's not a professional project that needs maintenance long term, you want to practice making your code professional so you can do something with your skills down the line.
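To make the dispatch argument concrete, here is a minimal event-loop sketch (in Go, with hypothetical event and handler names): adding an event type means registering one handler, not adding a branch to every state:

```go
package main

import "fmt"

// event is a tagged message delivered to the loop.
type event struct {
	kind    string
	payload int
}

// runEventLoop dispatches each event through a handler map, so
// dispatch cost is one lookup per event: O(n) handlers rather than
// O(mn) branches across m states and n event types.
func runEventLoop(events []event, handlers map[string]func(int) int) int {
	total := 0
	for _, e := range events {
		if h, ok := handlers[e.kind]; ok {
			total += h(e.payload)
		}
	}
	return total
}

func main() {
	handlers := map[string]func(int) int{
		"data": func(p int) int { return p }, // accumulate data
		"ping": func(p int) int { return 0 }, // keepalive, no-op
	}
	events := []event{{"data", 5}, {"ping", 0}, {"data", 7}}
	fmt.Println("total:", runEventLoop(events, handlers))
}
```

In a stateful design each handler would live on a state object and the loop would dispatch to the current state's handler table, but the shape of the loop stays the same.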

How to keep asynchronous parallel program code manageable (for example in C++)

I am currently working on a server application that needs to control a collection of devices over a network. Because of this, we need to do a lot of parallel programming. Over time, I have learned that there are three approaches to communication between processing entities (threads/processes/applications). Regrettably, all three approaches have their disadvantages.
A) You can make a synchronous request (a synchronous function call). In this case, the caller waits until the function is processed and the response has been received. For example:
const bool convertedSuccessfully = Sync_ConvertMovie(params);
The problem is that the caller is idling. Sometimes this is just not an option. For example, if the call was made by the user interface thread, it will seem like the application has blocked until the response arrives, which can take a long time.
B) You can make an asynchronous request and wait for a callback to be made. The client code can continue with whatever needs to be done.
Async_ConvertMovie(params, TheFunctionToCallWhenTheResponseArrives);
This solution has the big disadvantage that the callback function necessarily runs in a separate thread. The problem is now that it is hard to get the response back to the caller. For example, you have clicked a button in a dialog, which called a service asynchronously, but the dialog has long been closed when the callback arrives.
void TheFunctionToCallWhenTheResponseArrives()
{
//Difficulty 1: how to get to the dialog instance?
//Difficulty 2: how to guarantee in a thread-safe manner that
// the dialog instance is still valid?
}
This in itself is not that big a problem. However, when you want to make more than one of such calls, and they all depend on the response of the previous one, this becomes in my experience unmanageably complex.
C) The last option I see is to make an asynchronous request and keep polling until the response has arrived. In between the has-the-response-arrived-yet checks, you can do something useful. This is the best solution I know of to solve the case in which there is a sequence of asynchronous function calls to make. This is because it has the big advantage that you still have the whole caller context around when the response arrives. Also, the logical sequence of the calls remains reasonably clear. For example:
const CallHandle c1 = Async_ConvertMovie(sourceFile, destFile);
while (!c1.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (!c1.IsSuccessful())
    return;
const CallHandle c2 = Async_CopyFile(destFile, otherLocation);
while (!c2.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (c2.IsSuccessful())
    //show a success dialog
The problem with this third solution is that you cannot return from the caller's function. This makes it unsuitable if the work you want to do in the meantime has nothing to do with the work you are getting done asynchronously. For a long time I have wondered whether there is some other way to call functions asynchronously, one that doesn't have the downsides of the options listed above. Does anyone have an idea, some clever trick perhaps?
Note: the example given is C++-like pseudocode. However, I think this question equally applies to C# and Java, and probably a lot of other languages.
You could consider an explicit "event loop" or "message loop", not too different from classic approaches such as a select loop for asynchronous network tasks or a message loop for a windowing system. Events that arrive may be dispatched to a callback when appropriate, as in your example B, but they may also in some cases be tracked differently, for example to cause transitions in a finite state machine. An FSM is a fine way to manage the complexity of an interaction along a protocol that requires many steps, after all!
One approach to systematizing these considerations starts with the Reactor design pattern.
Schmidt's ACE body of work is a good starting point for these issues, if you come from a C++ background; Twisted is also quite worthwhile, from a Python background; and I'm sure that similar frameworks and sets of whitepapers exist for, as you say, "a lot of other languages" (the Wikipedia URL I gave does point at Reactor implementations for other languages, besides ACE and Twisted).
I tend to go with B, but instead of calling forth and back, I'd do the entire processing including follow-ups on a separate thread. The main thread can meanwhile update the GUI and either actively wait for the thread to complete (i.e. show a dialog with a progress bar), or just let it do its thing in the background and pick up the notification when it's done. No complexity problems so far, since the entire processing is actually synchronous from the processing thread's point of view. From the GUI's point of view, it's asynchronous.
Adding to that, in .NET it's no problem to switch to the GUI thread. The BackgroundWorker class and the ThreadPool make this easy as well (I used the ThreadPool, if I remember correctly). In Qt, for example, to stay with C++, it's quite easy as well.
I used this approach on our last major application and am very pleased with it.
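That approach can be sketched in a few lines (shown here in Go for brevity; processAll and the step stand-ins are made up for the example): the whole multi-step job runs sequentially on one background worker, and the caller just waits on a single completion notification:

```go
package main

import "fmt"

// processAll runs the entire multi-step job on one background
// goroutine, so each step is synchronous from the worker's point of
// view; the caller (e.g. a GUI thread) only receives the final
// outcome on a channel instead of wiring one callback per step.
func processAll(done chan<- string) {
	go func() {
		// step 1: convert the movie (stand-in)
		converted := true
		if !converted {
			done <- "convert failed"
			return
		}
		// step 2: copy the file (stand-in); runs only after step 1,
		// with the full context still on this goroutine's stack
		done <- "success"
	}()
}

func main() {
	done := make(chan string, 1)
	processAll(done)
	// the "GUI thread" stays responsive and picks up the result
	fmt.Println(<-done)
}
```

In .NET the channel's role is played by BackgroundWorker's completion event marshalled back to the GUI thread; the structure of the worker is identical.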
Like Alex said, look at Proactor and Reactor as documented by Doug Schmidt in Patterns of Software Architecture.
There are concrete implementations of these for different platforms in ACE.