When to use core.async in Clojure? - clojure

When should I use Clojure's core.async library, what kind of applications need that kinda async thing?
Clojure provides 4 basic mutable models like refs, agents, atoms and thread locals/vars. Can't these mutable references provide in any way what core.async provides with ease?
Could you provide real world use cases for async programming?
How can I gain an understanding of it so that when I see a problem, it clicks and I say "This is the place I should apply core.async"?
Also we can use core.async in ClojureScript which is a single threaded environment, what are the advantages there (besides avoiding callback hell)?

You may wish to read this:
Clojure core.async Channels Introductory blog by Rich Hickey
Mastering Concurrent Processes with core.async Brave Clojure entry
The best use case for core.async is ClojureScript, since it allows you to simulate multi-threaded programming and avoid Callback Hell.
In JVM Clojure, core.async can also by handy where you want a (lightweight) producer-consumer architecure. Of course, you could always use native Java queues for that, as well.

It's important to point out that there are 2 common meanings associated to the word 'async' in programming circles:
asynchronous messaging: Systems in which components send messages without expecting a response from their consumers, and often without even knowing who the consumers are (via queues)
non-blocking (a.k.a event-driven) I/O: Programs structured in such a way that they don't block expensive computational resources (threads, cores, ...) while awaiting a response. There are several approches, with varying levels of abstraction, for dealing with such systems: callback-based APIs (low-level, difficult to manage because based on side-effects), Promise/Futures/Deferreds (representations of values that we don't yet have, more manageable as they are value-based), Green Threads ('logical' threads which emulate ordinary control flow, but are inexpensive)
core.async is very opinionated towards 1 (asynchronous messaging via queues), and provides a macro for implementing Green Threading (the go macro).
From my experience, if non-blocking is all you need, I would personally recommend starting with Manifold, which makes fewer assumptions about your use case, then use core.async for the more advanced use cases where it falls short; note that both libraries interoperate well.

I find it useful for fine-grained, configurable control of side-effect parallelism on the JVM.
e.g. If the following executes a read from Cassandra, returning an async/chan:
(arche.async/execute connection :key {:values {:id "1"}})
Then the following performs a series of executes in parallel, where that parallelism is configurable, and the results are returned in-order.
(async/pipeline-async
n-parallelism
out
#(arche/execute connection :key {:values %1 :channel %2})
(async/to-chan [{:id "1"} {:id "2"} {:id "3"} ... ]))
Probably quite particular to my niche, but you get the idea.
https://github.com/troy-west/arche

Core.async provides building blocks for socket like programming which is useful for coordinating producer/consumer interaction in Clojure. Core.async's lightweight threads (go blocks) let you write imperative style code reading from channels instead of using callbacks in the browser. On the JVM, the lightweight threads let you utilize your full CPU threads.
You can see an example of a full-stack CLJ/CLJS producer/consumer Core.async chat room here: https://github.com/briangorman/hablamos
Another killer feature of core.async is the pipeline feature. Often in data processing you will have the initial processing stage take up most of the CPU time, while later stages e.g. reducing will take up significantly less. With the async pipeline feature, you can split up the processing over channels to add parallelization to your pipeline. Core.async channels work with transducers, so channels play nice with the rest of the language now.

Related

What is the difference between locking and atom/reset!/swap! in Clojure

I was reading some source code and came across locking usage in Clojure. It made me thinking about the atom version. So what are the differences between 2 code snippets, I think that they do the same thing?
(def lock (Object.))
(locking lock
...some operation)
(def state (atom true))
(when #state
(reset! state false)
...some operation
(reset! state true))
Locking (aka synchronization) is only ever needed when multiple threads are changing a piece of mutable state.
The locking macro is a low-level feature that one almost never needs to use in Clojure. It is similar in some ways to a synchronized block in Java.
In clojure, one normally just uses an atom for this purpose. On rare occasions an agent or ref is called for. In even rarer situations, you can use a dynamic Var to get thread-local mutable state.
Internally, a Clojure atom delegates all currency operations to the class java.util.concurrent.atomic.AtomicReference.
Your code snippet shows a misunderstanding of an atom's purpose and operation. Two concurrent threads could process your atom code snipped at the same time, so the attempt does not provide thread safety and would result in bugs and corrupted data.
If you want to explore the really primitive (i.e. Java 1.2) synchronization primitives, see:
The Oracle docs
The book Effective Java (1st edition shows most details, 3rd Ed. defers to higher-level classes)
The book Java Concurrency in Practice This book is so scary it'll turn your hair white. After reading this one, you'll be so afraid of roll-your-own concurrency that you'll never even think about doing it again.
Double-Checked Locking is Broken: A excellent example of how hard it is to get locking right (also here).

Threading vs Task-Based vs Asynchronous Programming

I'm new to this concept. Are these the same or different things? What is the difference? I really like the idea of being able to run two processes at once, for example if I have several large files to load into my program I'd love to load as many of them simultaneously as possible instead of waiting for one at a time. And when working with a large file, such as wav file, it would be great to break it into pieces and do processing on several chunks at once and then put them back together. What do I want to look into to learn how to do this sort of thing?
Edit: Also, I know using more than one core on a multicore processor fits in here somewhere, but apparently asynchronous programming doesn't necessarily mean you are using multiple cores? Why would you do this if you didn't have multiple cores to take advantage of?
They are related but different.
Threading, normally called multi-threading, refers to the use of multiple threads of execution within a single process. This usually refers to the simple case of using a small set of threads each doing different tasks that need to be, or could benefit from, running simultaneously. For example, a GUI application might have one thread draw elements, another thread respond to events like mouse clicks, and another thread do some background processing.
However, when the number of threads, each doing their own thing, is taken to an extreme, we usually start to talk about an Agent-based approach.
The task-based approach refers to a specific strategy in software engineering where, in abstract terms, you dynamically create "tasks" to be accomplished, and these tasks are picked up by a task manager that assigns the tasks to threads that can accomplish them. This is more of a software architectural thing. The advantage here is that the execution of the whole program is a succession of tasks being relayed (task A finished -> trigger task B, when both task B and task C are done -> trigger task D, etc..), instead of having to write a big function or program that executes each task one after the other. This gives flexibility when it is unclear which tasks will take more time than others, and when tasks are only loosely coupled. This is usually implemented with a thread-pool (threads that are waiting to be assigned a task) and some message-passing interface (MPI) to communicate data and task "contracts".
Asynchronous programming does not refer to multi-threaded programming, although the two are very often associated (and work well together). A synchronous program must complete each step before moving on to the next. An asynchronous program starts a step, moves on to other steps that don't require the result of the first step, then checks on the result of the first step when its result is required.
That is, a synchronous program might go a little bit like this: "do this task", "wait until done", "do something with the result", and "move on to something else". By contrast, an asynchronous program might go a little more like this: "I'm gonna start a task, and I'll need the result later, but I don't need it just now", "in the meantime, I'll do something else", "I can't do anything else until I have the result of the first step now, so I'll wait for it, if it isn't ready", and "move on to something else".
Notice that "asynchronous" refers to a very broad concept, that always involves some form of "start some work and tell me when it's done" instead of the traditional "do it now!". This does not require multi-threading, in which case it just becomes a software design choice (which often involves callback functions and things like that to provide "notification" of the asynchronous result). With multiple threads, it becomes more powerful, as you can do various things in parallel while the asynchronous task is working. Taken to the extreme, it can become a more full-blown architecture like a task-based approach (which is one kind of asynchronous programming technique).
I think the thing that you want corresponds more to yet another concept: Parallel Computing (or parallel processing). This approach is more about splitting a large processing task into smaller parts and processing all parts in parallel, and then combining the results. You should look into libraries like OpenMP or OpenCL/CUDA (for GPGPU). That said, you can use multi-threading for parallel processing.
but apparently asynchronous programming doesn't necessarily mean you are using multiple cores?
Asynchronous programming does not necessarily involve anything happening concurrently in multiple threads. It could mean that the OS is doing things on your behalf behind the scenes (and will notify you when that work is finished), like in asynchronous I/O, which happens without you creating any threads. It boils down to being a software design choice.
Why would you do this if you didn't have multiple cores to take advantage of?
If you don't have multiple cores, multi-threading can still improve performance by reusing "waiting time" (e.g., don't "block" the processing waiting on file or network I/O, or waiting on the user to click a mouse button). That means the program can do useful work while waiting on those things. Beyond that, it can provide flexibility in the design and make things seem to run simultaneously, which often makes users happier. Still, you are correct that before multi-core CPUs, there wasn't as much of an incentive to do multi-threading, as the gains often do not justify the overhead.
I think in general, all these are design related rather than language related. Same apply to multicore programming.
To reflect Jim, it's not only the file load scenario. Generally, you need to design the whole software to run concurrently in order to feel the real benefit of multi-threading, task based or asynchronous programming.
Try see things from a grand picture point of view. Understand the over all modelling of a specific example and see how these methodologies are implemented. It'll easy to see the difference and help understand when and where to use which.

What libraries should I use for better OCaml Threading?

I have asked a related question before Why OCaml's threading is considered as `not enough`?
No matter how "bad" ocaml's threading is, I notice some libraries say they can do real threading.
For example, Lwt
Lwt offers a new alternative. It provides very light-weight
cooperative threads; ``launching'' a thread is a very fast operation,
it does not require a new stack, a new process, or anything else.
Moreover context switches are very fast. In fact, it is so easy that
we will launch a thread for every system call. And composing
cooperative threads will allow us to write highly asynchronous
programs.
Also Jane Street's aync_core also provides similar things, if I am right.
But I am quite confused. Do Lwt or aync_core provide threading like Java threading?
If I use them, can I utilise multiple cpu?
In what way, can I get a "real threading" (just like in Java) in OCaml?
Edit
I am still confused.
Let me add a scenario:
I have a server (16 cpu cores) and a server application.
What the server application does are:
It listens to requests
For each request, it starts a computational task (let's say costs 2 minutes to finish)
When each task finishes, the task will either return the result back to the main or just send the result back to client directly
In Java, it is very easy. I create a thread pool, then for each request, I create a thread in that pool. that thread will run the computational task. This is mature in Java and it can utilize the 16 cpu cores. Am I right?
So my question is: can I do the same thing in OCaml?
The example of parallelized server that you cite is one of those embarassingly parallel problem that are well solved with a simple multiprocessing model, using fork. This has been doable in OCaml for decades, and yes, you will an almost linear speedup using all the cores of your machine if you need.
To do that using the simple primitives of the standard library, see this Chapter of the online book "Unix system programming in OCaml" (first released in 2003), and/or this chapter of the online book "Developing Applications with OCaml" (first released in 2000).
You may also want to use higher-level libraries such as Gerd Stolpmann's OCamlnet library mentioned by rafix, which provides a lot of stuff from direct helper for the usual client/server design, to lower-level multiprocess communication libraries; see the documentation.
The library Parmap is also interesting, but maybe for slightly different use case (it's more that you have a large array of data available all at the same time, that you want to process with the same function in parallel): a drop-in remplacement of Array.map or List.map (or fold) that parallelizes computations.
The closest thing you will find to real (preemptive) threading is the built in threading library. By that mean I mean that your programming model will be the same but with 2 important differences:
OCaml's native threads are not lightweight like Java's.
Only a single thread executes at a time, so you cannot take advantage of multiple processes.
This makes OCaml's threads a pretty bad solution to either concurrency or parallelism so in general people avoid using them. But they still do have their uses.
Lwt and Async are very similar and provide you with a different flavour of threading - a cooperative style. Cooperative threads differ from preemptive ones in the fact context switching between threads is explicit in the code and blocking calls are always apparent from the type signature. The cooperative threads provided are very cheap so very well suited for concurrency but again will not help you with parallelilsm (due to the limitations of OCaml's runtime).
See this for a good introduction to cooperative threading: http://janestreet.github.io/guide-async.html
EDIT: for your particular scenario I would use Parmap, if the tasks are so computationally intensive as in your example then the overhead of starting the processes from parmap should be negligible.

Is there a distributed data processing pipeline framework, or a good way to organize one?

I am designing an application that requires of a distributed set of processing workers that need to asynchronously consume and produce data in a specific flow. For example:
Component A fetches pages.
Component B analyzes pages from A.
Component C stores analyzed bits and pieces from B.
There are obviously more than just three components involved.
Further requirements:
Each component needs to be a separate process (or set of processes).
Producers don't know anything about their consumers. In other words, component A just produces data, not knowing which components consume that data.
This is a kind of data flow solved by topology-oriented systems like Storm. While Storm looks good, I'm skeptical; it's a Java system and it's based on Thrift, neither of which I am a fan of.
I am currently leaning towards a pub/sub-style approach which uses AMQP as the data transport, with HTTP as the protocol for data sharing/storage. This means the AMQP queue model becomes a public API — in other words, a consumer needs to know which AMQP host and queue that the producer uses — which I'm not particularly happy about, but it might be worth the compromise.
Another issue with the AMQP approach is that each component will have to have very similar logic for:
Connecting to the queue
Handling connection errors
Serializing/deserializing data into a common format
Running the actual workers (goroutines or forking subprocesses)
Dynamic scaling of workers
Fault tolerance
Node registration
Processing metrics
Queue throttling
Queue prioritization (some workers are less important than others)
…and many other little details that each component will need.
Even if a consumer is logically very simple (think MapReduce jobs, something like splitting text into tokens), there is a lot of boilerplate. Certainly I can do all this myself — I am very familiar with AMQP and queues and everything else — and wrap all this up in a common package shared by all the components, but then I am already on my way to inventing a framework.
Does a good framework exist for this kind of stuff?
Note that I am asking specifically about Go. I want to avoid Hadoop and the whole Java stack.
Edit: Added some points for clarity.
Because Go has CSP channels, I suggest that Go provides a special opportunity to implement a framework for parallelism that is simple, concise, and yet completely general. It should be possible to do rather better than most existing frameworks with rather less code. Java and the JVM can have nothing like this.
It requires just the implementation of channels using configurable TCP transports. This would consist of
a writing channel-end API, including some general specification of the intended server for the reading end
a reading channel-end API, including listening port configuration and support for select
marshalling/unmarshalling glue to transfer data - probably encoding/gob
A success acceptance test of such a framework should be that a program using channels should be divisible across multiple processors and yet retain the same functional behaviour (even if the performance is different).
There are quite a few existing transport-layer networking projects in Go. Notable is ZeroMQ (0MQ) (gozmq, zmq2, zmq3).
I guess you are looking for a message queue, like beanstalkd, RabbitMQ, or ØMQ (pronounced zero-MQ). The essence of all of these tools is that they provide push/receive methods for FIFO (or non-FIFO) queues and some even have pub/sub.
So, one component puts data in a queue and another one reads. This approach is very flexible in adding or removing components and in scaling each of them up or down.
Most of these tools already have libraries for Go (ØMQ is very popular among Gophers) and other languages, so your overhead code is very little. Just import a library and start receiving and pushing messages.
And to decrease this overhead and avoid dependency on a particular API, you can write a thin package of yours which uses one of these message queue systems to provide very simple push/receive calls and use this package in all of your tools.
I understand that you want to avoid Hadoop+Java, but instead of spending time developing your own framework, you may want to have a look at Cascading. It provides a layer of abstraction over underlying MapReduce jobs.
Best summarized on Wikipedia, It [Cascading] follows a ‘source-pipe-sink’ paradigm, where data is captured from sources, follows reusable ‘pipes’ that perform data analysis processes, where the results are stored in output files or ‘sinks’. Pipes are created independent from the data they will process. Once tied to data sources and sinks, it is called a ‘flow’. These flows can be grouped into a ‘cascade’, and the process scheduler will ensure a given flow does not execute until all its dependencies are satisfied. Pipes and flows can be reused and reordered to support different business needs.
You may also want to have a look at some of their examples, Log Parser, Log Analysis, TF-IDF (especially this flow diagram).

Thread per connection vs Reactor pattern (with a thread pool)?

I want to write a simple multiplayer game as part of my C++ learning project.
So I thought, since I am at it, I would like to do it properly, as opposed to just getting-it-done.
If I understood correctly: Apache uses a Thread-per-connection architecture, while nginx uses an event-loop and then dedicates a worker [x] for the incoming connection. I guess nginx is wiser, since it supports a higher concurrency level. Right?
I have also come across this clever analogy, but I am not sure if it could be applied to my situation. The analogy also seems to be very idealist. I have rarely seen my computer run at 100% CPU (even with a umptillion Chrome tabs open, Photoshop and what-not running simultaneously)
Also, I have come across a SO post (somehow it vanished from my history) where a user asked how many threads they should use, and one of the answers was that it's perfectly acceptable to have around 700, even up to 10,000 threads. This question was related to JVM, though.
So, let's estimate a fictional user-base of around 5,000 users. Which approach should would be the "most concurrent" one?
A reactor pattern running everything in a single thread.
A reactor pattern with a thread-pool (approximately, how big do you suggest the thread pool should be?
Creating a thread per connection and then destroying the thread the connection closes.
I admit option 2 sounds like the best solution to me, but I am very green in all of this, so I might be a bit naive and missing some obvious flaw. Also, it sounds like it could be fairly difficult to implement.
PS: I am considering using POCO C++ Libraries. Suggesting any alternative libraries (like boost) is fine with me. However, many say POCO's library is very clean and easy to understand. So, I would preferably use that one, so I can learn about the hows of what I'm using.
Reactive Applications certainly scale better, when they are written correctly. This means
Never blocking in a reactive thread:
Any blocking will seriously degrade the performance of you server, you typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
No mutexs since these can block, so no shared mutable state. If you require shared state you will have to wrap it with an actor or similar so only one thread has access to the state.
All work in the reactive threads should be cpu bound
All IO has to be asynchronous or be performed in a different thread pool and the results feed back into the reactor.
This means using either futures or callbacks to process replies, this style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small
To maintain responsiveness of the server all tasks in the reactor must be small (bounded by time)
On an 8 core machine you cannot cannot allow 8 long tasks arrive at the same time because no other work will start until they are complete
If a tasks could take a long time it must be broken up (cooperative multitasking)
Tasks in reactive applications are scheduled by the application not the operating system, that is why they can be faster and use less memory. When you write a Reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.
I am a big fan of reactive architectures but they come with costs. I am not sure I would write my first c++ application as reactive, I normally try to learn one thing at a time.
If you decide to use a reactive architecture use a good framework that will help you design and structure your code or you will end up with spaghetti. Things to look for are:
What is the unit of work?
How easy is it to add new work? can it only come in from an external event (eg network request)
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?
I cannot recommend a C++ library for this, I now do my server development in Scala and Akka which provide all of this with an excellent composable futures library to keep the code clean.
Best of luck learning C++ and with which ever choice you make.
Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.
http://www.kegel.com/c10k.html
The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost Thread plus Boost ASIO. You could also try a C++11 std thread library, and std mutex (but Boost ASIO is better than mutexes in a lot of cases, just always callback to the same thread and you don't need protected regions). Stay away from std future, cause it's broken:
http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/
The optimal number of threads in the thread pool is one thread per CPU core. 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your threadpool threads might call blocking operations sometimes.
FWIW, Poco supports option 2 (ParallelReactor) since version 1.5.1
I think that option 2 is the best one. As for tuning of the pool size, I think the pool should be adaptive. It should be able to spawn more threads (with some high hard limit) and remove excessive threads in times of low activity.
as the analogy you linked to (and it's comments) suggest. this is somewhat application dependent. now what you are building here is a game server. let's analyze that.
game servers (generally) do a lot of I/O and relatively few calculations, so they are far from 100% CPU applications.
on the other hand they also usually change values in some database (a "game world" model). all players create reads and writes to this database. which is exactly the intersection problem in the analogy.
so while you may gain some from handling the I/O in separate threads, you will also lose from having separate threads accessing the same database and waiting for its locks.
so either option 1 or 2 are acceptable in your situation. for scalability reasons I would not recommend option 3.