Coroutines and handling async in Clojure(Script): best practices?

There seems to be a myriad of implementations of 'coroutines' or asynchronous logic in Clojure. Many of the talks by Rich Hickey and other authorities on the matter are from almost a decade ago, and I'm trying to find out what the latest and greatest, best-practice way to handle this problem is.
My favorite abstraction for this type of thing is Lua coroutines, but I think those may be a strictly imperative style of doing things, and I'm a little confused as to what the functional way is instead.
In Lua, though, it's really simple and easy with coroutines to:
A) Non-busy wait for X seconds.
B) Non-busy wait for a variable or function to be a specific value, such as true
A can probably be achieved using setTimeout, but B can't really, or at least I don't know how. I'm also not sure setTimeout is best practice for these kinds of problems.

In a 2013 blog post, Rich Hickey describes the motivations for clojure.core.async. While it has some uses on the JVM, the primary motivation was to give the illusion of threads in the single-threaded JavaScript environment.
The "simulated multithreading" provided by clojure.core.async is not as robust as using actual JVM threads (especially when Exceptions/Errors occur), so it is of limited use in JVM Clojure. This will be even more true when Java virtual threads become a reality.
So if you are in ClojureScript, clojure.core.async is much better than nothing (i.e., callback hell). However, even JS is contemplating a multithreading model via WebAssembly, so an alternative to clojure.core.async could exist for ClojureScript in the future.
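For what it's worth, both A and B from the question can be expressed with core.async without busy-waiting. Here is a minimal sketch (the function names, the ready? atom and the 50 ms polling interval are just illustrative; in ClojureScript the require details differ slightly, but the go/timeout pattern is the same):

(require '[clojure.core.async :refer [go <! timeout]])

;; A) Non-busy wait for x seconds: park the go block on a timeout channel.
(defn wait-seconds [x]
  (go
    (<! (timeout (* 1000 x)))
    (println "waited" x "seconds")))

;; B) Non-busy wait until a predicate returns true, e.g. an atom flipping to true.
;; The go block parks between checks instead of tying up a thread.
(def ready? (atom false))

(defn wait-until [pred]
  (go
    (loop []
      (when-not (pred)
        (<! (timeout 50))
        (recur)))
    (println "condition satisfied")))

;; usage: (wait-until #(true? @ready?)) and, later, (reset! ready? true)

For B, an alternative to polling is to add-watch the atom and put a value on (or close) a channel when the condition becomes true, so the waiting go block is woken exactly once.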

Related

What is the relationship among the different approaches to F# concurrency?

I've recently been learning F# asynchronous workflows, which are an important feature of F# concurrency. What confuses me is how many approaches there are to writing concurrent code in F#. I've read Expert F# and some blog posts about F# concurrency, so I know about things like background workers and IAsyncResult; when programming on a local machine there is shared-memory concurrency in F#, and when programming on a distributed system there is message-passing concurrency. But I'm really not sure what the relationship between these techniques is, or how to classify them. I understand it is quite a "big" question that cannot be answered in one or two sentences, so I would definitely appreciate it if anyone could give me a specific answer or recommend some useful references.
I'm also rather new to F#, so I hope more answers come to complement this one :)
The first thing is that you need to distinguish between .NET classes (which can be used from any .NET language) and F#-specific ways to deal with asynchronous operations. In the first case, as you mention, among others you have:
System.ComponentModel.BackgroundWorker: This was used mainly in the first .NET versions with Windows Forms and it's not recommended anymore.
System.IAsyncResult: This is also an old .NET interface implemented by several classes (including Task), but I don't usually use it directly.
Windows.Foundation.IAsyncOperation: Another interface, but used only in Windows Store apps. Most of the time you translate it directly to Task, so you don't have to worry too much about it.
System.Threading.Tasks.Task: This is now the recommended way to handle .NET asynchronous and parallel (with the Task Parallel Library) operations. It's the hidden force behind the C# async/await keywords, which are just syntactic sugar for passing continuations to Tasks.
So now for the F#-specific ways: Asynchronous Workflows and MailboxProcessor. It can roughly be said that the former corresponds to parallelism while the latter deals with concurrency.
Asynchronous Workflows: This is just a computation expression (aka monad) which happens to deal with asynchrony: operations that run in the background to prevent blocking the UI thread or parallel operations to get the maximum performance in multi-core systems.
It's more or less the equivalent of C# async/await, but we F# fans like to think it's a more elegant solution because it uses a more generic and flexible mechanism (computation expressions) which can be adapted, for example, to asynchronous sequences, events or even JavaScript callbacks. It also has other advantages, as Tomas Petricek explains here.
Within an asynchronous workflow most of the time you'll be using the methods in Control.Async or the extensions to .NET classes (like WebRequest.AsyncGetResponse) from the F# Core Library. If necessary, you can also interact directly with .NET Tasks (Async.AwaitTask and Async.StartAsTask) or even easily create your own async operations with Async.StartWithContinuations.
To learn more about asynchronous workflows you can consult the MSDN documentation, Scott Wlaschin's magnificent site, Tomas Petricek's blog or the F# Wikibook.
Control.MailboxProcessor: Designed to deal with concurrency, that is, several processes running at the same time which usually need to share some information. The traditional .NET way to prevent memory corruption when several threads try to write a variable at the same time was the lock statement. Besides the fact that functional style prefers to use immutable values, memory locks are complicated to use properly and can also have a high performance penalty. So instead of this, MailboxProcessor uses an Erlang-like message-based (or actor-based) approach to concurrency.
I have not used MailboxProcessor myself that much, but for more info you can check Scott Wlaschin's site or the F# Wikibook.
I hope this helps! If someone sees something not completely correct in this answer, please feel free to edit it.
Cheers!

Common Lisp Parallel Programming

I want to implement my particle filtering algorithm in parallel in Common Lisp. Particle filtering and sampling can be parallelized, and I want to do this for my 4-core machine. My question is whether parallel programming is feasible in CL or not, and if it is, whether there are any good readings or tutorials about getting started with parallel computing in CL.
Definitely feasible!
The Bordeaux Threads project provides thread primitives for a number of implementations; I would suggest using it instead of SBCL's implementation-specific primitives (especially if you aren't on SBCL!).
The thread primitives provided by bt are, however, quite primitive. I've used and enjoyed Eager Future2, which builds on bt to provide concurrency features using futures. You can create futures that are computed lazily, eagerly (immediately), or speculatively. The speculative futures are computed by a thread pool whose size can be customized.
I started a little project to provide parallel versions of CL functions using EF2, but it's only about three functions so far, so it won't be of much use to anyone. I do of course welcome other coders to hack on it and submit pull requests, and I hope to do more work on it in the future.
There are many other libraries listed on Cliki that I haven't tried myself.
As far as tutorials go, I don't know of any, but the concurrency features provided are found in other languages as well, and good algorithms and practices are generally not language-specific.
If you're interested in reading a book, I recommend The Concurrent C Programming Language. The authors describe a new programming language, based on C, with concurrency as a language feature. Of course, due to the nature of CL, it would likely be possible to implement these features without resorting to creating a new compiler. In my opinion the book presents excellent concurrency concepts, and addresses many of the problems you may encounter or fail to consider in writing concurrent programs.
SBCL has some multithreading support. It is too low level and, to my knowledge, does not include any parallel algorithms. It offers just the possibility of creating threads that execute some lambda function and testing afterwards whether a thread has finished (joining it). I used that support to generate my blog pages with great speedup (each page or set of pages in a different thread). You can see the code here:
https://github.com/dsevilla/functional-mind-blog/blob/master/blog/process.lisp
For example, generating a thread for each page was something like:
#+sbcl
(defun generate-post-pages ()
  ;; Spawn one thread per post; each thread renders a single page.
  ;; make-thread is SBCL's sb-thread:make-thread; *posts* and
  ;; page-generation-function come from the surrounding blog code.
  (map nil
       #'(lambda (post)
           (make-thread (lambda () (page-generation-function post))))
       *posts*))
You can also join-thread, and have mutexes, etc. You can read the documentation here: SBCL Threading. It is too low-level, though. You'll end up missing Clojure's fantastic concurrency features...
Check out bordeaux threads if you're looking for a single POSIX-threads-style interface to multi-threading primitives for different Lisps.
If I were looking for a reliable free Lisp implementation, I'd start with CCL and then try SBCL. I use CCL for almost all of my testing and SBCL and LispWorks for the remainder.
Sedach's futures library should provide a higher level interface. There are also some other contributions from various users in SBCL's contrib directory.
This coming from someone who has used neither bordeaux-threads nor Sedach's futures library and has written his own version of both of those. I could send you my implementation, but these two packages are also supposed to be good, and they're probably a better starting point.
LispWorks 6 comes with a nice set of primitives for concurrent programming.
Note though that to my knowledge none of the usual Common Lisp implementations has a concurrent Garbage Collector.
Documentation for LispWorks 6 and Multiprocessing
The MP Package
Multiprocessing

Concurrency model: Erlang vs Clojure

We are going to write a concurrent program using Clojure that will extract keywords from a huge amount of incoming mail, which will be cross-checked against a database.
One of my teammates has suggested to use Erlang to write this program.
I should note that I am new to functional programming, so I am in a little doubt as to whether Clojure is a good choice for writing this program, or whether Erlang is more suitable.
Do you really mean concurrent or distributed?
If you mean concurrent (multi-threaded, multi-core etc.), then I'd say Clojure is the natural solution.
Clojure's STM model is perfectly designed for multi-core concurrency, since it is very efficient at storing and managing shared state between threads. If you want to understand more, it's well worth looking at this excellent video.
Clojure's STM allows safe mutation of data by concurrent threads. Erlang sidesteps this problem by making everything immutable, which is fine in itself but doesn't help when you genuinely need shared mutable state. If you want shared mutable state in Erlang, you have to implement it with a set of message interactions, which is neither efficient nor convenient (that's the price of a shared-nothing model).
You will get inherently better performance with Clojure if you are in a concurrent setting on a large machine, since Clojure doesn't rely on message passing and hence communication between threads can be much more efficient.
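As a concrete illustration of that shared-state coordination, here is a tiny sketch using refs and the STM (the account refs and amounts are invented for the example):

(def account-a (ref 100))
(def account-b (ref 0))

(defn transfer! [from to amount]
  ;; Both alterations commit atomically; conflicting transactions are retried.
  (dosync
    (alter from - amount)
    (alter to + amount)))

;; Fifty concurrent 1-unit transfers; the combined balance is always preserved.
(dorun (pmap (fn [_] (transfer! account-a account-b 1)) (range 50)))
;; @account-a => 50, @account-b => 50

The Erlang equivalent would typically be a single process owning both balances and serialising transfers through its mailbox.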
If you mean distributed (i.e. many different machines sharing work over a network which are effectively running as isolated processes) then I'd say Erlang is the more natural solution:
Erlang's immutable, shared-nothing, message-passing style forces you to write code in a way that can be distributed. So idiomatic Erlang can automatically be distributed across multiple machines and run in a distributed, fault-tolerant setting.
Erlang is therefore very well optimised for this use case, so would be the natural choice and would certainly be the quickest to get working.
Clojure could do it as well, but you will need to do much more work yourself (i.e. you'd either need to implement or choose some form of distributed computing framework) - Clojure does not currently come with such a framework by default.
In the long term, I hope that Clojure develops a distributed computing framework that matches Erlang - then you can have the best of both worlds!
The two languages and runtimes take different approaches to concurrency:
Erlang structures programs as many lightweight processes communicating between one another. In this case, you will probably have a master process sending jobs and data to many workers and more processes to handle the resulting data.
Clojure favors a design where several threads share data and state using common data structures. It sounds particularly suitable for cases where many threads access the same data (read-only) and share little mutable state.
You need to analyze your application to determine which model suits you best. This may also depend on the external tools you use -- for example, the ability of the database to handle concurrent requests.
Another practical consideration is that clojure runs on the JVM where many open source libraries are available.
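As a rough sketch of that shared-data style applied to the mail/keyword problem in the question (extract-keywords and the tiny keyword set are hypothetical placeholders, not a real design):

(require '[clojure.string :as str])

(def keyword-db #{"invoice" "urgent" "meeting"})  ; shared, read-only data

(defn extract-keywords [mail-body]
  ;; Immutable data can be shared freely between threads.
  (->> (str/split mail-body #"\W+")
       (map str/lower-case)
       (filter keyword-db)
       set))

;; pmap spreads the per-mail work across cores; the database cross-check would
;; go wherever your I/O strategy allows.
(defn process-mails [mail-bodies]
  (pmap extract-keywords mail-bodies))

;; (process-mails ["URGENT: meeting at 10" "lunch?"]) => (#{"urgent" "meeting"} #{})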
Clojure is Lisp running on the Java JVM. Erlang is designed from the ground up to be highly fault tolerant and concurrent.
I believe the task is doable with either of these languages and many others as well. Your experience will depend on how well you understand the problem and how well you know the language. If you are new to both, I'd say the problem will be challenging no matter which one you choose.
Have you thought about something like Lucene/Solr? It's great software for indexing and searching documents. I don't know what "cross checking" means for your context, but this might be a good solution to consider.
My approach would be to write a simple test in each language and test the performance of each one. Both languages are somewhat different from C-style languages, and if you aren't used to them (and you don't have a team that is used to them) you may end up with a maintenance nightmare.
I'd also look at using something like Groovy 1.8. Groovy now includes GPars to enable parallel computing. String and file manipulation in Groovy is very easy indeed.
It depends what you mean by huge.
Strings in Erlang are painful...
but:
If huge means tens of distributed machines, then go with Erlang and write the workers in text-friendly languages (Python? Perl?). You will have a distributed layer on top with highly concurrent local workers. Each worker would be represented by an Erlang process. If you need more performance, rewrite your worker in C. In Erlang it is super easy to talk to other languages.
If huge still means one strong machine, go with the JVM. That is not really huge.
If huge is hundreds of machines, I think you will need something stronger and Google-like (Bigtable, map/reduce), probably on a C++ stack. Erlang is still OK, but you will need good devs to code it.

Erlang style concurrency in the D programming language

I think Erlang-style concurrency is the answer to the exponential growth of core counts. You can kind of fake it with other mainstream languages. But the solutions always leave me wanting. I am not willing to give up multi-paradigm programming (C++/D) to switch to Erlang's draconian syntax.
What is Erlang-style concurrency:
From one of the language authors (What is Erlang's concurrency model actually?):
Lightweight concurrency.
Cheap to create threads and cheap to maintain insane numbers.
Asynchronous communication.
Threads only communicate via messages.
Error handling.
Process isolation.
Or from an informed blogger (What is Erlang-Style Concurrency?):
Fast process creation/destruction
Ability to support >> 10 000 concurrent processes with largely unchanged characteristics.
Fast asynchronous message passing.
Copying message-passing semantics (share-nothing concurrency).
Process monitoring.
Selective message reception.
I think D's message passing can accomplish most of these features. The ones I wonder about are ">> 10,000 concurrent processes (threads)" and "fast process creation/destruction".
How well does D handle these requirements?
I think that to support them correctly you'd have to use green threads. Can D's message-passing features be used with a green threads library?
Storage is thread-local by default in D, so nothing is shared between threads unless it is specifically marked as shared. If you mark a variable as shared, you can then use the traditional mutexes and conditions as well as synchronized objects and the like to deal with concurrency. However, the preferred means of communicating between threads is to use the message passing facilities in std.concurrency and let all data stay thread-local, only using shared when you must. All objects passed between threads using std.concurrency must either be passed by value or be immutable, so no sharing occurs there and it is completely thread-safe. However, it can currently be a bit of a pain to get an immutable reference type which isn't an array (idup generally makes it easy for arrays), so it can be a bit annoying to pass anything other than value types or arrays (though hopefully that situation improves soon as compiler and standard library bugs relating to const and immutable get fixed and more code is made const-correct).
Now, while message passing in D will definitely result in cleaner, safer code than what you'd get in languages like C++ or Java, it is built on top of normal, C threads (e.g. Linux uses pthreads), so it does not have the kind of light-weight threads that Erlang does, and so dealing with multiple threads is not going to be as efficient as Erlang.
Of course, I don't see any reason why a more efficient thread system could not be written in D, at which point you might be able to get thread efficiency similar to that of Erlang, and it could presumably use an API similar to that of std.concurrency. However, all of D's standard threading facilities are built on top of normal C threads, so you'd have to do all of that yourself, and depending on how you implemented it, and on how exactly the thread-local/shared stuff is handled by the compiler and druntime, it could be difficult to get the type system to enforce that everything be thread-local with your "green" threads. I'm afraid that I don't know enough about exactly how shared is implemented or how "green" threads work to know for sure.
Regardless, D's message passing system will certainly result in dealing with threads being more pleasant than C++ or even Java, but it's not designed to be streamlined in the same way that Erlang is. D is a general purpose systems language, not a language specifically designed to use threads for everything and thus to use them absolutely as efficiently as possible. A large portion of D's standard facilities are built on top of C, so a lot of its efficiency characteristics will be similar to those of C.
This functionality is frequently used in combination with async I/O to efficiently communicate with external sources of data as well. The vibe.d framework seems to offer both the many-fibers-on-a-few-OS-threads threading model and async I/O libraries (in addition to a whole bunch of web application libraries and project management tools).
As an unrelated side note, it's pretty freaking cool that D is both low-level enough that you could write this framework in it and high-level enough to be a compelling language to write your web applications in on top of the framework. Other popular languages with similar frameworks (node.js, Ruby's EventMachine, coroutines in Python and Go) are unable to compete with D on low-level systems coding. Other popular languages with similar systems programming facilities (C, C++) can't compete on high-level application coding.
I'm new to D, but I gotta say, I like what I see.
From whatever little I know about D: its message-passing infrastructure is built on top of its threading facilities. If the core threading library is a wrapper over OS threads, there is little chance that concurrency in D will reach the magnitude (>> 10,000 processes) of Erlang. Moreover, D does not enforce immutability on objects, so it is easy to mess things up. So Erlang is the best choice for heavy concurrency. You could probably write the concurrency stuff in Erlang and the rest of the project in D. Still, it is possible to have efficient green threads in C-like languages (C++, D, etc.) - have a look at Protothreads and ZeroMQ. You can implement very efficient messaging frameworks using these, and call them via a C shim or directly from D.

Can one make concurrent, scalable, reliable programs in C as in Erlang?

A theoretical question. After reading Armstrong's 'Programming Erlang' book, I was wondering the following:
It will take some time to learn Erlang. Let alone master it. It really is fundamentally different in a lot of respects.
So my question: is it possible to write 'like Erlang', or with some 'Erlang-like framework', such that, given you take care not to create functions with side effects, you can create scalable, reliable apps as well as you can in Erlang? Maybe with the same message-sending, loads-of-'mini-processes' paradigm.
The advantage would be to not throw all your accumulated C/C++ knowledge over the fence.
Any thoughts about this would be welcome.
Yes, it is possible, but...
Probably the best answer for this question is given by Robert Virding's First Rule:
"Any sufficiently complicated concurrent program in another language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang."
A very good rule is to use the right tool for the task. Erlang excels at concurrency and reliability; C/C++ were not designed with these properties in mind.
If you don't want to throw away your C/C++ knowledge and experience, and your project allows this kind of division, a good approach is to create a mixed solution: write the concurrency, communication and error-handling code in Erlang, then add the C/C++ parts that do the CPU- and IO-bound work.
You clearly can - the Erlang/OTP system is largely written in C (and Erlang). The question is 'why would you want to?'
In 'ye olde days' people used to write their own operating system - but why would you want to?
If you elect to use an operating system your unwritten software has certain properties - it can persist to hard disk, it can speak to a network, it can draw on screens, it can run from the command line, it can be invoked in batch mode, etc, etc...
The Erlang/OTP system is 1.5M lines of code which has been demonstrated to give 99.9999999% uptime in large systems (the UK phone system) - that's 31ms downtime a year.
With Erlang/OTP your unwritten software has high reliability, it can hot-swap itself, your unwritten application can failover when a physical computer dies.
Why would you want to rewrite that functionality?
I would break this into 2 questions
Can you write concurrent, scalable C++ applications
Yes. It's certainly possible to create the low level constructs needed in order to achieve this.
Would you want to write concurrent, scalable, C++ applications
Perhaps. But if I was going for a highly concurrent application, I would choose a language that was either designed to fill that void or that easily lends itself to doing so (Erlang, F# and possibly C#).
C++ was not designed to build highly concurrent applications. But it can certainly be tweaked into doing so. The cost might be higher than you expect though once you factor in memory management.
Yes, but you will be doing some extra work.
Regarding side effects, consider how the .NET/PLINQ team is approaching it. PLINQ can't enforce that you hand it code with no side effects, but it assumes you do and plays by those rules, so we get to use a simpler API. Even if the language doesn't have built-in support for it, it will still simplify things, as you can break up the operations more easily.
What I can do in one Turing complete language I can do in any other Turing complete language.
So I interpret your question to read, is it as easy to write a reliable and scalable application in C++ as it is in Erlang?
The answer to that is highly subjective. For me it is easier to write it in C++ for the following reasons:
I have already done it in C++ (at least three times).
I don't know Erlang.
I have read a great deal about Stackless Python, which feels to me like a highly concurrent, message-based, cooperative multitasking system in Python, but of course Python is written on top of C.
Having said that. If you already know both languages, and you have the problem well defined, you can then make the best choice based on all the information you have at hand.
The main 'problem' with C (or C++) for writing reliable and easy-to-extend programs is that in C you can do anything. So the first step would be to write a simple framework that restricts things just a bit. Most good programmers do that anyway.
In this case, the restrictions would mostly be to make it easy to define a 'process' within whatever level of isolation you want. fork() has a reputation for being slow, and threads also need significant time to spawn, so you might want to use cooperative multitasking, which can be far more efficient, and you could even make it preemptive (I think that's what Erlang does). To get multi-core efficiency, set up a pool of threads and make all of them compete to run the tasks.
Another important part would be to create an appropriate library of immutable data structures, so that using them (instead of the standard library) your functions would be (mostly) side-effect-free.
Then it's just a matter of designing a good API for message passing and futures... not easy, but at least it doesn't seem to require changing the language itself.