How to convey current thread's bindings to another thread? - concurrency

How to convey all of the current thread's bindings to another thread? To be specific, I need the following snippet to print 2 (not 1) to stdout:
(defvar *foo* 1)
(let ((*foo* 2))
(bordeaux-threads:make-thread (lambda () (print *foo*)))) ;; prints 1
Of course I could copy *foo*'s value by hand, like this:
(let ((*foo* 2))
(bordeaux-threads:make-thread
(let ((foo-binding *foo*))
(lambda ()
(let ((*foo* foo-binding))
(print *foo*)))))) ;; prints 2
but is there anything that will allow to copy all of them at once?

The API is explicit regarding variable sharing:
The interaction between threads and dynamic variables is in some
cases complex, and depends on whether the variable has only a global
binding (as established by e.g. DEFVAR/DEFPARAMETER/top-level SETQ) or
has been bound locally (e.g. with LET or LET*) in the calling thread.
1.
Global bindings are shared between threads: the initial value of a global variable in the new thread will be the same as in the parent,
and an assignment to such a variable in any thread will be visible to
all threads in which the global binding is visible.
2.
Local bindings are local to the thread they are introduced in,
except that
3.
Local bindings in the the caller of MAKE-THREAD may or may not be shared with the new thread that it creates: this is
implementation-defined. Portable code should not depend on particular
behaviour in this case, nor should it assign to such variables without
first rebinding them in the new thread.
So make the the binding global and not local seems to be the easiest (not implementation dependent) route.
#coredump also suggests to checkout the *default-special-bindings* list for a possible sharing methodology:
This variable holds an alist associating special variable symbols with
forms to evaluate for binding values. Special variables named in this
list will be locally bound in the new thread before it begins
executing user code.
This variable may be rebound around calls to MAKE-THREAD to add/alter
default bindings. The effect of mutating this list is undefined, but
earlier forms take precedence over later forms for the same symbol, so
defaults may be overridden by consing to the head of the list.
Forms are evaluated in the new thread or in the calling thread?
Standard contents of this list: print/reader control, etc. Can borrow
the Franz equivalent?

Related

will an atom defined locally visible to other threads?

I have to following code for automation, the function accepts a unique number, and kick off a firefox. I could kick off multiple threads, each thread with a unique x passing to the function, so the function will be executed concurrently. Then will the local atom current-page be visible to other threads? if visible, then the reset! could set the atom an expected value from another thread
(defn consumer-scanning-pages [x]
(while true
(let [driver (get-firefox x)
current-page (atom 0)]
....
(reset! current-page ..)
)))
The atom will be visible to those threads you explicitly pass it to, to any further threads that those threads pass it to etc. It is no different in this respect to any other value that you may or may not pass around.
"Passing the atom to a thread" can be as simple as referring to an in-scope local it is stored in within the body of a Clojure thread-launching form:
(let [a (atom :foo)]
;; dereferencing the future object representing an off-thread computation
#(future
;; dereferencing the atom on another thread
#a))
;;= :foo
Merely creating an atom doesn't make it available to code that it is not explicitly made available to, and this is also true of code that happens to run on the thread that originally created the atom. (Consider a function that creates an atom, but never stores it in any externally visible data structures and ultimately returns an unrelated value. The atom it creates will become eligible for GC when the function returns at the latest; it will not be visible to any other code, on the same or any other thread.) Again, this is also the case with all other values.
It will not. You are creating a new atom each time that you call the function.
If you want a shared atom, just pass the atom as a param to consumer-scanning-pages

What is Clojure volatile?

There has been an addition in the recent Clojure 1.7 release : volatile!
volatile is already used in many languages, including java, but what are the semantics in Clojure?
What does it do? When is it useful?
The new volatile is as close as a real "variable" (as it is from many other programming languages) as it gets for clojure.
From the announcement:
there are a new set of functions (volatile!, vswap!, vreset!, volatile?) to create and use volatile "boxes" to hold state in stateful transducers. Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation.
For instance, you can set/get and update them just like you would do with a variable in C.
The only addition (and hence the name) is the volatile keyword to the actual java object.
This is to prevent the JVM from optimization and makes sure that it reads the memory location every time it is accessed.
From the JIRA ticket:
Clojure needs a faster variant of Atom for managing state inside transducers. That is, Atoms do the job, but they provide a little too much capability for the purposes of transducers. Specifically the compare and swap semantics of Atoms add too much overhead. Therefore, it was determined that a simple volatile ref type would work to ensure basic propagation of its value to other threads and reads of the latest write from any other thread. While updates are subject to race conditions, access is controlled by JVM guarantees.
Solution overview: Create a concrete type in Java, akin to clojure.lang.Box, but volatile inside supports IDeref, but not watches etc.
This mean, a volatile! can still be accessed by multiple threads (which is necessary for transducers) but it does not allow to be changed by these threads at the same time since it gives you no atomic updates.
The semantics of what volatile does is very well explained in a java answer:
there are two aspects to thread safety: (1) execution control, and (2) memory visibility. The first has to do with controlling when code executes (including the order in which instructions are executed) and whether it can execute concurrently, and the second to do with when the effects in memory of what has been done are visible to other threads. Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment in time because threads are permitted to obtain and work on private copies of main memory.
Now let's see why not use var-set or transients:
Volatile vs var-set
Rich Hickey didn't want to give truly mutable variables:
Without mutable locals, people are forced to use recur, a functional
looping construct. While this may seem strange at first, it is just as
succinct as loops with mutation, and the resulting patterns can be
reused elsewhere in Clojure, i.e. recur, reduce, alter, commute etc
are all (logically) very similar.
[...]
In any case, Vars
are available for use when appropriate.
And thus creating with-local-vars, var-set etc..
The problem with these is that they're true vars and the doc string of var-set tells you:
The var must be thread-locally bound.
This is, of course, not an option for core.async which potentially executes on different threads. They're also much slower because they do all those checks.
Why not use transients
Transients are similar in that they don't allow concurrent access and optimize mutating a data structure.
The problem is that transient only work with collection that implement IEditableCollection. That is they're simply to avoid expensive intermediate representation of the collection data structures. Also remember that transients are not bashed into place and you still need some memory location to store the actual transient.
Volatiles are often used to simply hold a flag or the value of the last element (see partition-by for instance)
Summary:
Volatile's are nothing else but a wrapper around java's volatile and have thus the exact same semantics.
Don't ever share them. Use them only very carefully.
Volatiles are a "faster atom" with no atomicity guarantees. They were introduced as atoms were considered too slow to hold state in transducers.
there are a new set of functions (volatile!, vswap!, vreset!, volatile?) to create and use volatile "boxes" to hold state in stateful transducers. Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation

Queueing Method Calls So That They Are Performed By A Single Thread In Clojure

I'm building a wrapper around OrientDB in Clojure. One of the biggest limitations (IMHO) of OrientDB is that the ODatabaseDocumentTx is not thread-safe, and yet the lifetime of this thing from .open() to .close() is supposed to represent a single transaction, effectively forcing transactions to occur is a single thread. Indeed, thread-local refs to these hybrid database/transaction objects are provided by default. But what if I want to log in the same thread as I want to persist "real" state? If I hit an error, the log entries get rolled back too! That use case alone puts me off of virtually all DBMS's since most do not allow named transaction scope management. /soapbox
Anyways, OrientDB is the way it is, and it's not going to change for me. I'm using Clojure and I want an elegant way to construct a with-tx macro such that all imperative database calls within the with-tx body are serialized.
Obviously, I can brute-force it by creating a sentinel at the top level of the with-tx generated body and deconstructing every form to the lowest level and wrapping them in a synchronized block. That's terrible, and I'm not sure how that would interact with something like pmap.
I can search the macro body for calls to the ODatabaseDocumentTx object and wrap those in synchronized blocks.
I can create some sort of dispatching system with an agent, I guess.
Or I can subclass ODatabaseDocumentTx with synchronized method calls.
I'm scratching my head trying to come up with other approaches. Thoughts? In general the agent approach seems more appealing simply because if a block of code has database method calls interspersed, I would rather do all the computation up front, queue the calls, and just fire a whole bunch of stuff to the DB at the end. That assumes, however, that the computation doesn't need to ensure consistency of reads. IDK.
Sounds like a job for Lamina.
One option would be to use Executor with 1 thread in thread pool. Something like shown below. You can create a nice macro around this concept.
(import 'java.util.concurrent.Executors)
(import 'java.util.concurrent.Callable)
(defmacro sync [executor & body]
`(.get (.submit ~executor (proxy [Callable] []
(call []
(do ~#body))))))
(let [exe (Executors/newFixedThreadPool (int 1))
dbtx (sync exe (DatabaseTx.))]
(do
(sync exe (readfrom dbtx))
(sync exe (writeto dbtx))))
The sync macro make sure that the body expression is executed in the executor (which has only one thread) and it waits for the operation to complete so that all operations execute one by one.

Is using clojure's stm as a global state considered a good practice?

in most of my clojure programs... and alot other clojure programs I see, there is some sort of global variable in an atom:
(def *program-state*
(atom {:name "Program"
:var1 1
:var2 "Another value"}))
And this state would be referred to occasionally in the code.
(defn program-name []
(:name #*program-state*))
Reading this article http://misko.hevery.com/2008/07/24/how-to-write-3v1l-untestable-code/ made me rethink global state but somehow, even though I completely agree with the article, I think its okay to use hash-maps in atoms because its providing a common interface for manipulating global state data (analogous to using different databases to store your data).
I would like some other thoughts on this matter.
This kind of thing can be OK, but it is also often a design smell so I would approach with caution.
Things to think about:
Consistency - can one part of the code change the program name? if so then the program-name function will behave inconsistently from the perspective of other threads. Not good!
Testability - is this easy to test? can one part of the test suite that is changing the program name safely run concurrently with another test that is reading the name?
Multiple instances - will you ever have two different parts of the application expecting to use a different program-name at the same time? If so, this is a strong hint that your mutable state should not be global.
Alternatives to consider:
Using a ref instead of an atom, you can at least ensure consistency of mutable state within transactions
Using binding you can limit mutability to a per-thread basis. This solves most of the concurrency issues and can be helpful when your global variables are being used like a set of thread-local configuration parameters.
Using immutable global state wherever you can. Does it really need to be mutable?
I think having a single global state that is occasionally updated in commutative ways is fine. When you start having two global states that need to be updated and threads start using them for communication, then I start to worry.
maintaining a count of current global users is fine:
Any thread can inc or dec this at any time without hurting another
If it changes out from under your thread nothing explodes.
maintaining the log directory is questionable:
When it changes will all threads stop writing to the old one?
if two threads change it will they converge.
Using this as a message queue is even more dubious:
I think it is fine to have such a global state (and in many cases it is required) but I would be careful about that the core logic of my application have functions that take the state as a parameter and return the updated state rather than directly accessing the global state. Basically I would prefer to have a controlled access to the global state from few set of function and everything else in my program should use these set of methods to access the state as that would allow me to abstract away the state implementation i.e initially I could start with an in memory atom, then may be move to some persistent storage.

pthread_key_t and pthread_once_t?

Starting with pthreads, I cannot understand what is the business with pthread_key_t and pthread_once_t?
Would someone explain in simple terms with examples, if possible?
thanks
pthread_key_t is for creating thread thread-local storage: each thread gets its own copy of a data variable, instead of all threads sharing a global (or function-static, class-static) variable. The TLS is indexed by a key. See pthread_getspecific et al for more details.
pthread_once_t is a control for executing a function only once with pthread_once. Suppose you have to call an initialization routine, but you must only call that routine once. Furthermore, the point at which you must call it is after you've already started up multiple threads. One way to do this would be to use pthread_once(), which guarantees that your routine will only be called once, no matter how many threads try to call it at once, so long as you use the same control variable in each call. It's often easier to use pthread_once() than it is to use other alternatives.
No, it can't be explained in layman terms. Laymen cannot successfully program with pthreads in C++. It takes a specialist known as a "computer programmer" :-)
pthread_once_t is a little bit of storage which pthread_once must access in order to ensure that it does what it says on the tin. Each once control will allow an init routine to be called once, and once only, no matter how many times it is called from how many threads, possibly concurrently. Normally you use a different once control for each object you're planning to initialise on demand in a thread-safe way. You can think of it in effect as an integer which is accessed atomically as a flag whether a thread has been selected to do the init. But since pthread_once is blocking, I guess there's allowed to be a bit more to it than that if the implementation can cram in a synchronisation primitive too (the only time I ever implemented pthread_once, I couldn't, so the once control took any of 3 states (start, initialising, finished). But then I couldn't change the kernel. Unusual situation).
pthread_key_t is like an index for accessing thread-local storage. You can think of each thread as having a map from keys to values. When you add a new entry to TLS, pthread_key_create chooses a key for it and writes that key into the location you specify. You then use that key from any thread, whenever you want to set or retrieve the value of that TLS item for the current thread. The reason TLS gives you a key instead of letting you choose one, is so that unrelated libraries can use TLS, without having to co-operate to avoid both using the same value and trashing each others' TLS data. The pthread library might for example keep a global counter, and assign key 0 for the first time pthread_key_create is called, 1 for the second, and so on.
Wow, the other answers here are way too verbose.
pthread_once_t stores state for pthread_once(). Calling pthread_once(&s, fn) calls fn and sets the value pointed to by s to record the fact it has been executed. All subsequent calls to pthread_once() are noops. The name should become obvious now.
pthread_once_t should be initialized to PTHREAD_ONCE_INIT.