Clojure dosync inside future vs future inside dosync

I have the following piece of code
(def number (ref 0))
(dosync (future (alter number inc))) ; A
(future (dosync (alter number inc))) ; B
The second one succeeds, but the first one fails with "No transaction running". But it is wrapped inside a dosync, right?
Does Clojure track open transactions based on the thread in which they were created?

You are correct. The whole purpose of dosync is to begin a transaction in the current thread. The future runs its code in a new thread, so the alter in case A is not inside of a dosync for its thread.
For case B, the alter and dosync are both in the same (new) thread, so there is no problem.

There are multiple reasons this doesn't work. As Alan Thompson writes, transactions are homed to a single thread, and so when you create a new thread you lose your transaction.
Another problem is the dynamic scope of dosync. The same problem would arise if you wrote
((dosync #(alter number inc)))
Here we create a function inside of the dosync scope, and let that function be the result of the dosync. Then we call the function from outside of the dosync block, but of course the transaction is no longer running.
That's very similar to what you're doing with future: future creates a function and then executes it on a new thread, returning a handle you can use to inspect the progress of that thread. Even if cross-thread transactions were allowed, you would have a race condition here: does the dosync block close its transaction before or after the alter call in the future is executed?
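A minimal REPL sketch of both cases (note that dereferencing a failed future rethrows its exception wrapped in a java.util.concurrent.ExecutionException):

```clojure
(def number (ref 0))

;; Case A: dosync opens a transaction on this thread, but the alter runs
;; on the future's thread, which has none, so the future's body throws
;; IllegalStateException: No transaction running.
(def a (dosync (future (alter number inc))))
(try @a
     (catch java.util.concurrent.ExecutionException e
       (.getMessage (.getCause e))))  ; "No transaction running"

;; Case B: dosync and alter both run on the future's thread.
@(future (dosync (alter number inc)))  ; => 1
```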

Related

How to convey current thread's bindings to another thread?

How to convey all of the current thread's bindings to another thread? To be specific, I need the following snippet to print 2 (not 1) to stdout:
(defvar *foo* 1)
(let ((*foo* 2))
  (bordeaux-threads:make-thread (lambda () (print *foo*)))) ;; prints 1
Of course I could copy *foo*'s value by hand, like this:
(let ((*foo* 2))
  (bordeaux-threads:make-thread
   (let ((foo-binding *foo*))
     (lambda ()
       (let ((*foo* foo-binding))
         (print *foo*)))))) ;; prints 2
but is there anything that will copy all of them at once?
The API is explicit regarding variable sharing:
The interaction between threads and dynamic variables is in some cases complex, and depends on whether the variable has only a global binding (as established by e.g. DEFVAR/DEFPARAMETER/top-level SETQ) or has been bound locally (e.g. with LET or LET*) in the calling thread.
1. Global bindings are shared between threads: the initial value of a global variable in the new thread will be the same as in the parent, and an assignment to such a variable in any thread will be visible to all threads in which the global binding is visible.
2. Local bindings are local to the thread they are introduced in, except that
3. local bindings in the caller of MAKE-THREAD may or may not be shared with the new thread that it creates: this is implementation-defined. Portable code should not depend on particular behaviour in this case, nor should it assign to such variables without first rebinding them in the new thread.
So making the binding global rather than local seems to be the easiest (implementation-independent) route.
@coredump also suggests checking out the *default-special-bindings* list for a possible sharing mechanism:
This variable holds an alist associating special variable symbols with forms to evaluate for binding values. Special variables named in this list will be locally bound in the new thread before it begins executing user code.
This variable may be rebound around calls to MAKE-THREAD to add/alter default bindings. The effect of mutating this list is undefined, but earlier forms take precedence over later forms for the same symbol, so defaults may be overridden by consing to the head of the list.
Forms are evaluated in the new thread or in the calling thread? Standard contents of this list: print/reader control, etc. Can borrow the Franz equivalent?
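As a sketch of the consing idiom from that docstring (assuming the conventional bt nickname for bordeaux-threads; the 2 is the form evaluated to produce the new thread's local binding):

```lisp
(defvar *foo* 1)

;; rebind *default-special-bindings* around make-thread so the new
;; thread starts with *foo* locally bound to 2
(let ((bt:*default-special-bindings*
        (acons '*foo* 2 bt:*default-special-bindings*)))
  (bt:make-thread (lambda () (print *foo*)))) ;; prints 2
```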

Immediately kill a running future thread

I'm using
(def f
  (future
    (while (not (Thread/interrupted))
      (function-to-run))))
(Thread/sleep 100)
(future-cancel f)
to cancel my code after a specified amount of time (100ms).
The problem is that I need to cancel the already-running function function-to-run as well; it is important that it really stops executing that function after 100 ms.
Can I somehow propagate the interrupted signal to the function?
The function is not third-party, I wrote it myself.
The basic thing to note here is: you cannot safely kill a thread without its own cooperation. Since you are the owner of the function you wish to be able to kill prematurely, it makes sense to allow the function to cooperate and die gracefully and safely.
(defn function-to-run
  []
  (while work-not-done
    (if-not (Thread/interrupted)
      ; ... do your work
      (throw (InterruptedException. "Function interrupted...")))))
(def t (Thread. (fn []
                  (try
                    (while true
                      (function-to-run))
                    (catch InterruptedException e
                      (println (.getMessage e)))))))
To start the thread:
(.start t)
To interrupt it:
(.interrupt t)
Your approach was not sufficient for your use case because the while condition was checked only after control flow returned from function-to-run, but you wanted to stop function-to-run during its execution. The approach here is only different in that the condition is checked more frequently, namely, every time through the loop in function-to-run. Note that instead of throwing an exception from function-to-run, you could also return some value indicating an error, and as long as your loop in the main thread checks for this value, you don't have to involve exceptions at all.
If your function-to-run doesn't feature a loop where you can perform the interrupted check, then it likely is performing some blocking I/O. You may not be able to interrupt this, though many APIs will allow you to specify a timeout on the operation. In the worst case, you can still perform intermittent checks for interrupted in the function around your calls. But the bottom line still applies: you cannot safely forcibly stop execution of code running in the function; it should yield control cooperatively.
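The same cooperative pattern also works with the future/future-cancel approach from the question, because future-cancel interrupts the future's thread; moving the interrupted check inside function-to-run then stops it mid-stream. do-work below is a hypothetical stand-in for one bounded unit of the real work:

```clojure
(defn do-work []
  ;; hypothetical placeholder for one bounded unit of real work
  (reduce + (range 10000)))

(defn function-to-run []
  ;; check the interrupt flag between units of work
  ;; (Thread/interrupted also clears the flag, which is fine here
  ;; because we exit the loop immediately afterwards)
  (while (not (Thread/interrupted))
    (do-work)))

(def f (future (function-to-run)))
(Thread/sleep 100)
(future-cancel f) ; interrupts the thread; the loop exits promptly
```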
Note:
My original answer here involved presenting an example in which java's Thread.stop() was used (though strongly discouraged). Based on feedback in the comments, I revised the answer to the one above.

Will an atom defined locally be visible to other threads?

I have the following code for automation. The function accepts a unique number and kicks off a Firefox instance. I could kick off multiple threads, each with a unique x passed to the function, so the function executes concurrently. Will the local atom current-page be visible to other threads? If it is visible, then a reset! from another thread could set the atom to an unexpected value.
(defn consumer-scanning-pages [x]
  (while true
    (let [driver (get-firefox x)
          current-page (atom 0)]
      ....
      (reset! current-page ..))))
The atom will be visible to those threads you explicitly pass it to, to any further threads that those threads pass it to, and so on. It is no different in this respect from any other value that you may or may not pass around.
"Passing the atom to a thread" can be as simple as referring to an in-scope local it is stored in within the body of a Clojure thread-launching form:
(let [a (atom :foo)]
  ;; dereferencing the future object representing an off-thread computation
  @(future
     ;; dereferencing the atom on another thread
     @a))
;;= :foo
Merely creating an atom doesn't make it available to code that it is not explicitly made available to, and this is also true of code that happens to run on the thread that originally created the atom. (Consider a function that creates an atom, but never stores it in any externally visible data structures and ultimately returns an unrelated value. The atom it creates will become eligible for GC when the function returns at the latest; it will not be visible to any other code, on the same or any other thread.) Again, this is also the case with all other values.
It will not. You are creating a new atom each time that you call the function.
If you want a shared atom, just pass the atom as a param to consumer-scanning-pages
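A sketch of that shared-atom variant, with the atom created once and passed in, so every worker thread updates the same counter (the body is boiled down to just the counter update):

```clojure
(def current-page (atom 0))

(defn consumer-scanning-pages [x shared-page]
  ;; hypothetical simplified body: just bump the shared counter
  (swap! shared-page inc))

;; three threads all see and mutate the same atom
(run! deref (mapv #(future (consumer-scanning-pages % current-page))
                  (range 3)))
@current-page ;=> 3
```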

Queueing Method Calls So That They Are Performed By A Single Thread In Clojure

I'm building a wrapper around OrientDB in Clojure. One of the biggest limitations (IMHO) of OrientDB is that the ODatabaseDocumentTx is not thread-safe, and yet the lifetime of this thing from .open() to .close() is supposed to represent a single transaction, effectively forcing transactions to occur in a single thread. Indeed, thread-local refs to these hybrid database/transaction objects are provided by default. But what if I want to log in the same thread as I want to persist "real" state? If I hit an error, the log entries get rolled back too! That use case alone puts me off of virtually all DBMSs, since most do not allow named transaction-scope management. /soapbox
Anyways, OrientDB is the way it is, and it's not going to change for me. I'm using Clojure and I want an elegant way to construct a with-tx macro such that all imperative database calls within the with-tx body are serialized.
Obviously, I can brute-force it by creating a sentinel at the top level of the with-tx generated body and deconstructing every form to the lowest level and wrapping them in a synchronized block. That's terrible, and I'm not sure how that would interact with something like pmap.
I can search the macro body for calls to the ODatabaseDocumentTx object and wrap those in synchronized blocks.
I can create some sort of dispatching system with an agent, I guess.
Or I can subclass ODatabaseDocumentTx with synchronized method calls.
I'm scratching my head trying to come up with other approaches. Thoughts? In general the agent approach seems more appealing simply because if a block of code has database method calls interspersed, I would rather do all the computation up front, queue the calls, and just fire a whole bunch of stuff to the DB at the end. That assumes, however, that the computation doesn't need to ensure consistency of reads. IDK.
Sounds like a job for Lamina.
One option would be to use Executor with 1 thread in thread pool. Something like shown below. You can create a nice macro around this concept.
(import 'java.util.concurrent.Executors)
(import 'java.util.concurrent.Callable)
(defmacro sync [executor & body]
  `(.get (.submit ~executor (proxy [Callable] []
                              (call []
                                (do ~@body))))))
(let [exe (Executors/newFixedThreadPool (int 1))
      dbtx (sync exe (DatabaseTx.))]
  (do
    (sync exe (readfrom dbtx))
    (sync exe (writeto dbtx))))
The sync macro makes sure that the body expression is executed on the executor (which has only one thread) and waits for the operation to complete, so all operations execute one by one.
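The agent-based dispatch mentioned in the question works on the same principle: an agent processes its queued actions strictly one at a time, so sending every DB call to a single agent serializes them. db-op here is a hypothetical stand-in for an ODatabaseDocumentTx call:

```clojure
(def db-agent (agent []))

(defn db-op [state op]
  ;; hypothetical placeholder: run op against the DB, record it in state
  (conj state op))

;; actions sent from one thread are queued and run in order, one at a time
(send-off db-agent db-op :read)
(send-off db-agent db-op :write)
(await db-agent)
@db-agent ;=> [:read :write]
```

The trade-off versus the executor approach is that agent sends are asynchronous, so the caller cannot get a return value from each call without extra plumbing such as a promise.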

What is the difference between Clojure's "send" and "send-off" functions with respect to dispatching an action to an agent?

The Clojure API describes these two functions as:
(send a f & args) - Dispatch an action to an agent. Returns the agent immediately. Subsequently, in a thread from a thread pool, the state of the agent will be set to the value of: (apply action-fn state-of-agent args)
and
(send-off a f & args) - Dispatch a potentially blocking action to an agent. Returns the agent immediately. Subsequently, in a separate thread, the state of the agent will be set to the value of: (apply action-fn state-of-agent args)
The only obvious difference is that send-off should be used when an action may block. Can somebody explain this difference in functionality in greater detail?
All the actions sent to any agent using send run in a fixed thread pool with a couple more threads than the number of processors. This keeps them running close to the CPU's full capacity. If you make 1000 calls using send, you don't incur much switching overhead; the calls that can't be processed immediately just wait until a thread becomes available. But if the actions block, the thread pool can run dry.
When you use send-off, the actions run on an expandable thread pool: a new thread is created whenever no idle one is available. If you send-off 1000 functions, the ones that can't be processed immediately may incur the extra overhead of starting a thread, but it's OK if the actions block, because each task (potentially) gets a dedicated thread.
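One way to see the two pools in action is to ask each action which thread it ran on (the pool thread names are an implementation detail of current Clojure versions, shown here only as illustration):

```clojure
(def a (agent nil))

;; send: fixed pool, sized at roughly 2 + number of processors
(send a (fn [_] (.getName (Thread/currentThread))))
(await a)
(println @a)   ; e.g. clojure-agent-send-pool-0

;; send-off: expandable pool intended for potentially blocking actions
(send-off a (fn [_] (.getName (Thread/currentThread))))
(await a)
(println @a)   ; e.g. clojure-agent-send-off-pool-1
```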