Assuming that the ref in the following code is modified in other transactions as well as the one below,
my concern is that this transaction will run until it's time to commit, fail on commit, then re-run the transaction.
(defn modify-ref [my-ref]
  (dosync (if (some-prop-of-ref-true @my-ref)
            (alter my-ref long-running-calculation))))
Here's my fear in full:
modify-ref is called, a transaction is started (call it A), and long-running-calculation starts
another transaction (call it B) starts, modifies my-ref, and returns (commits successfully)
long-running-calculation continues until it is finished
transaction A tries to commit but fails because my-ref has been modified
the transaction is restarted (call it A') with the new value of my-ref and exits because some-prop is not true
Here's what I would like to happen, and perhaps this is what happens (I just don't know, so I'm asking the question :-)
When the transaction B commits my-ref, I'd like transaction A to immediately stop (because the value of my-ref has changed) and restart with the new value. Is that what happens?
The reason I want this behavior is so that long-running-calculation doesn't waste all that CPU time on a calculation that is now obsolete.
I thought about using ensure, but I'm not sure how to use it in this context or if it is necessary.
It works as you fear.
Stopping a thread in the JVM doing whatever it is doing requires a collaborative effort so there is no generic way for Clojure (or any other JVM language) to stop a running computation. The computation must periodically check a signal to see if it should stop itself. See How do you kill a thread in Java?.
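A minimal sketch of that collaborative check, assuming a hypothetical stop flag (`cancelled?`) and work step (`expensive-step`) that are not in the original code:

```clojure
;; Hedged sketch: cooperative cancellation. Another thread sets `cancelled?`
;; to request a stop; the loop checks it between chunks of work.
(def cancelled? (atom false))

(defn long-running-calculation [v]
  (loop [i 0 acc v]
    (cond
      @cancelled? (throw (InterruptedException. "calculation cancelled"))
      (< i 1e6)   (recur (inc i) (expensive-step acc)) ; expensive-step is hypothetical
      :else       acc)))
```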
As for how to implement it: I would say that is just too hard, so I would first measure whether it is really an issue. If it is, I would see whether a traditional pessimistic lock is a better solution. If pessimistic locks are still not the solution, I would try to build something that runs the computation outside the transactions, uses watchers on the refs, and sets the refs conditionally after the computation if they still have the same value. Of course this runs outside the transaction boundaries, and it is probably a lot trickier than it sounds.
About ensure: by default only refs that are being modified participate in the transaction, so without it you can suffer from write skew. See Clojure STM ambiguity factor for a longer explanation.
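To make that concrete, a hedged sketch of how `ensure` pulls a read-only ref into the transaction (the refs and invariant here are illustrative, not from the question):

```clojure
(def a (ref 0))
(def b (ref 0))

;; Invariant: b may only be incremented while a is zero. A plain deref of a
;; would not make a participate in the transaction, so a concurrent write to
;; a could slip in unnoticed (write skew). `ensure` closes that gap.
(defn inc-b-while-a-zero []
  (dosync
    (when (zero? (ensure a)) ; a now participates; a concurrent write forces a retry
      (alter b inc))))
```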
This doesn't happen, because...well, how could it? Your function long-running-calculation doesn't have any code in it to handle stopping prematurely, and that's the code that's being run at the time you want to cancel the transaction. So the only way to stop it would be to preemptively stop the thread from executing and forcibly restart it at some other location. This is terribly dangerous, as java.lang.Thread/stop discovered back in Java 1.1; the side effects could be a lot worse than some wasted CPU cycles.
refs do attempt to solve this problem, sorta: if there's one long-running transaction that has to restart itself many times because shorter transactions keep sneaking in, it will take a stronger lock and run to completion. But this is a pretty rare occurrence (heck, even needing to use refs is rare, and this is a rare way for refs to behave).
Related
When using sqlite library for C++, a call to sqlite3_step on a sqlite3_stmt can fail in several different ways. As far as my understanding goes, all of them but one should result in an exception or are caused by programming errors.
One, however, puzzles me a bit: SQLITE_BUSY. I mean: SQLITE_BUSY just means that the database is locked because some transaction is already taking place on it, right? Say that some different process has begun a transaction and has not committed or released it yet, any call to a sqlite3_step trying to edit the database or to begin a new transaction will fail with SQLITE_BUSY.
However, this doesn't look like an error to me: when a resource is being used, I can... just wait! Not unlike a mutex (maybe of a more complicated nature), I guess it should be possible to block the execution of the current thread/process until access to the shared resource is available again.
Is there a way to do such a thing? What I would like is something in the form:
sqlite3_step_or_block_until_available_then_step(my_stmt);
So that the execution is blocked until the database is no longer busy and I can use it. Is there a solution provided by sqlite library itself? If not, what could I do? I mean, I can obviously busy wait until the SQLITE_BUSY error is no longer returned, but it doesn't really make sense as it consumes a lot of resources. Are there more performing ways to achieve this?
If possible, please provide some short example so that I can understand how to get this to work!
You could lock the database yourself with the use of mutexes, but sqlite3 does provide a mechanism that might help (emphasis mine):
SQLITE_BUSY
The SQLITE_BUSY result code indicates that the database file could not be written (or in some cases read) because of concurrent activity by some other database connection, usually a database connection in a separate process.
For example, if process A is in the middle of a large write transaction and at the same time process B attempts to start a new write transaction, process B will get back an SQLITE_BUSY result because SQLite only supports one writer at a time. Process B will need to wait for process A to finish its transaction before starting a new transaction. The sqlite3_busy_timeout() and sqlite3_busy_handler() interfaces and the busy_timeout pragma are available to process B to help it deal with SQLITE_BUSY errors.
You can register a callback to handle SQLITE_BUSY so instead of the database returning SQLITE_BUSY it will invoke your handler. There are limitations to this you can read about in the link.
I won't provide an example for this because I feel it is less likely to solve your problem of having multiple threads access a database since you can only have one busy handler per database connection.
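The timeout interface, by contrast, is easy to demonstrate. Here is a hedged sketch using Python's stdlib sqlite3 bindings, which wrap the same C library; connect's timeout parameter maps onto sqlite3_busy_timeout. A second writer retries for the timeout and then fails, instead of failing instantly:

```python
import sqlite3, tempfile, os

db = os.path.join(tempfile.mkdtemp(), "demo.db")
con_a = sqlite3.connect(db, timeout=0.2, isolation_level=None)
con_b = sqlite3.connect(db, timeout=0.2, isolation_level=None)
con_a.execute("CREATE TABLE t (x INTEGER)")

con_a.execute("BEGIN IMMEDIATE")          # con_a takes the write lock
try:
    con_b.execute("BEGIN IMMEDIATE")      # retries for up to 0.2 s, then gives up
    result = "acquired"
except sqlite3.OperationalError as exc:
    result = str(exc)                     # "database is locked"
con_a.execute("COMMIT")
print(result)
```

In the C API the equivalent is calling sqlite3_busy_timeout(db, 200) once on connection B's handle.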
If, as I suspect from your previous question, you have control over all the code accessing the database, then it might just prove easier to use std::mutex or std::condition_variable:
{
    std::lock_guard<std::mutex> lock(dbMutex);
    sqlite3_step(stmt); // or whatever sqlite3 call needs protecting
}
I use sqlite3 in a multi-threaded application (it is compiled with SQLITE_THREADSAFE=2). In the watch window I see that sqlite->busyTimeout == 600000, i.e. it is supposed to have a 10-minute timeout. However, sqlite3_step returns SQLITE_BUSY obviously faster than after 10 minutes (it actually returns instantly, as if I had never called sqlite3_busy_timeout).
Why does sqlite3 ignore the timeout and return the error instantly?
One possibility: SQLite ignores the timeout when it detects a deadlock.
The scenario is as follows. Transaction A starts as a reader, and later attempts to perform a write. Transaction B is a writer (either started that way, or started as a reader and promoted to a writer first). B holds a RESERVED lock, waiting for readers to clear so it can start writing. A holds a SHARED lock (it's a reader) and tries to acquire RESERVED lock (so it can start writing). For description of various lock types, see http://sqlite.org/lockingv3.html
The only way to make progress in this situation is for one of the transactions to roll back. No amount of waiting will help, so when SQLite detects this situation, it doesn't honor the busy timeout.
There are two ways to avoid the possibility of a deadlock:
Switch to WAL mode - it allows one writer to co-exist with multiple readers.
Use BEGIN IMMEDIATE to start a transaction that may eventually need to write - this way, it starts as a writer right away. This of course reduces the potential concurrency in the system, as the price of avoiding deadlocks.
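A hedged sketch of the deadlock itself, using Python's stdlib sqlite3 bindings over the same C library (from C you would see the same SQLITE_BUSY come back from sqlite3_step):

```python
import sqlite3, tempfile, os

db = os.path.join(tempfile.mkdtemp(), "deadlock.db")
a = sqlite3.connect(db, timeout=5, isolation_level=None)
b = sqlite3.connect(db, timeout=5, isolation_level=None)
a.execute("CREATE TABLE t (x INTEGER)")

a.execute("BEGIN")                                # A starts as a reader
a.execute("SELECT COUNT(*) FROM t").fetchone()    # A now holds a SHARED lock
b.execute("BEGIN IMMEDIATE")                      # B takes RESERVED, will wait on readers

try:
    a.execute("INSERT INTO t VALUES (1)")         # A tries to promote to writer
    outcome = "promoted"
except sqlite3.OperationalError as exc:
    outcome = str(exc)   # fails without waiting out the 5 s timeout: deadlock detected
print(outcome)
```

Note B's BEGIN IMMEDIATE succeeds (RESERVED coexists with SHARED); it is A's promotion attempt that SQLite refuses immediately, because no amount of waiting could let both transactions proceed.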
I ran a lot of tests and share the results here for other people who use SQLite in a multithreaded environment. SQLite's threading support is not well documented; there is no good tutorial that describes all the threading issues in one place. I wrote a test program that creates 100 threads and sends random queries (INSERT, SELECT, UPDATE, DELETE) concurrently to a single database. My answer is based on observing this program.
The only really thread-safe journal mode is WAL. It allows multiple connections to do anything they need on the same database within one process, in the same manner as a single-threaded application does. The other modes are not thread safe regardless of timeouts, busy handlers, and the SQLITE_THREADSAFE preprocessor definition: they generate SQLITE_BUSY periodically, and it is a complex programming task to expect and handle that error everywhere. If you need a thread-safe SQLite that never returns SQLITE_BUSY, just as a single thread never does, you have to set the WAL journal mode.
Additionally, you have to set SQLITE_THREADSAFE=2 or SQLITE_THREADSAFE=1 preprocessor definition.
When done, you have to choose from 2 options:
You can call sqlite3_busy_timeout. It is enough; you are not required to also call sqlite3_busy_handler, even though this is not obvious from the documentation. It gives you the default, built-in timeout functionality.
You can call sqlite3_busy_handler and implement the timeout yourself. I don't see why, but maybe it is required on some nonstandard OS. Note that calling sqlite3_busy_handler resets the timeout to 0 (i.e. disabled). On desktop Linux and Windows you don't need it unless you like writing more complex code.
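The WAL claim is easy to check. A hedged sketch via Python's stdlib sqlite3 bindings (the same C library underneath), showing a reader and an uncommitted writer coexisting under WAL without any SQLITE_BUSY:

```python
import sqlite3, tempfile, os

db = os.path.join(tempfile.mkdtemp(), "wal.db")
writer = sqlite3.connect(db, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")     # switch the database to WAL
writer.execute("CREATE TABLE t (x INTEGER)")

writer.execute("BEGIN IMMEDIATE")             # open a write transaction
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(db)                  # concurrent reader: no SQLITE_BUSY
seen_before = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # 0: sees last commit

writer.execute("COMMIT")
seen_after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]   # 1
print(seen_before, seen_after)
```

Under a rollback-journal mode, the reader's SELECT could instead fail with SQLITE_BUSY while the write transaction is open.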
Given: a complex structure of various nested collections, with refs scattered in different levels.
Need: A way to take a snapshot of such a structure, while allowing writes to continue to happen in other threads.
So the "reader" thread needs to read the whole complex state in a single long transaction, while the "writer" thread makes modifications in multiple short transactions. As far as I understand, in such a case the STM engine uses the refs' history.
Here we get some interesting results. Say the reader reaches some ref 10 seconds after the beginning of its transaction, and the writer modifies that ref every second. That results in 10 values in the ref's history. If this exceeds the ref's :max-history limit, the reader transaction will retry forever. If it exceeds :min-history, the transaction may be rerun several times.
But really the reader needs just a single value of the ref (the first one) and the writer needs just the most recent one. All the intermediate values in the history list are useless. Is there a way to avoid such history overuse?
Thanks.
To me it's a bit of a "design smell" to have a large structure with lots of nested refs. You are effectively emulating a mutable object graph, which is a bad idea if you believe Rich Hickey's take on concurrency.
Some various thoughts to try out:
The idiomatic way to solve this problem in Clojure would be to put the state in a single top-level ref, with everything inside it being immutable. Then the reader can take a snapshot of the entire concurrent state for free (without even needing a transaction). Might be difficult to refactor to this from where you currently are, but I'd say it is best practice.
If you only want the reader to get a snapshot of the top-level ref, you can just deref it directly outside a transaction. Just be aware that the refs inside may continue to be mutated, so whether this is useful depends on the consistency requirements you have for the reader.
You can do everything within a (dosync...) transaction as normal for both readers and writer. You may get contention and transaction retries, but it may not be an issue.
You can create a "snapshot" function that quickly traverses the graph and dereferences all the refs within a transaction, returning the result with the refs stripped out (or replaced by new cloned refs). The reader calls snapshot once, then continues with the rest of its work after the snapshot is completed.
You could take a snapshot immediately each time after the writer finishes, and store it separately in an atom. Readers can use this directly (i.e. only the writer thread accesses the live data graph directly)
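A hedged sketch of the snapshot-function idea above: walk the structure inside one transaction and replace every ref with its current value, yielding a plain immutable snapshot.

```clojure
(require '[clojure.walk :as walk])

;; Dereference every ref in a nested structure within a single transaction.
;; prewalk keeps descending into the deref'd values, so nested refs are
;; resolved too.
(defn snapshot [state]
  (dosync
    (walk/prewalk
      (fn [x] (if (instance? clojure.lang.Ref x) (deref x) x))
      state)))

;; Usage:
;; (snapshot {:a (ref 1) :b [(ref 2) {:c (ref 3)}]})
;; => {:a 1, :b [2 {:c 3}]}
```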
The general answer to your question is that you need two things:
A flag to indicate that the system is in "snapshot write" mode
A queue to hold all transactions that occur while the system is in snapshot mode
As for what to do if the queue overflows because the snapshot process isn't fast enough: there isn't much you can do about that except optimize that process or increase the size of your queue. It's a delicate balance you'll have to strike depending on the needs of your app, and finding it will take some fairly extensive testing, depending on how complex your system is.
But you're on the right track. If you basically put the system in "snapshot write mode", then your reader/writer methods should automatically change where they are reading/writing from, so that the thread that is making changes gets all the "current values" and the thread reading the snapshot state is reading all the "snapshot values". You can split these up into separate methods - the snapshot reader will use the "snapshot value" methods, and all other threads will read the "current value" methods.
When the snapshot reader is done with its work, it needs to clear the snapshot state.
If a thread tries to read the "snapshot values" when no "snapshot state" is currently set, they should simply respond with the "current values" instead. No biggie.
Systems that allow snapshots of file systems to be taken for backup purposes, while not preventing new data from being written, follow a similar scheme.
Finally, unless you need to keep a record of all changes to the system (i.e. for an audit trail), then the queue of transactions actually doesn't need to be a queue of changes to be applied - it just needs to store the latest value of whatever thing you're changing in the system. When the "snapshot state" is cleared, you simply write all those non-committed values to the system, and call it done. The thing you might want to consider is making a log of those changes yet to be made, in case you need to recover from a crash, and have those changes still applied. The log file will give you a record of what happened, and can let you do this recovery. That's an oversimplification of the recovery process, but that's not really what your question is about, so I'll stop there.
What you are after is the state-of-the-art in high-performance concurrency. You should look at the work of Nathan Bronson, and his lab's collaborations with Aleksandar Prokopec, Phil Bagwell and the Scala team.
Binary Tree:
http://ppl.stanford.edu/papers/ppopp207-bronson.pdf
https://github.com/nbronson/snaptree/
Tree-of-arrays -based Hash Map
http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf
However, a quick look at the implementations above should convince you this is not "roll-your-own" territory. I'd try to adapt an off-the-shelf concurrent data structure to your needs if possible. Everything I've linked to is freely available on the JVM, but it's not native Clojure as such.
I am working through the Programming Clojure book. While explaining alter and the STM, they say that if, during an alter, Clojure detects a change to the ref from outside the transaction, it will re-run the transaction with the new value. If that is the case, I would imagine the update function you pass in needs to be pure, but that isn't indicated in the docs (and it is in other similar situations).
So is my assumption correct? If not, how does the STM re-apply the function? If it is correct, is it the case that you can't rely on the docs to tell you when you can have side effects, and when you can't?
It doesn't strictly have to be pure, it just has to be idempotent. In practice this is basically the same thing.
Further, it only has to be idempotent when seen outside of the STM: if the only side effect you produce is writing to some other ref or (I think) sending to an agent, that operation will be held until your transaction has succeeded.
It's also not really the case that it has to be any of these things: just that, if your update function isn't pure, the results may not be what you expect.
Edit: dosync's docs tell you that any expressions in the body may be executed more than once. You can't run an alter without running a dosync, so it looks like all the docs you need are there. What would you like changed?
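A hedged sketch of why a non-idempotent update function is risky: the side effect escapes the STM, so a retry repeats it (the atom here is just an observable stand-in for any side effect).

```clojure
(def r (ref 0))
(def call-count (atom 0))

(dosync
  (alter r (fn [v]
             (swap! call-count inc) ; side effect: the STM cannot roll this back
             (inc v))))

;; With no contention @call-count is 1, but under contention it can be
;; higher, even though @r only advances once per committed dosync.
```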
Just as a side note:
If you need to perform side-effects like logging in your STM transation you can send messages to agents to do the non-idempotent parts. Messages sent to agents are dispatched only when the transaction finishes and are guaranteed to only be sent once.
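A hedged sketch of that pattern (the agent and names are illustrative): the send is queued by the STM and dispatched only once, after the transaction commits, so the log line is printed exactly once even if the dosync retries.

```clojure
(def logger (agent nil))
(def counter (ref 0))

(dosync
  (let [n (alter counter inc)]          ; capture the in-transaction value
    (send logger (fn [_] (println "committed value:" n)))))
```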
The point in Clojure is that there is no side effect when you deal with transactions, because they are consistent: the function will re-run (I prefer "retry") when it finds a conflict during the update of the shared value, and otherwise it commits the change successfully.
If it has to retry, it will read the updated value, so there is no side effect. The problem you could run into is a livelock, but Clojure controls that with a limit on the number of retries.
I have used a version of double checked locking in my CF app (before I knew what double checked locking was).
Essentially, I check for the existence of an object. If it is not present, I lock (usually using a named lock), and before I try to create the object I check for existence again. I thought this was a neat way to stop multiple objects being created and to avoid excessive locking in the system.
This seems to work, in that there is no excessive locking and duplicate objects don't get created. However, I have recently learned that double-checked locking doesn't work in Java. What I don't know is whether this holds true in CF, seeing as CF threads and locks are not quite the same as native Java threads and locks.
To add on to what Ben Doom said about Java, this is fairly standard practice in ColdFusion, specifically with an application initialization routine where you set up your application variables.
Without at least one lock, the initial hits to your web application would all initialize the application variables at the same time. The danger is only there if your application is busy enough at the moment it first starts up.
The first lock makes sure only one request at a time initializes your variables.
The second lock, embedded within the first, will check to make sure a variable defined at the end of your initialization code exists, such as application.started. If it exists, the user is kicked out.
The double-locking pattern has saved my skin on busy sites, however, with VERY busy sites, the queue of requests for the application's initial hit to complete can climb too high, too quickly, and cause the server to crash. The idea is, the requests are waiting for the first hit, which is slow, then the second one breaks into the first cflock, and is quickly rejected. With hundreds or thousands of requests in the queue, growing every millisecond, they are all funneling down to the first cflock block. The solution is to set a very low timeout on the first cflock and not throw (or catch and duck) the lock timeout error.
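A hedged CFML sketch of one common shape of the pattern described above (the names appInit and application.started are illustrative): a cheap unlocked check first, then lock with a short timeout and re-check before initializing.

```cfm
<!--- Cheap unlocked check: most requests skip the lock entirely. --->
<cfif not structKeyExists(application, "started")>
    <!--- Short timeout, no throw: queued requests are rejected quickly
          instead of piling up behind the first slow init. --->
    <cflock name="appInit" type="exclusive" timeout="1" throwontimeout="no">
        <cfif not structKeyExists(application, "started")>
            <!--- initialize application variables here --->
            <cfset application.started = true>
        </cfif>
    </cflock>
</cfif>
```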
As a final note, this behavior that I described has been deprecated with ColdFusion 7's onApplicationStart() method of your Application.cfc. If you are using onApplicationStart(), then you shouldn't be locking at all for your application init routine. Application.cfc is well locked already.
To conclude: yes, double-checked locking works in ColdFusion. It's helpful in a few circumstances, but do it right. I don't know the mechanics of why it works as opposed to Java's threading model; chances are ColdFusion is manually checking some sort of lookup table in the background of your server.
Java is threadsafe, so it isn't so much that your locks won't work as that they aren't necessary. Basically, in CF 6+, locks are needed for preventing race conditions or for creating/altering objects that exist outside Java's control (files, for example).
To open a whole other can of worms...
Why don't you use a Dependency Injection library, such as ColdSpring, to keep track of your objects and prevent circular dependencies?