Is it safe to use named locks (e.g. name of the key) to synchronize read/write access to specific keys of scopes & structs? I've been doing it for a while and never ran into concurrency issues, but I'm writing a server-level cache and I want to make sure name-based locking is safe with the server scope. I'm afraid the underlying implementation may not be thread-safe and that concurrent access on different keys could cause issues.
Locks in ColdFusion via cflock are just named semaphores. They control which threads may enter a section of code at the same time (concurrently). These locks do not interact with Java's intrinsic locks or synchronized methods/statements, so cflock does not provide thread safety per se.
User Ageax showed that CF structs do not use ConcurrentHashMap (see comments), so if you want one you have to create it explicitly: createObject("java", "java.util.concurrent.ConcurrentHashMap").init()
Note that ConcurrentHashMap is type and case sensitive (while the regular struct is not).
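To make the case-sensitivity point concrete, here is a minimal Java sketch (the class name is illustrative): two keys that a case-insensitive CF struct would merge stay distinct in a ConcurrentHashMap.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: a ConcurrentHashMap treats keys that differ only in
// case as two separate entries, unlike a case-insensitive CF struct.
public class CaseSensitiveKeys {

    static int distinctKeys() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("cacheKey", 1);
        map.put("CACHEKEY", 2); // a second, separate entry
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys()); // 2
    }
}
```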
The good news
Structs in ColdFusion are thread-safe by nature. Here is an example that compares the unsafe Java HashMap and the safe ColdFusion Struct:
First run with:
<cfset s = createObject("java", "java.util.HashMap").init()>
Second run with:
<cfset s = structNew()>
<cfset s.put("A", 1)>
<cfset s.put("B", 2)>
<cfthread name="interrupter">
    <cfset s.put("C", 3)>
</cfthread>
<cfoutput>
    <cfloop collection="#s#" item="key">
        #s[key]#,
        <cfset sleep(1000)>
    </cfloop>
    #structKeyList(s)#
</cfoutput>
The HashMap will throw ConcurrentModificationException, because the map was accessed by the main thread while being modified by the "interrupter" thread.
The Struct however, will not throw an exception. It will simply return 1,2,A,B,C, because the iterator blocks access, i.e. causes the write operation by the "interrupter" thread to be postponed. After the iterator is done (end of loop), it releases the lock and the struct is modified. This is why structKeyList() will immediately include the freshly written key "C" as well.
You can read more about the implementation details of concurrent map access in the official Java docs for java.util.concurrent.ConcurrentHashMap. But keep in mind that ColdFusion probably uses a derived version of ConcurrentHashMap.
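The same contrast can be reproduced in plain Java (a sketch, not the CF runtime itself): HashMap's fail-fast iterator throws ConcurrentModificationException when the map is modified mid-iteration, while ConcurrentHashMap's weakly consistent iterator simply carries on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IteratorSafety {

    // Returns true if structurally modifying the map during iteration
    // throws ConcurrentModificationException (fail-fast behavior).
    static boolean failsFast(Map<String, Integer> m) {
        m.put("A", 1);
        m.put("B", 2);
        try {
            for (String key : m.keySet()) {
                m.put("C", 3); // structural modification mid-iteration
            }
            return false;      // weakly consistent iterator: no exception
        } catch (java.util.ConcurrentModificationException e) {
            return true;       // fail-fast iterator detected the change
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new HashMap<>()));           // true
        System.out.println(failsFast(new ConcurrentHashMap<>())); // false
    }
}
```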
Related
There has been an addition in the recent Clojure 1.7 release: volatile!
volatile is already used in many languages, including java, but what are the semantics in Clojure?
What does it do? When is it useful?
The new volatile is as close to a real "variable" (as known from many other programming languages) as it gets in Clojure.
From the announcement:
there are a new set of functions (volatile!, vswap!, vreset!, volatile?) to create and use volatile "boxes" to hold state in stateful transducers. Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation.
For instance, you can set/get and update them just like you would do with a variable in C.
The only addition (and hence the name) is the volatile keyword on the field inside the actual Java object.
This prevents the JVM from caching the value and makes sure the memory location is read every time it is accessed.
From the JIRA ticket:
Clojure needs a faster variant of Atom for managing state inside transducers. That is, Atoms do the job, but they provide a little too much capability for the purposes of transducers. Specifically the compare and swap semantics of Atoms add too much overhead. Therefore, it was determined that a simple volatile ref type would work to ensure basic propagation of its value to other threads and reads of the latest write from any other thread. While updates are subject to race conditions, access is controlled by JVM guarantees.
Solution overview: Create a concrete type in Java, akin to clojure.lang.Box, but volatile inside supports IDeref, but not watches etc.
This means a volatile! can still be accessed by multiple threads (which is necessary for transducers), but it must not be changed by those threads at the same time, since it gives you no atomic updates.
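A minimal Java sketch of such a "volatile box" (VolatileBox and its method names are illustrative, not the actual clojure.lang.Volatile API): reads and writes are immediately visible across threads, but the read-apply-write in swap is not atomic, which is exactly the guarantee volatiles give up compared to atoms.

```java
import java.util.function.UnaryOperator;

// Sketch of a "volatile box" in the spirit of clojure.lang.Volatile:
// one mutable cell whose writes propagate to other threads, with no
// atomic read-modify-write.
public class VolatileBox<T> {
    private volatile T val;

    public VolatileBox(T initial) { val = initial; }

    public T deref() { return val; }   // like @v

    public T reset(T newVal) {         // like vreset!
        val = newVal;
        return newVal;
    }

    // Like vswap!: read, apply f, write back. NOT atomic -- two threads
    // interleaving here can lose an update.
    public T swap(UnaryOperator<T> f) {
        T newVal = f.apply(val);
        val = newVal;
        return newVal;
    }

    public static void main(String[] args) {
        VolatileBox<Integer> v = new VolatileBox<>(0);
        v.reset(41);
        System.out.println(v.swap(x -> x + 1)); // 42
    }
}
```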
The semantics of what volatile does are very well explained in a Java answer:
there are two aspects to thread safety: (1) execution control, and (2) memory visibility. The first has to do with controlling when code executes (including the order in which instructions are executed) and whether it can execute concurrently, and the second to do with when the effects in memory of what has been done are visible to other threads. Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment in time because threads are permitted to obtain and work on private copies of main memory.
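The memory-visibility half of that answer can be seen in a small Java sketch (StopFlag is illustrative): with volatile on the flag, the worker is guaranteed to observe the main thread's write and stop; without it, the JVM would be allowed to let the loop spin on a stale cached value.

```java
// Sketch of the visibility guarantee: "running" is volatile, so the
// main thread's write is guaranteed to become visible to the worker,
// which then exits its loop.
public class StopFlag {
    private volatile boolean running = true;

    public boolean run() {
        Thread worker = new Thread(() -> {
            while (running) {
                // busy-wait; "running" is re-read from memory each pass
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            running = false;   // visible to the worker thanks to volatile
            worker.join(5000); // worker terminates promptly
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(new StopFlag().run()); // true
    }
}
```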
Now let's see why var-set or transients are not used instead:
Volatile vs var-set
Rich Hickey didn't want to give truly mutable variables:
Without mutable locals, people are forced to use recur, a functional looping construct. While this may seem strange at first, it is just as succinct as loops with mutation, and the resulting patterns can be reused elsewhere in Clojure, i.e. recur, reduce, alter, commute etc. are all (logically) very similar. [...] In any case, Vars are available for use when appropriate.
Hence the creation of with-local-vars, var-set, etc.
The problem with these is that they are true Vars, and the doc string of var-set tells you:
The var must be thread-locally bound.
This is, of course, not an option for core.async, which potentially executes on different threads. They are also much slower because they do all those checks.
Why not use transients
Transients are similar in that they don't allow concurrent access and optimize mutation of a data structure.
The problem is that transients only work with collections that implement IEditableCollection; they simply exist to avoid expensive intermediate representations of the collection data structures. Also remember that transients are not bashed into place, and you still need some memory location to store the actual transient.
Volatiles are often used to simply hold a flag or the value of the last element (see partition-by, for instance).
Summary:
Volatiles are nothing but a wrapper around Java's volatile and thus have exactly the same semantics.
Don't ever share them. Use them only very carefully.
Volatiles are a "faster atom" with no atomicity guarantees. They were introduced as atoms were considered too slow to hold state in transducers.
Several (2 or more) client threads need to run at a high frequency, but once every minute a background service thread updates a variable used by the main threads.
What is the best method of locking a variable -- in fact, a vector -- during the small moment of update, with little impact on the client threads?
There is no need to protect the vector during 'normal' (no background thread) operation, since all threads only use the values.
boost::thread is used with an endless while loop that updates the vector and sleeps for 60 seconds.
This seems like a good occasion for a Reader-Writer lock. All the clients lock the vector for reading only, and the background service thread locks it for writing only once every minute.
This is the SharedLockable concept from C++14, implemented in Boost.Thread as boost::shared_mutex:
The class boost::shared_mutex provides an implementation of a multiple-reader / single-writer mutex. It implements the SharedLockable concept.
Multiple concurrent calls to lock(), try_lock(), try_lock_for(), try_lock_until(), timed_lock(), lock_shared(), try_lock_shared_for(), try_lock_shared_until(), try_lock_shared() and timed_lock_shared() are permitted.
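Here is the same pattern sketched in Java with ReentrantReadWriteLock, the JVM counterpart of boost::shared_mutex (class and method names are illustrative): many client threads can hold the read lock at once, while the once-a-minute updater takes the write lock exclusively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the reader-writer pattern for a shared, periodically
// refreshed vector. Multiple readers proceed concurrently; the
// writer blocks them only for the brief moment of update.
public class SharedVector {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final List<Integer> data = new ArrayList<>();

    // Client threads: any number may hold the read lock at once.
    public int read(int index) {
        rw.readLock().lock();
        try {
            return data.get(index);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Background thread, once a minute: exclusive while updating.
    public void replaceAll(List<Integer> fresh) {
        rw.writeLock().lock();
        try {
            data.clear();
            data.addAll(fresh);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```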
That said, depending on your actual platform and CPU model, you might have more luck with an atomic variable.
If it's a primitive value, just using boost::atomic_int or similar would be fine. For a vector, consider using std::shared_ptr (which has atomic support); see e.g.
Confirmation of thread safety with std::unique_ptr/std::shared_ptr
You can also do without the dynamic allocation (although you're using a vector already) by keeping two vectors and atomically switching a reference to the "actual" version.
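The reference-switch idea can be sketched in Java with AtomicReference (names are illustrative): the writer builds a complete new list and publishes it with one atomic swap, so readers always see a fully built snapshot and never block.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: readers always get a complete, immutable snapshot; the
// writer builds a fresh copy and publishes it with a single atomic
// reference swap. No locks needed on either side.
public class SnapshotHolder {
    private final AtomicReference<List<Integer>> current =
            new AtomicReference<>(List.of());

    public List<Integer> snapshot() {
        return current.get();            // always a fully built list
    }

    public void publish(List<Integer> fresh) {
        current.set(List.copyOf(fresh)); // defensive immutable copy
    }

    public static void main(String[] args) {
        SnapshotHolder h = new SnapshotHolder();
        h.publish(List.of(1, 2, 3));
        System.out.println(h.snapshot()); // [1, 2, 3]
    }
}
```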
I have previously asked a question regarding cf scopes on cfm pages (happy that I understand CFC scopes and potential issues), but am still not clear on the variables scope.
In the answers to my previous question, it was suggested that there are no thread-safety issues on CFM pages: two users requesting the same page will not race on each other's data, even if I leave my variables in the default variables scope, because each request's variables scope is isolated and independent (here is my last question: Coldfusion Scopes Clarification).
However, I have read this blog post (http://blog.alexkyprianou.com/2010/09/20/variables-scope-in-coldfusion/) about functions on a CFM page using the variables scope, and it seems to suggest a scenario where the variables scope is shared between multiple users. I understand this problem in the context of CFCs (they are more akin to Java classes, with the variables scope acting as instance variables, so there are thread-safety issues when the CFC is shared, e.g. an application-scope singleton), but this seems to contradict the previous answers: if a variable put in the variables scope by a function on a CFM page can be accessed by other users, then surely a variable placed in the variables scope directly in CFM page code would behave the same way?
I was hoping for some clear docs and guides but have not really been able to find definitive explanations of the different scopes and where they are available.
Thanks!
Dan is correct, and the blog article referenced in the question is simply wrong. Dan's code demonstrates it, and I have written this up and tested it thoroughly on my blog (it was too big to go here).
The bottom line is that the variables scope in a CFM is safe from this sort of race condition because each request's variables scope is different memory. One request's variables.foo is not the same as another's variables.foo, so they never intersect.
The same applies to objects in the variables scope: their internal variables scope is a distinct entity, so any number of requests can instantiate a CFC in the request's variables scope, and the CFC instances' variables scopes are all discrete entities too.
The only time the variables scope can participate in a race condition is when it is the variables scope of an object stored in a shared scope, because all references to that shared-scope object reference the same object in memory, and therefore the same variables scope in memory.
Functions outside of a CFC that access the variables scope won't have thread-safety issues when two requests run the code. But if you use cfthread or other parallel features, you can still have problems with the variables scope being changed underneath you, and this can cause race conditions. This mistake often happens with a variable you use a lot, such as the "i" in a for loop.
for (i = 1; i < 10; i++) { t = arr[i]; }
But then another function does this while the first is running:
for (i = 1; i < 20; i++) { t = arr[i]; }
The "i" variable needs to be a local variable to make this thread-safe. You don't want the first loop to run past 10 by mistake, and this is often hard to debug. I had to fix a ton of "i" variables and others to make my functions thread-safe everywhere once I started caching objects and using cfthread more extensively.
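The same bug translated into a Java sketch (SharedCounter is illustrative): when the loop index is a shared field, two threads running the loop concurrently stomp on each other's counter; declaring it local confines it to each call's own stack frame.

```java
// Sketch of the unscoped-loop-variable bug: "i" as a shared field
// versus "i" as a local. Two threads in loopShared-style code perturb
// each other's loop bounds; the local version cannot race.
public class SharedCounter {
    private int i; // shared across all callers -- the bug

    int iterationsShared(int limit) {
        int count = 0;
        for (i = 0; i < limit; i++) { // writes to the shared field
            count++;
        }
        return count; // == limit only while no other thread touches "i"
    }

    int iterationsLocal(int limit) {
        int count = 0;
        for (int i = 0; i < limit; i++) { // local: one per call, per thread
            count++;
        }
        return count; // always == limit
    }
}
```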
You can also avoid needing to lock by never changing existing objects; instead, do the work on copies of them. This makes the data "immutable". CFML doesn't have official support for making immutable objects efficiently, but you can make copies easily.
http://en.wikipedia.org/wiki/Immutable_object
Simple example of thread safe change to an application scope variable:
var temp = structNew();
// build the complete object
temp.myValue = true;
// set the completed object to the application scope variable
application.myObject = temp;
Writing to any shared object is often dangerous, since variables may be undefined or partially constructed. I always construct the complete object and set it to the shared variable at the end, like the example above. This makes thread safety easy if it isn't too expensive to re-create the data. The variables scope in a CFC is similar to private member variables in other languages. If you must modify data in shared objects and can't work on copies instead, you might need to use CFLOCK.
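In Java terms, the same "build completely, publish last" pattern looks like this (AppCache and the key name are illustrative; the volatile field stands in for CF's shared scope so the reference swap is safely published):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of safe publication by complete construction: shared state is
// only ever swapped in as a finished object, so readers may see old
// data but never a half-built object.
public class AppCache {
    private volatile Map<String, Object> config = new HashMap<>();

    public Object get(String key) {
        return config.get(key); // possibly stale, never partial
    }

    public void rebuild() {
        Map<String, Object> temp = new HashMap<>(); // private to this thread
        temp.put("myValue", true);                  // build completely...
        config = temp;  // ...then publish with one reference assignment
    }
}
```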
Some of the confusion about ColdFusion scopes is related to shared scopes in ColdFusion 5 and earlier being less reliable. They had serious thread-safety problems that could cause data corruption or crashes: under certain conditions, two threads could write to the same memory at the same time if you didn't lock correctly. Current CFML engines can write to struct keys without the chance of corruption or crashes. You just can't be sure which data will actually end up as the value without some consideration of thread safety, but it generally won't become corrupted unless you are dealing with non-CFML object types like CFX, Java and others. A thread-safety mistake could still lead to an infinite loop, which could hang the request until it times out, but it shouldn't crash unless it runs out of memory.
I think the blog is misleading. However, if you want to see for yourself, write a page with his function. Make it look something like this.
<cffunction name="test" returntype="void">
    <cfscript>
        foo = now();
        sleep(3 * 60 * 1000); // 3 minutes
        writedump(foo);
    </cfscript>
</cffunction>
<cfdump var="#now()#">
<cfset test()>
Run the page. During the 3 minutes, open another browser or tab and run it again. Go back to where you first ran it and wait for the results. If there is no significant difference between the two outputs, then your second page request did not affect your first one.
Note that I have not tried it myself but my bet would be on the 2nd request not affecting the first one.
In Adobe ColdFusion, if
<cfset Application.obj = CreateObject("component","ComponentName")>
<cfset myResult = Application.obj.FunctionName()>
I'm concerned that a var declared in the function's local scope might have concurrency problems because Application.obj is stored in the Application scope.
<cffunction name="FunctionName">
    <cfset var local = {}>
    <!--- (pretend some long process happens here) --->
    <cfif condition>
        <cfset local.result = True>
    <cfelse>
        <cfset local.result = False>
    </cfif>
    <cfreturn local.result>
</cffunction>
If two people are in that function at the same time, will the result for person 1 corrupt the result for person 2?
To avoid concurrency issues, instantiate the object in the onApplicationStart method of your Application.cfc. That ensures the object gets created only once. Second, as long as the variable "condition" is also in the local scope, the two calls will not interfere with each other.
As long as all of the variables being accessed are locally scoped (var'd in the function they're used in, or passed as an argument to that function), there are no concurrency issues. If you are hitting variables.somevar or this.something (or just somevar that doesn't belong to the local scope), then you might start to run into problems.
We do a whole lot of that sort of work.
Yes, there is the potential for a race condition in your code sample.
You will need to use a lock around
<cfset myResult = Application.obj.FunctionName()>
to prevent the race condition.
The type of lock to use would depend really on what the long process is doing.
If you are instantiating your framework, you might consider double-checked locking. (Joe Rinehart, author of Model-Glue, had a great post on this, but his site isn't responding at the moment.)
If the long process is less critical you could use a simpler lock.
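For reference, here is what double-checked locking looks like in Java (a sketch; the volatile field is what makes the idiom correct on modern JVMs): the synchronized block is only entered on the rare "not created yet" path, so the common case pays no lock cost.

```java
// Double-checked locking: lazy, thread-safe, one-time initialization.
// The volatile field guarantees that a fully constructed instance is
// what other threads observe.
public class Holder {
    private static volatile Holder instance;

    private Holder() { /* expensive construction */ }

    public static Holder getInstance() {
        Holder local = instance;
        if (local == null) {                 // first check, no lock
            synchronized (Holder.class) {
                local = instance;
                if (local == null) {         // second check, under the lock
                    instance = local = new Holder();
                }
            }
        }
        return local;
    }
}
```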
I have a multithreaded C++ application which holds a complex data structure in memory (cached data).
Everything is great while I just read the data. I can have as many threads as I want access the data.
However the cached structure is not static.
If the requested data item is not available it will be read from database and is then inserted into the data tree. This is probably also not problematic and even if I use a mutex while I add the new data item to the tree that will only take few cycles (it's just adding a pointer).
There is a Garbage Collection process that's executed every now and then. It removes all old items from the tree. To do so I need to lock the whole thing down to make sure that no other process is currently accessing any data that's going to be removed from memory. I also have to lock the tree while I read from the cache so that I don't remove items while they are processed (kind of "the same thing the other way around").
"Pseudocode":
function getItem(key)
    lockMutex()
    foundItem = walkTreeToFindItem(key)
    copyItem(foundItem, safeCopy)
    unlockMutex()
    return safeCopy
end function

function garbageCollection()
    while item = nextItemInTree
        if (tooOld) then
            lockMutex()
            deleteItem(item)
            unlockMutex()
        end if
    end while
end function
What's bothering me: This means, that I have to lock the tree while I'm reading (to avoid the garbage collection to start while I read). However - as a side-effect - I also can't have two reading processes at the same time anymore.
Any suggestions?
Is there some kind of "this is a readonly action that only collides with writes" Mutex?
Look into read-write locks.
You didn't specify which framework you can use, but both pthreads and Boost implement that pattern.
The concept is a "shared reader, single writer" lock as others have stated. In Linux environments you should be able to use pthread_rwlock_t without any framework. I would suggest looking into boost::shared_lock as well.
I suggest a reader-writer lock. The idea is you can acquire a lock for "reading" or for "writing", and the lock will allow multiple readers, but only one writer. Very handy.
In C++17 this type of access (multiple readers, single writer) is supported directly with std::shared_mutex. I think this was adopted from boost::shared_mutex. The topic is also described with an example in Anthony Williams's C++ Concurrency in Action.
https://livebook.manning.com/book/c-plus-plus-concurrency-in-action-second-edition/chapter-3/185.
The essential bits are:
use std::shared_lock on reads, where shared access is allowed; existing shared_locks don't prevent a new shared_lock from locking
use std::unique_lock on update/write functions, where exclusive access is needed; it will wait until all existing shared reads complete (as well as other writes) and precludes any other reads or writes while it is held
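Applying this to the earlier cache pseudocode, a Java sketch with ReentrantReadWriteLock (Cache, put and the tooOld predicate are illustrative) lets any number of getItem() calls proceed concurrently, while garbage collection takes the write lock for exclusive access:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Sketch: readers share the lock, so lookups no longer serialize each
// other; the garbage collector takes the write lock, so no reader can
// be holding an item while it is removed.
public class Cache<K, V> {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<K, V> tree = new HashMap<>();

    public V getItem(K key) {
        rw.readLock().lock();      // shared: readers don't block readers
        try {
            return tree.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        rw.writeLock().lock();     // exclusive while inserting
        try {
            tree.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    public void garbageCollect(Predicate<V> tooOld) {
        rw.writeLock().lock();     // exclusive: no reader sees a removal
        try {
            tree.values().removeIf(tooOld);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```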