In Adobe ColdFusion, suppose I have:
<cfset Application.obj = CreateObject("component","ComponentName")>
<cfset myResult = Application.obj.FunctionName()>
I'm concerned that a var declared in the function's local scope might have concurrency problems because Application.obj is stored in the Application scope.
<cffunction name="FunctionName">
<cfset var local = {}>
<!--- pretend some long process happens here --->
<cfif condition>
<cfset local.result = True>
<cfelse>
<cfset local.result = False>
</cfif>
<cfreturn local.result>
</cffunction>
If two people are in that function at the same time, will the result for person 1 corrupt the result for person 2?
To avoid concurrency issues, instantiate the object in the onApplicationStart method of your Application.cfc. That will ensure the object gets created only once. Second, as long as the variable "condition" is also scoped to the local scope, the two calls should not interfere with each other.
As long as all of the variables being accessed are locally scoped (var'd in the function they're being called from, or an argument to that function), there are no concurrency issues. If you are hitting variables.somevar or this.something (or just somevar that doesn't belong to the local scope), then you might start to run into problems.
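To make the distinction concrete, here is a minimal sketch of a component with one method that keeps its state in a var'd variable and one that writes to the variables scope. The method names safeCheck and unsafeCheck are hypothetical and only serve the illustration; the assumption is that a single instance is stored in the Application scope as in the question.

```cfml
<cfcomponent>
    <!--- Hypothetical method: all state is var'd or in arguments --->
    <cffunction name="safeCheck" returntype="boolean" access="public">
        <cfargument name="value" type="numeric" required="true">
        <!--- var-scoped: every call gets its own "result" --->
        <cfset var result = false>
        <cfif arguments.value GT 0>
            <cfset result = true>
        </cfif>
        <cfreturn result>
    </cffunction>

    <!--- Hypothetical method: writes to the instance's variables scope --->
    <cffunction name="unsafeCheck" returntype="boolean" access="public">
        <cfargument name="value" type="numeric" required="true">
        <!--- variables.result belongs to the instance, so concurrent
              callers of a shared instance can overwrite each other --->
        <cfset variables.result = (arguments.value GT 0)>
        <cfreturn variables.result>
    </cffunction>
</cfcomponent>
```

With a shared instance, two simultaneous calls to unsafeCheck can interleave between the assignment and the return, which is the situation described above.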
We do a whole lot of that sort of work.
Yes, there is the potential for a race condition in your code sample.
You will need to use a lock around
<cfset myResult = Application.obj.FunctionName()>
to prevent the race condition.
The type of lock to use would depend really on what the long process is doing.
If you are instantiating your framework, you might consider double-checked locking. (Joe Rinehart, author of Model-Glue, had a great post on this, but his site isn't responding at the moment.)
If the long process is less critical you could use a simpler lock.
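For reference, double-checked locking around the instantiation might look roughly like the sketch below. The lock name initApplicationObj is made up for the example, and this is only one way to arrange it, not the approach from the post mentioned above.

```cfml
<!--- First check without a lock: cheap path once the object exists --->
<cfif NOT structKeyExists(application, "obj")>
    <!--- "initApplicationObj" is a hypothetical lock name --->
    <cflock name="initApplicationObj" type="exclusive" timeout="10">
        <!--- Second check: another request may have created the
              object while this one was waiting for the lock --->
        <cfif NOT structKeyExists(application, "obj")>
            <cfset application.obj = createObject("component", "ComponentName")>
        </cfif>
    </cflock>
</cfif>
<cfset myResult = application.obj.FunctionName()>
```

Only the creation is locked here; the call to FunctionName() runs unlocked, which is fine as long as the function keeps its state in local variables.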
There are many threads in my program, and they simply run queries like "select content from table where id= xxx".
I first planned to provide a db_query(int id) function with a static sqlite3 object and a static sqlite3_stmt object which stands for the query. So all the threads can call this function and get results.
But then I found that the sqlite3_stmt object is not stateless and cannot be shared by many threads. In addition, there does not seem to be a function for copying a sqlite3_stmt object, so I cannot just make a copy of the prepared statement in each function call.
Is there an elegant and easy-to-implement way to solve my problem?
"There are many threads in my program." This suggests that you would do best to rethink your design. Your current design is going to trip you up over and over again. However, using your current design:
You will need to create a new sqlite3_stmt for each query. Change it from static to automatic, so it will be created each time db_query(int id) is called.
You cannot run queries "concurrently". You have to do them one at a time. So you will need to protect your query code with a mutex.
Is it safe to use named locks (e.g. name of the key) to synchronize read/write access to specific keys of scopes & structs? I've been doing it for a while and never ran into concurrency issues, but I'm writing a server-level cache and I want to make sure name-based locking is safe with the server scope. I'm afraid the underlying implementation may not be thread-safe and that concurrent access on different keys could cause issues.
Locks in ColdFusion via cflock are just semaphores. They control which threads can access the code at the same time (concurrently). These locks do not impact Java's intrinsic locks or synchronized methods/statements. So cflock doesn't provide thread-safety per se.
User Ageax showed that CF structs do not use ConcurrentHashMap (see comments), so you have to explicitly use them: createObject("java", "java.util.concurrent.ConcurrentHashMap").init()
Note that ConcurrentHashMap is type and case sensitive (while the regular struct is not).
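A small sketch of what that case sensitivity means in practice; the variable name cache and the keys are invented for the example. Because the type of the key matters too, explicitly casting keys with javaCast() is one way to keep lookups consistent.

```cfml
<!--- "cache" and the key names are hypothetical --->
<cfset cache = createObject("java", "java.util.concurrent.ConcurrentHashMap").init()>
<cfset cache.put(javaCast("string", "UserA"), 1)>

<!--- Exact same string: the key is found --->
<cfset foundExact = cache.containsKey(javaCast("string", "UserA"))>

<!--- Different case: a regular struct would match, this map does not --->
<cfset foundOtherCase = cache.containsKey(javaCast("string", "usera"))>
```

Normalising keys (for example with lCase()) before every put and get is a simple convention that avoids surprises when replacing a struct with this map.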
The good news
Structs in ColdFusion are thread-safe by nature. Here is an example that compares the unsafe Java HashMap and the safe ColdFusion Struct:
First run with:
<cfset s = createObject("java", "java.util.HashMap").init()>
Second run with:
<cfset s = structNew()>
<cfset s.put("A", 1)>
<cfset s.put("B", 2)>
<cfthread name="interrupter">
<cfset s.put("C", 3)>
</cfthread>
<cfoutput>
<cfloop collection="#s#" item="key">
#s[key]#,
<cfset sleep(1000)>
</cfloop>
#structKeyList(s)#
</cfoutput>
The HashMap will throw ConcurrentModificationException, because the map was accessed by the main thread while being modified by the "interrupter" thread.
The Struct however, will not throw an exception. It will simply return 1,2,A,B,C, because the iterator blocks access, i.e. causes the write operation by the "interrupter" thread to be postponed. After the iterator is done (end of loop), it releases the lock and the struct will be modified. This is why structKeyList() will immediately return the freshly written key-value-pair "C": 3 as well.
You can read more about the implementation details of concurrent map access in the official Java docs for java.util.concurrent.ConcurrentHashMap. But keep in mind that ColdFusion probably uses a derived version of ConcurrentHashMap.
I have previously asked a question regarding cf scopes on cfm pages (happy that I understand CFC scopes and potential issues), but am still not clear on the variables scope.
In the answers to my previous question, it was suggested that there are no thread-safety issues with cfm pages: you won't get the scenario where two different users access the same page and hit race conditions or thread-safety problems, even if I just leave my variables in the default cfm variables scope, because the variables scope for each user is isolated and independent (my previous question was "Coldfusion Scopes Clarification").
However, I have read this blog post http://blog.alexkyprianou.com/2010/09/20/variables-scope-in-coldfusion/ regarding the use of functions on a cfm page and using the variables scope and that seems to suggest a scenario whereby the variables scope is shared between multiple users (I understand this problem in the context of CFCs - them being more akin to java classes and the variables scope being instance variables, so has thread safety issues if the CFC is shared/application scope/singleton) but this seems counter to previous answers - if a variable put in the variables scope by a function on a cfm page can be accessed by other users, then surely variables placed in variables scope directly in cfm page code is the same?
I was hoping for some clear docs and guides but have not really been able to find definitive explanations of the different scopes and where they are available.
Thanks!
Dan is correct, and the blog article being referenced in the question is simply wrong. Dan's code demonstrates it, and I have written up and tested this thoroughly on my blog (it was too big to go here).
The bottom line is that the variables scope in a CFM is safe from this sort of race condition because the variables scope for each request is different memory. One request's variables.foo is not the same as another request's variables.foo, so the two never intersect.
The same applies to objects in the variables scope: their internal variables scope is a distinct entity, so any number of requests can instantiate a CFC in the request's variables scope, and the CFC instances' variables scopes are all discrete entities too.
The only time the variables scope can participate in a race condition is when it is the variables scope of an object stored in a shared scope, because all references to that shared-scope object point at the same object in memory, and therefore at the same variables scope.
Functions outside of a CFC that access the variables scope won't have thread-safety issues when two requests run the code, but if you use cfthread or other parallel features, the variables scope can still be changed underneath you, which can cause race conditions. This mistake often occurs with a variable you use a lot, such as the "i" variable in a for loop.
for(i=1;i<10;i++){t=arr[i]; }
But then another function does this while the first is running:
for(i=1;i<20;i++){t=arr[i]; }
The "i" variable needs to become a local variable to make it thread-safe. You don't want the first loop to be able to go above 10 by mistake, and this is often hard to debug. I had to fix a ton of "i" variables and others to make my functions thread-safe everywhere when I started caching objects and using cfthread more extensively.
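A minimal sketch of the fix, assuming the loop lives inside a function; the function name sumFirstTen is hypothetical. Declaring the counter with var keeps it out of the variables scope, so a second function or thread running its own loop cannot move it.

```cfml
<cfscript>
// Hypothetical helper: sums up to the first ten elements of an array
function sumFirstTen(arr) {
    // var-scoped, so each call has its own counter and total
    var i = 0;
    var total = 0;
    var limit = min(arrayLen(arr), 10);
    for (i = 1; i <= limit; i++) {
        total += arr[i];
    }
    return total;
}
</cfscript>
```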
You can also avoid needing to lock by never changing existing objects. You can instead do the work on copies of them. This makes the data "immutable". CFML doesn't have official support for making immutable objects more efficiently, but you can make copies easily.
http://en.wikipedia.org/wiki/Immutable_object
Simple example of thread safe change to an application scope variable:
var temp=structnew();
// build complete object
temp.myValue=true;
// set complete object to application scope variable
application.myObject=temp;
Writing to any shared object is often dangerous since variables may be undefined or partially constructed. I always construct the complete object and assign it to the shared variable at the end, as in the example above. This makes thread safety easy if it isn't too expensive to re-create the data. The variables scope in a CFC is similar to private member variables in other languages. If you modify data in shared objects and can't work on copies instead, you may need to use CFLOCK.
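When copying isn't practical, a named lock pair along the following lines is one option. The lock name appSettingsLock and the application.settings keys are invented for the example, and application.settings is assumed to exist already.

```cfml
<!--- Writer: exclusive lock while several related keys change together --->
<cflock name="appSettingsLock" type="exclusive" timeout="5">
    <cfset application.settings.mode = "maintenance">
    <cfset application.settings.changedAt = now()>
</cflock>

<!--- Reader: readonly lock so both keys come from the same state --->
<cflock name="appSettingsLock" type="readonly" timeout="5">
    <cfset currentMode = application.settings.mode>
    <cfset changedAt = application.settings.changedAt>
</cflock>
```

Readonly locks can be held by many requests at once, so readers only wait while a writer holds the exclusive lock.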
Some of the confusion about ColdFusion scopes comes from shared scopes being less reliable in ColdFusion 5 and earlier. They had serious thread-safety problems that could cause data corruption or crashes: under certain conditions, two threads could write to the same memory at the same time if you didn't lock correctly. Current CFML engines can write to struct keys without the chance of corruption or crashes. Without some consideration of thread safety you still can't be sure which data will actually end up as the value, but it generally won't become corrupted unless you are dealing with non-CFML object types such as CFX, Java and others. A thread-safety mistake could still lead to an infinite loop that hangs the request until it times out, but it shouldn't crash unless it runs out of memory.
I think the blog is misleading. However, if you want to see for yourself, write a page with his function. Make it look something like this.
<cffunction name="test" returntype="void">
<cfscript>
foo = now();
sleep(3 * 60 * 1000); // should be 3 minutes
writedump(foo);
</cfscript>
</cffunction>
<cfdump var="#now()#">
<cfset test()>
Run the page. During the 3 minutes, open another browser or tab and run it again. Go back to where you first ran it and wait for the results. If there is no significant difference between the two outputs, then your second page request did not affect your first one.
Note that I have not tried it myself but my bet would be on the 2nd request not affecting the first one.
Question for the crowd. We are very strict on our team about scoping local variables inside functions in our CFCs. Recently, though, the question of scoping variables inside Application.cfc came up. Are unscoped variables in functions like onRequestStart() at the same risk of being accessed by other sessions running concurrently as local variables in functions in other components are? Or are they somehow treated differently because of the nature of the functions in Application.cfc?
Your question borders on two entirely separate questions (both of which are important to clarify and address). These two questions are:
Should I scope my variables correctly when referring to them (i.e. APPLICATION.settings vs. SESSION.settings)?
The short answer to this is: Yes. It makes for cleaner, more readable / manageable code, and prevents variable scope clashes that you may encounter later when variable names are re-used.
If you create APPLICATION.settings and SESSION.settings, but attempt to refer to them without scope (i.e. <cfset myvar = settings />), you're going to have variable clash issues, as they'll be poured into VARIABLES by default--since neither APPLICATION nor SESSION are examined to resolve scope ambiguity.
The second question is:
Should I be worried about variables that are accessed in Application.cfc that could potentially be shared by multiple users in a concurrent environment?
The short answer to this is: Yes. You should know & understand the ramifications of how your shared variables are accessed, and <CFLOCK> them where appropriate.
Unfortunately, exactly when and where you should lock your shared variables is rarely clarified to the CF community, so let me sum it up:
onApplicationStart() single-threads access to the APPLICATION scope. You do not need to lock APPLICATION vars that are read/written within this method.
onSessionStart() single-threads access to the SESSION scope. Same answer as before.
If you provide any kind of mechanism that accesses SESSION or APPLICATION from within the onRequestStart() method--or any other template afterwards (such as a URL reload parameter that directly calls onApplicationStart() )--all bets are off--you must now properly handle the locking of your shared variable reads and writes.
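The reload case can be sketched as follows. The application name exampleApp, the settings struct, and the URL parameter name reload are assumptions made for the example.

```cfml
<cfcomponent>
    <!--- Hypothetical application name --->
    <cfset this.name = "exampleApp">

    <cffunction name="onApplicationStart" returntype="boolean">
        <!--- No lock needed when the server itself fires this event --->
        <cfset application.settings = {siteName = "Example"}>
        <cfreturn true>
    </cffunction>

    <cffunction name="onRequestStart" returntype="boolean">
        <cfargument name="targetPage" type="string" required="true">
        <cfif structKeyExists(url, "reload")>
            <!--- A manual call is not single-threaded by the server,
                  so the write to the APPLICATION scope is locked here --->
            <cflock scope="application" type="exclusive" timeout="10">
                <cfset onApplicationStart()>
            </cflock>
        </cfif>
        <cfreturn true>
    </cffunction>
</cfcomponent>
```

Code elsewhere that reads several related APPLICATION keys during a reload would need a matching readonly lock to be sure of seeing a consistent state.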
I have only a vague idea of the order in which unscoped variables are resolved, and I can't find it in the CFML Reference or the ColdFusion Developer's Guide. Can anyone help?
Scope Order
The canonical scope order for ColdFusion 9 is:
Local (only inside CFCs and UDFs)
Arguments (only inside CFCs and UDFs)
Thread local (only inside threads)
Query (only inside a query loop)
Thread (only inside threads and templates that call threads)
Variables
CGI
Cffile
URL
Form
Cookie
Client
You can see Adobe's documentation on this in Developing ColdFusion 9 Applications.
However, some scopes are only available in certain contexts, so the order that scopes are searched is different, depending upon the context of the code.
Inside CFML (no threads)
Variables
CGI
Cffile
URL
Form
Cookie
Client
Inside a CFC (no threads)
Local
Arguments
Query (only inside a query loop)
Variables
CGI
Cffile
URL
Form
Cookie
Client
Best Practice
As Al Everett notes in his answer, it is considered best practice to always scope variables. Explicit scoping produces less ambiguous code and is usually faster. Anytime you don't scope a variable, you risk getting a variable from a scope that you didn't intend to.
When the variable you are accessing is in the first scope in the search order, it is actually slightly faster to leave the variable un-scoped. This is because each dot in a variable name incurs a small cost as ColdFusion resolves it. For example, in a CFC method it is slightly faster to access myVar than local.myVar. This only applies to:
local scoped variables inside a CFC or UDF
Thread local scoped variables inside a thread
variables scoped variables inside CFML
In all other circumstances it is faster (and clearer) to explicitly declare the scope.
Leaving variables un-scoped for speed should be considered bad practice. You should only use this technique in performance-critical code, where you can guarantee that the variable always exists in the intended scope. Keep in mind that it comes at the cost of increased ambiguity.
It is a generally accepted best practice to always scope your variables for two main reasons:
Performance - CF doesn't need to find the variable by searching through the scopes in turn
Accuracy - if two variables have the same name in different scopes, you may not get the one you were expecting
That said, here's the order variable scopes are searched:
Function local (VAR keyword)
Thread local (CFTHREAD)
Query results
Function ARGUMENTS
Local VARIABLES
CGI variables
FILE variables
URL parameters
FORM fields
COOKIE values
CLIENT variables
EDIT: It's also telling to note what scopes are not searched: SESSION, SERVER, APPLICATION
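To see the search order at work, consider the short page below, requested as page.cfm?id=5. The page name and the variable id are made up for the example.

```cfml
<!--- Nothing named "id" exists in Variables yet,
      so the unscoped lookup falls through to URL --->
<cfoutput>#id#</cfoutput>

<!--- Once Variables has an "id", it wins, because
      Variables is searched before URL --->
<cfset variables.id = 99>
<cfoutput>#id#</cfoutput>

<!--- An explicit scope removes the ambiguity --->
<cfoutput>#url.id#</cfoutput>
```

An unscoped reference to a name that exists only in SESSION, APPLICATION or SERVER would not be found at all, since those scopes are never searched.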