Long Running Parallel ColdFusion Script Issues


Here is my situation. I have a script that is hard-coded to always run two simultaneous threads (using cfthread). The script takes an Excel or CSV file, splits the data into two sections, and creates web pages for both sections in parallel.
However, it is apparent from the pages created that thread two (the threads are spawned in a loop) completes its section of work, while thread one completes only a small portion of its work, and the script dies partway through. I'm confused about why this is happening. My only suspicion is that the function I call, createPage(...), is itself spawning a thread to do its work, and that this is overloading the server. I can't tell whether this is happening, for two reasons: first, I do not have access to server monitoring at that level, and second, the createPage(...) function is a black box because it is a proprietary API function.
Any advice? Also, if createPage(...) is spawning a thread in its implementation, is there any way to make it finish before it is called again?
Here is pseudocode of how my code works in general:
<cfloop from="1" to="2" index="i">
<cfthread ...> <!--- spawn threads --->
<cfloop from="#pageWorkStart#" to="#pageWorkFinish#" index="page">
<!--- At this point two threads are calling this function, which may itself be spawning threads without waiting for them to finish. The server is potentially choking on queued processes backing up. --->
<cfset createPage(...)>
</cfloop>
</cfthread>
</cfloop>

Related

ColdFusion : terminate CFTHREAD in separate request

I am using CFTHREAD in my ColdFusion application. From what I've read from Ben Nadel (https://www.bennadel.com/blog/2980-terminating-asynchronous-cfthreads-in-coldfusion.htm), ColdFusion only exposes and tracks threads in the current request. In my situation, I am spawning a thread via an ajax call and then providing the user with a cancel button. I was hoping the cancel button could call the terminate method on the thread, but no matter where I store it (application, server, session), ColdFusion always returns an error that it was unable to terminate thread "THREAD_NAME" because "THREAD_NAME" was not spawned.
I know that under the hood, ColdFusion is mostly Java. So I'm hoping that there may be a way. Could anyone either confirm or deny this possibility? Any example of how?
Thanks!

Sorry, I don't have 50 reputation to comment, so I'll post this as an answer. Recently, I was in the same situation with a CFThread spawned via ajax that I needed to terminate somehow but couldn't. I had a CFQuery inside a CFLoop that used a datasource stored in the application scope. What I came up with was to sign into ColdFusion Administrator and temporarily rename the datasource, which caused the thread to throw a database error. While it was an inelegant termination, it served the purpose at the time.
So after seeing this question it got me thinking about a possible workaround if there isn't a known way to accomplish this. Suppose during your thread processing, it tests for the value of a variable in the application/server/session scope. Supposing the value is initially set to "true" and then subsequently set to "false" by another process, when the thread finds the false value, it can terminate gracefully.

Can you?
Yes, but only using internal classes. When the cfthread is created, use the local THREAD_NAME to retrieve a reference to the underlying thread object.
context = getPageContext().getFusionContext();
thread = context.getUserThreadTask( "theLocalTaskName" );
Since the local name can be used by multiple requests, the reference should be stored under a unique name, like a UUID. The reference is actually an instance of an internal class, coldfusion.thread.Task (the package name that appears in the stack trace below). To terminate it, call its cancel() method.
thread.cancel();
Should you?
That's a big question and all depends on what the thread does - how it does it - and how the resources it uses would be affected if the process just stops dead, midstream, with no warning.
The reason is that calling <cfthread action="terminate"..> kills the thread - instantly. CF doesn't care if it's in the middle of a critical section. The server just whacks it with a mallet and stops it cold. The exception logs show that CF does this by invoking Thread.stop()
"Information","cfthread-47","09/07/19","17:10:44","","THREAD_V_2: Terminated"
java.lang.ThreadDeath
at java.base/java.lang.Thread.stop(Thread.java:942)
at coldfusion.thread.Task.cancel(Task.java:257)
at coldfusion.tagext.lang.ThreadTag.terminateThread(ThreadTag.java:345)
at coldfusion.tagext.lang.ThreadTag.doStartTag(ThreadTag.java:204)
The Java documentation says the stop() method is deprecated because it is inherently unsafe:
Stopping a thread causes it to unlock all the monitors that it has
locked. (The monitors are unlocked as the ThreadDeath exception
propagates up the stack.) If any of the objects previously protected
by these monitors were in an inconsistent state, other threads may now
view these objects in an inconsistent state. Such objects are said to
be damaged. When threads operate on damaged objects, arbitrary
behavior can result. This behavior may be subtle and difficult to
detect, or it may be pronounced. Unlike other unchecked exceptions,
ThreadDeath kills threads silently; thus, the user has no warning that
his program may be corrupted. The corruption can manifest itself at
any time after the actual damage occurs, even hours or days in the
future.
So it's important to consider what a thread actually does, and determine if it's even safe to terminate. For example, if a thread processes a file with FileOpen(), forcibly terminating it might prevent the thread from releasing the handle, leaving the underlying file in a locked state, which is undesirable.
The recommended way of stopping threads in Java is with interrupt(). That's essentially the concept user12031119 described. An interrupt doesn't forcibly kill a thread. It's just a flag that suggests the thread stop processing, leaving it up to the thread itself to determine when it's safe to exit. That allows threads to finish critical sections or perform any cleanup tasks before terminating. Yes, it requires a little more coding, but the results are much more stable and predictable than with "terminate".
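As a rough illustration of the interrupt approach in plain Java (not ColdFusion's internal implementation), the sketch below shows a worker that checks the interrupt flag between units of work and always runs its cleanup. The class and member names (InterruptDemo, Worker, runAndInterrupt) are hypothetical and exist only for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class InterruptDemo {
    /** Hypothetical worker: does small units of work and checks for interruption between them. */
    static class Worker implements Runnable {
        final AtomicInteger completed = new AtomicInteger();
        final CountDownLatch firstUnit = new CountDownLatch(1);
        volatile boolean cleanedUp = false;

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    completed.incrementAndGet(); // one unit of work, never left half-done
                    firstUnit.countDown();       // signal that work has started
                    Thread.sleep(5);             // blocking calls throw InterruptedException
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag for any caller
            } finally {
                cleanedUp = true;                // release files, handles, etc. here
            }
        }
    }

    /** Starts a worker, lets it run briefly, then asks it to stop and waits for it. */
    static Worker runAndInterrupt(long runMillis) {
        Worker worker = new Worker();
        Thread thread = new Thread(worker, "worker");
        thread.start();
        try {
            worker.firstUnit.await(); // make sure some work has happened
            Thread.sleep(runMillis);
            thread.interrupt();       // a request to stop, not a kill
            thread.join();            // wait for the worker's own cleanup
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo was interrupted", e);
        }
        return worker;
    }

    public static void main(String[] args) {
        Worker worker = runAndInterrupt(50);
        System.out.println("units=" + worker.completed.get() + ", cleanedUp=" + worker.cleanedUp);
    }
}
```

The point of the design is that the worker, not the caller, decides where it is safe to stop, so the cleanup in the finally block always runs.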

What you will want to do is set up a data structure in a shared scope, such as application or session, that keeps track of the running threads you want to be able to cancel.
In Application.cfc, in onApplicationStart:
<cfset application.cancelThread = {} />
Before entering the thread, create an ID and pass it into the thread:
<cfset threadId = createUUID() />
<cfset application.cancelThread[threadId] = false />
Pass the threadId back to the client for the cancel button. When the cancel button is clicked, send the threadId back and set the flag:
<cfset application.cancelThread[form.threadId] = true />
During thread execution, check the flag:
<cfif application.cancelThread[threadId]>
<cfabort />
<!--- or your chosen approach to ending the processing --->
</cfif>
When the thread reaches its end, remove the thread reference:
<cfset structDelete(application.cancelThread, threadId) />

Should I <CFLOCK> a <CFFILE Action="COPY" ...>?

Looking over some old code that times out occasionally, I came across this CFLOCK around a CFFILE COPY run inside a CFLOOP.
Does this use of CFLOCK make sense or seem necessary? The file is being copied from one location to a newly created folder that is itself then zipped up for a future download.
At first, I was just going to increase the timeout of the lock, but then I started to stare at it and wonder if it was a mistake.
<cfloop query="LOCAL.qDocsZip">
<cflock name="copyFileLock" timeout="3600" type="readonly">
<cffile action="copy"
source="#ExpandPath(LOCAL.qDocsZip.file_location)#"
destination="#LOCAL.zip_new_path#/#LOCAL.qDocsZip.original_file_name#">
</cflock>
</cfloop>

Locking seems reasonable here, but it's in the wrong spot and you are not locking the file access exclusively. You should lock the whole transaction, i.e. lock right before you fetch LOCAL.qDocsZip. This way you make sure that the files to copy are only touched by a single thread and do not run into concurrency issues with another thread. On that note: cflock is a JVM-specific semaphore, so it cannot guarantee transaction safety on a system level, e.g. if other programs access your files in parallel.
Here is what it should look like:
<!--- only one thread at a time can execute the code within this lock (exclusive named lock) --->
<cflock name="copyFileLock" timeout="3600" type="exclusive">
<!--- fetch files to copy in this transaction --->
<cfquery name="LOCAL.qDocsZip" ...>
...
</cfquery>
<!--- copy all the files --->
<cfloop query="LOCAL.qDocsZip">
<cffile action="copy" ...>
</cfloop>
</cflock>
(You should probably add some error handling as well, if that's not just left out in your snippet.)
Explanation
Every thread will stop at cflock and ask the semaphore copyFileLock whether it is currently "running". If not, the thread will continue, fetch the files and copy them. While this whole copying is in progress (the semaphore is "running"), every other thread that encounters the cflock will be queued: it pauses execution and waits for the semaphore. In your case, every queued thread will wait up to 3600 seconds for the semaphore to give the "go". If the timeout expires first, cflock raises an error by default, or skips the locked block when throwOnTimeout is "no". After the copy operation has finished on the first thread, the semaphore will stop "running" and check the queue. If other threads were queued in the meantime, the next thread in the queue will resume execution, rinse and repeat.
The exclusive lock makes sure that a thread never "sees" an incomplete file state (i.e. fetches a file that is about to be copied by another thread).
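To make the queue-and-timeout behaviour concrete, here is a minimal Java analogy of an exclusive named lock. It is only a sketch of the semantics described above, not how cflock is implemented. NamedLockDemo, runExclusive and demo are hypothetical names, and returning false on timeout corresponds to throwOnTimeout="no".

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class NamedLockDemo {
    // One lock object per name, shared by all threads in this JVM.
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    /** Runs body under the named exclusive lock; returns false if the wait timed out. */
    static boolean runExclusive(String name, long timeoutMillis, Runnable body) {
        ReentrantLock lock = LOCKS.computeIfAbsent(name, n -> new ReentrantLock(true)); // fair: waiters queue in order
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // gave up waiting, body was never run
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            body.run();
            return true;
        } finally {
            lock.unlock(); // lets the next queued thread proceed
        }
    }

    /** Returns {result while another thread holds the lock, result after it is released}. */
    static boolean[] demo() {
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> runExclusive("copyFileLock", 1000, () -> {
            holding.countDown();   // tell the main thread the lock is now held
            awaitQuietly(release); // simulate a long copy operation
        }));
        holder.start();
        awaitQuietly(holding);
        boolean whileHeld = runExclusive("copyFileLock", 50, () -> { });
        release.countDown();
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean afterRelease = runExclusive("copyFileLock", 50, () -> { });
        return new boolean[] { whileHeld, afterRelease };
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("acquired while held: " + results[0] + ", after release: " + results[1]);
    }
}
```

Like cflock, this lock only coordinates threads inside one JVM; it does nothing for other programs touching the same files.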

Is cfexecute timeout=0 as good as cfthread action=run if no output needed?

Looking at some legacy code and the programmer in question uses:
<cfthread action="run">
<cfexecute name="c:\myapp.exe" timeout="30">
</cfthread>
Can one safely replace the code above with this?
<cfexecute name="c:\myapp.exe" timeout="0">
Is CF going to spawn up a thread in the code above anyway? And is the thread going to be counted towards "Maximum number of threads available for CFTHREAD"?

If the intent is to have a non-blocking flow of the code, then you can safely replace the earlier code with yours.
In my understanding, CF does not create a thread when it gets timeout="0". It most likely just launches the exe (which creates a new process on the server) and never waits for the process to reply. So nothing is added to the thread limit count.

ColdFusion performance and locking optimisation

I have a link that will allow users to click it and it fetches a zip file of photos. If the zip file doesn't exist, it then starts a thread to create the zip and displays a message back to the user that the photos are currently being processed.
What I'm trying to avoid is a user clicking the link repeatedly and setting off a whole mass of threads that all try to create or update the zip file. The zip file processing is quite resource intensive, so I only want to allow the application to generate one zip at a time. If one is already being compiled, further requests should do nothing except queue.
Currently how I am handling it is with a cflock around the thread:
<cflock name="createAlbumZip" type="exclusive" timeout="30" throwontimeout="no">
<cfthread action="run" albumId="#arguments.albumId#" name="#CreateUUID()#">
....
What I am hoping occurs here (it seems to be working if I test it) is that it will check if there is currently a thread running using a lock called 'createAlbumZip'. If there is, it will queue the request for 30 seconds after which it should timeout without any error. If it wasn't able to create it within the 30 seconds that is fine.
So - this seems to be working, but my question is: Is this the best way to handle a scenario like this? Is locking the right approach? Are there any shortcomings that could arise from this approach that I don't see?

There are a million ways to skin this cat. Locking is a good start, and as per your comment on @Pat Branley's answer, I think your locking outside the thread creation might be a little more efficient for just the reason you propose: otherwise the potential would exist to create dozens of threads whose whole lifespan consists of waiting for a lock to open or time out.
The other thing you need to do is double up on the IF statement:
<cfif not (zip file exists)>
<cflock ...>
<cfif not (zip file exists)>
<cfthread>
...create zip...
</cfthread>
</cfif>
</cflock>
</cfif>
This will prevent the case where thread B is waiting while thread A creates the zip, and then thread A finishes, and thread B proceeds to recreate/overwrite it.
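The check, lock, check-again shape can be sketched in Java as follows. This is an illustration under simplifying assumptions: a boolean flag stands in for the "zip file exists" test, the "zip creation" is just a counter, and it runs synchronously inside the lock rather than in a spawned thread. ZipOnce, ensureZip and simulate are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class ZipOnce {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile boolean zipExists = false;        // stand-in for a file-exists check
    final AtomicInteger builds = new AtomicInteger();  // counts how often the zip was built

    void ensureZip() {
        if (zipExists) {
            return;                   // outer check: skip the lock entirely once the zip is there
        }
        lock.lock();
        try {
            if (zipExists) {
                return;               // inner check: another thread built it while we waited
            }
            builds.incrementAndGet(); // stand-in for the expensive zip creation
            zipExists = true;
        } finally {
            lock.unlock();
        }
    }

    /** Simulates several simultaneous clicks and returns how many times the zip was built. */
    static int simulate(int clicks) {
        ZipOnce album = new ZipOnce();
        Thread[] threads = new Thread[clicks];
        for (int i = 0; i < clicks; i++) {
            threads[i] = new Thread(album::ensureZip);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("simulation was interrupted", e);
            }
        }
        return album.builds.get();
    }

    public static void main(String[] args) {
        System.out.println("builds=" + simulate(8));
    }
}
```

Without the inner check, every thread that queued on the lock would rebuild the zip as soon as it got its turn.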
In addition, you could consider something like using JavaScript to prevent extra clicks by disabling the button/link after it's been clicked.

I think you have that code the wrong way around. What you are saying is "only one thread is allowed to spawn this new thread". That may work in your case because the timeouts are set such that nobody can create another thread, so there is no chance of two threads executing at once.
What you want to say is "only one thread is allowed to make a zip". So I would do this:
<cfthread .... >
<cflock>
...zip....

Number of parallel instances of my process (app)

Is there some portable way to check the number of parallel instances of my app?
I have a C++ app (Win32) where I need to know how many instances of it have been started. The problem is
that several users can start it in parallel (terminal server), so I cannot search the "running process" list because I'm not able to access the list of other users' processes.
I tried it with a semaphore (boost and Win32 CreateSemaphore).
It worked, but now I have the problem that if the app crashes (an assertion, or the process is simply killed), the counter is not changed. (Rebooting helps.)
Manually removing or resetting the semaphore counter in my code is also not possible, because I don't know whether somebody else is running my application.

Edited to add:
Suppose you have a license that lets you run 20 full-functionality copies of your program. Then you could have 20 mutexes, named MyProgMutex1 through MyProgMutex20. At startup, your program can loop through the mutexes. If it finds a spare mutex that it can take, it stops looping and enters full-functionality mode. If it loops through all the mutexes without being able to take any of them, then it enters reduced-functionality mode.
Original answer:
I assume you want to make sure that only one copy of your process runs at once. (Or, for Terminal Server, one copy of your process per login session).
Your named semaphore solution is close. The right way to do this is a named mutex. Use CreateMutex to make the mutex, then call WaitForSingleObject with a timeout of zero. If WaitForSingleObject returns WAIT_TIMEOUT, another copy of the process is running. If it returns WAIT_OBJECT_0 or WAIT_ABANDONED, then you are the only copy of the process. You need to keep the mutex handle open while your program runs: either call CloseHandle when your process is about to exit, or just deliberately leak the handle and rely on Windows' built-in cleanup to close it for you when your process exits. Unlike a semaphore count, mutex ownership is given up automatically when the owning process exits, even after a crash; the mutex is then marked as abandoned, which is why WAIT_ABANDONED counts as success.

The only thing I can think of that mitigates the problem of crashed processes is a kind of “dead man’s switch”: each process needs to update its status at regular intervals. If a process fails to do this, it is automatically discarded from the list of active processes.
This technique requires that one of the processes acts as a server that keeps tabs on whether the other processes have updated recently. If the server dies, another process can take over. This, in turn, requires that each process tests whether a server is still alive.
Alternatively, each process can be its own server and keep track locally. This may be easier to implement than server-switching.
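The bookkeeping behind such a dead man's switch can be sketched as below. This is a language-neutral idea shown in Java as an in-memory illustration only: how the heartbeats travel between processes (shared memory, a file, sockets) is left open, and HeartbeatRegistry, beat and activeCount are hypothetical names. Timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatRegistry {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by each instance at regular intervals to say "I am still alive". */
    void beat(String instanceId, long nowMillis) {
        lastSeen.put(instanceId, nowMillis);
    }

    /** Discards instances whose last heartbeat is too old and returns how many remain. */
    int activeCount(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > timeoutMillis);
        return lastSeen.size();
    }

    public static void main(String[] args) {
        HeartbeatRegistry registry = new HeartbeatRegistry(1000);
        registry.beat("instance-a", 0);
        registry.beat("instance-b", 0);
        registry.beat("instance-a", 900);  // a keeps reporting, b has crashed
        System.out.println("active at t=1500: " + registry.activeCount(1500));
    }
}
```

A crashed instance simply stops calling beat, so it drops out of the count after one timeout period without anyone having to reset a counter.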

You can broadcast a message to which every other instance of your application sends a response. Count the responses and you have the number of running instances.
