We have a few thousand catalogs with pages that are accessed up to half a million times each day. At the end of every page hit, we insert some of the CGI variables into a database. If a form was submitted or a search was performed, we insert some of that information into another database. No information needs to be returned from these inserts, and they happen at the end of page processing.
I've read that once a thread with action="run" is launched, page processing continues and doesn't wait for the thread to finish. It seems like this would let the page complete sooner because it isn't waiting for those queries to run. Is this correct?
Is there any benefit to putting these database inserts into their own thread like this?
<cffunction
    name="OnRequest"
    access="public"
    returntype="void"
    output="true"
    hint="Fires after pre-page processing is complete.">

    <cfargument name="RequestedContent" type="string" required="true" />

    <!--- OUTPUT THE PAGE CONTENT --->
    <cfinclude template="#ARGUMENTS.RequestedContent#" />

    <!--- Run the tracking inserts in a separate thread so the request doesn't wait on them --->
    <cfscript>
        thread action="run" name="Tracking" {
            include "track1.cfm";
            include "track2.cfm";
        }
    </cfscript>

    <cfreturn />
</cffunction>
You are correct: if you never join the threads in the page, then the page will finish sooner. The threads may well finish executing after all content has been sent to the user and the HTTP connection has been closed.
I would say this sounds like a bona fide use of that feature, but I also agree that if the inserts are taking that much time, you may want to look at how you are processing the data.
I would say "no, there is [little] benefit in doing that". You'll save your user a few more ms, but you'd put your ColdFusion server under twice as much load, which in turn might cause a performance hit across the board. The server only has a finite number of threads available to use for all requests, so doubling the number you use for each request is gonna double the risk of using 'em all up.
Also starting a new thread has overhead in itself, so the gain you're giving your users here would not be linear.
If your insert queries are taking long enough that they are impacting your user experience, then what you should be doing is tuning those (at the DB end of things).
Also: unless you have a performance bottleneck on that code already, there's not really much point in prematurely optimising it.
There can be table-locking issues when inserting data into a table on every request, so a thread can potentially smooth out some of that variable insert time for the end user. I have seen this used with success on high-volume sites. However, as Adam mentions, threads are finite, and during a deadlock you could end up tying them all up anyway, starving a process that really needs a free thread.
In this scenario, you might consider queuing up inserts in the application for a minute or two and then doing the bulk insert in a thread. This obviously carries some risk of data loss if the server collapses before flushing the queue, and it requires a bit more work to handle thread safety. However, if you don't need the inserted data immediately, it can work well.
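As a rough sketch of that idea, assuming an APPLICATION.trackingQueue array initialized in onApplicationStart and a hypothetical "tracking" datasource; a named lock guards the shared queue:

<!--- On each request: queue the insert instead of running it --->
<cflock name="trackingQueue" type="exclusive" timeout="5">
    <cfset arrayAppend(APPLICATION.trackingQueue, {page = CGI.SCRIPT_NAME, query = CGI.QUERY_STRING})>
</cflock>

<!--- Every minute or two (e.g. from a scheduled task): flush the queue in a thread --->
<cfthread action="run" name="flushTracking">
    <cflock name="trackingQueue" type="exclusive" timeout="30">
        <cfset batch = APPLICATION.trackingQueue>
        <cfset APPLICATION.trackingQueue = []>
    </cflock>
    <!--- One insert per queued row for simplicity; a true multi-row bulk insert depends on your DB --->
    <cfloop array="#batch#" index="hit">
        <cfquery datasource="tracking">
            INSERT INTO page_hits (page, query_string)
            VALUES (<cfqueryparam value="#hit.page#" cfsqltype="cf_sql_varchar">,
                    <cfqueryparam value="#hit.query#" cfsqltype="cf_sql_varchar">)
        </cfquery>
    </cfloop>
</cfthread>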
We have a system that lets a user upload a file, we loop through that file and then make another file. The user that uploads the file is a logged in user.
Problem is that the files contain sensitive data so we have to delete them. As you can imagine there are a few places that write more info to the file and read the file. Sometimes an error happens on this page (normally something to do with CFFILE).
So my question is: is it fine to place all the code (most of it, anyway) in a giant CFTRY, catch any exception that happens, and then run another CFTRY inside the CFCATCH to delete the two files? (Read the update.) I'm not too worried about performance, as this process is not done a million times a day; maybe 3 times a month.
Is this acceptable practice for making sure files are deleted?
UPDATE: I won't be blindly deleting the files in the CFCATCH. I'll first check whether they exist, then delete them.
It's fine to use try/catch whenever it's warranted. There are no CFML police who will come and drag you away in the middle of the night if you put the try/catch around 101 lines of code instead of the permissible 100 lines of code.
However - as @Tomalak says - your wording kinda suggests that the code could stand some refactoring. You say you can't refactor the code, but adding exception handling is already refactoring, so clearly you can do it. So do it properly. Isolate bits of functionality and put them into separate modules (I don't mean as called by <cfmodule>; I mean the term generically), be they UDFs, methods in one or more CFCs (they're probably disparate, so probably not appropriate for a single CFC), or even just include files. They can be refactored better later on. Development is iterative and cyclical, remember: you do not need it to be perfect every time you make changes. For one thing, the definition of "perfect" changes as requirements change. But you should aim to always improve code when you maintain it. And I don't think simply slapping one try/catch around the whole thing suggests an improvement; more like "this code is out of control".
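For instance, if the real requirement is "these temp files must always be deleted", isolating that into one function makes the intent explicit. A minimal sketch, assuming CF9+ (which added <cffinally>) and hypothetical argument names; exceptions still bubble up to the caller, but the cleanup always runs:

<cffunction name="processUpload" access="public" returntype="void" output="false">
    <cfargument name="uploadPath" type="string" required="true" />
    <cfargument name="outputPath" type="string" required="true" />

    <cftry>
        <!--- read the upload, write the new file, etc. --->
        <cffinally>
            <!--- runs whether or not an exception was thrown --->
            <cfif fileExists(ARGUMENTS.uploadPath)>
                <cffile action="delete" file="#ARGUMENTS.uploadPath#" />
            </cfif>
            <cfif fileExists(ARGUMENTS.outputPath)>
                <cffile action="delete" file="#ARGUMENTS.outputPath#" />
            </cfif>
        </cffinally>
    </cftry>
</cffunction>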
Another thing I can suggest is to make your improvements and perhaps post it to https://codereview.stackexchange.com/, and find out what others think. I dunno how many CFers inhabit that site, so it perhaps might be good to post something on Twitter marked with #ColdFusion when you've done so.
The only thing I would say about a huge try/catch block is that it stops all processing in the try block so if you have stuff that could still be done, stopping the whole train just because there is a quarter on the track may be overkill.
I have a similar process that works with a bunch of files; we put each process in a separate try/catch block so they don't interfere with each other, i.e. a broken first file doesn't screw up the next 3 perfectly good files. The catch block simply adds the error message to a string, then we notify the user of a bad format (or whatever) in the file(s) that were bad, while the good files processed as expected.
<cfset errors = "">

<!--- file one --->
<cftry>
    <!--- process file one --->
    <cfcatch>
        <cfset errors = errors & "file one did not work because #cfcatch.message#; ">
    </cfcatch>
</cftry>

<!--- file 2 --->
<cftry>
    <!--- process file two --->
    <cfcatch>
        <cfset errors = errors & "file two did not work because #cfcatch.message#; ">
    </cfcatch>
</cftry>

<!--- etc. --->
If you're looping over a dynamic set, you can put the try/catch block INSIDE the loop so it doesn't stop the loop and the other files can still process. Of course, this doesn't work if file 2 depends on file 1...
<cfloop index = "i" ...>
<cftry>
some stuff
<cfcatch>
<cfset errors = errors & "file #i# did not work because #cfcatch.message#">
</cfcatch>
</cftry>
</cfloop>
We have a similar situation regarding files and take a different approach.
Step 1 is to limit access to the directories containing the files.
Step 2 is scheduled cleanup. We have a ColdFusion job that runs every day. It inspects various directories and deletes any file more than x days old. The value of x depends on the directory.
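The job itself can be tiny. A minimal sketch, assuming a hypothetical uploads directory and a 7-day cutoff; it would be registered as a ColdFusion scheduled task:

<!--- cleanup.cfm: delete files more than 7 days old --->
<cfset cutoff = dateAdd("d", -7, now())>
<cfdirectory action="list" directory="#expandPath('./uploads')#" name="files" type="file">
<cfloop query="files">
    <cfif dateCompare(files.dateLastModified, cutoff) lt 0>
        <cffile action="delete" file="#files.directory#/#files.name#" />
    </cfif>
</cfloop>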
This approach may or may not suit your situation.
I had to write a new section of code that did almost the same thing as what I asked about. Instead of writing individual lines to the file and then wrapping a big CFTRY around that, I wrote each line to a variable, ending each line with a newline character; in my case (Windows) that was Chr(13) & Chr(10). But you should rather use the following line of code:
<cfset NL = CreateObject("java", "java.lang.System").getProperty("line.separator")>
This sets the variable NL to the current system's newline character.
Then you can have a small CFTRY where you can write the entire variable to a file.
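In other words, something along these lines (the file path, line contents, and log file name are placeholders):

<cfset NL = CreateObject("java", "java.lang.System").getProperty("line.separator")>

<!--- Build the whole file in memory first --->
<cfset fileBody = "">
<cfset fileBody = fileBody & "first line of output" & NL>
<cfset fileBody = fileBody & "second line of output" & NL>

<!--- One small try around the single write --->
<cftry>
    <cffile action="write" file="#expandPath('./output.txt')#" output="#fileBody#" addnewline="no">
    <cfcatch>
        <!--- the write failed as a whole; no partial line-by-line output to clean up --->
        <cflog file="fileWrites" text="File write failed: #cfcatch.message#">
    </cfcatch>
</cftry>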
I am using cfzip to zip folders on my server, anywhere from 2MB to 5GB.
It's timing out on a folder that is 1.25GB and I get the following error:
The request has exceeded the allowable time limit Tag: cfoutput
It errors after 11 minutes, and I have the following tag at the top of the page: <cfsetting requesttimeout="99999">. So technically it should wait 1666.65 minutes before timing out, right?
It's dedicated so I can push it to the max.
Any help with this would be very much appreciated.
Thanks :)
Zipping something that size is probably going to take a loooong time. With a file 5GB in size, I would also expect you to start getting out-of-memory exceptions.
I'd be inclined to step out of the Java process, and use cfexecute to run it at a native level using the command line (should be easy enough with whatever platform you are on).
Dropping that into a cfthread is probably a good idea as well, and then working out some sort of alert system for when it completes.
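Something along these lines - a minimal sketch assuming a *nix server with the zip binary at /usr/bin/zip; the paths, timeout, and mail-based alert are all placeholders:

<!--- Spawn a thread so the request returns immediately --->
<cfthread action="run" name="zipFolder">
    <!--- Shell out to native zip instead of zipping inside the JVM --->
    <cfexecute name="/usr/bin/zip"
        arguments="-r /backups/archive.zip /path/to/folder"
        timeout="7200"
        variable="zipOutput" />
    <!--- Alert someone when it's done; swap in whatever notification you use --->
    <cfmail to="admin@example.com" from="server@example.com" subject="Zip complete">#zipOutput#</cfmail>
</cfthread>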
You could try shoving the process into a thread. Those things rock out forever.
In a section of my web application I get information from http://www.geonames.org/ (web service method) and http://data.un.org/ (XML files stored in our application).
I'm new at this, and my questions are:
When should I cache the information from geonames?
What method should I use for the cache?
Will it be OK if I cache the XML files, or is the performance the same?
I'm using ASP.NET MVC 2 with C#.
Caching is a way to improve performance; consider it only if the current performance is not acceptable, otherwise there is no need to worry.
One way you could cache your data is to set up a database table with a CLOB field, a datetime of when it was stored, and of course fields to identify the object (such as the web service parameters used to obtain it).
You have to decide on a policy for expiring old objects; for instance, you could set up a query that runs daily and deletes all objects older than a week. That is just an example: I can't tell you how long to cache, since it depends on how much data you can keep and on how often it gets updated.
To get to your questions in more detail:
1. When to cache the information from geonames?
I'm not sure if I understand correctly, but normally you'd look up the value in the cache; if it's found, you return it from the cache, and if not, you make the service call and store the result in the cache.
2. What method to use for the cache?
I've explained one way, with SQL tables; you could also use files, but it's more complicated.
3. Will it be OK if I cache the XML files, or is the performance the same?
Whatever you decide to cache, processed or unprocessed (XML) information, it won't change much from a performance point of view, since the biggest delay is fetching the information from the network, not processing it.
I don't know if it's possible, but I just want to ask whether we can use cfhttp (or anything else) to read a selected amount of data instead of putting the whole file in CFHTTP.FileContent.
I am using cfhttp and want to read only the last two lines from some remote XML files (about 20 of them) and the middle two lines from some text files (about 7 of them). Is there any way I could read just that specific data instead of fetching the whole files? It's taking a lot of time right now (about 15-20 seconds), and I just want to reduce the run time of my .cfm page.
Any suggestions?
Hmm, not really any special way to get just parts of the remote files.
Do you have to do it every time? Could you fetch the files in the background, write them locally, and have your actual incoming requests just read those files? Make the reading of the remote files asynchronous to the incoming requests?
If not, and you're using CF8+, you could use CFTHREAD to thread out the various requests to run in parallel: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=Tags_t_04.html
You can use the "join" action in the end to make wait for all the threads to complete.
Edit:
Here's a great tutorial by Ben Nadel on using CFThread to parallelize CFHTTP requests:
http://www.bennadel.com/blog/749-Learning-ColdFusion-8-CFThread-Part-II-Parallel-Threads.htm
There's something else, though:
27-odd sequential HTTP requests should not take 15-20 seconds. It really shouldn't even take 1-2 seconds - so you may have some serious other issue going on here.
HTTP does not have the ability to read a file in that manner. This has nothing to do with ColdFusion.
You can use some smart caching to reduce the time somewhat, at the cost of a longer run the first time, using CFHTTP's method="HEAD", which does not download the file body. The logic goes:
Do you have a local copy of the page?
No: use CFHTTP method="GET" to grab and store it.
Yes: use CFHTTP method="HEAD" to check the timestamp and compare it to the cached version. If the cache is newer, use it; else use CFHTTP method="GET" to grab and parse the file you want.
method="HEAD" will only grab the HTTP headers and not the entire file, which will speed things up ever so slightly. Either way, you are making almost 30 file requests, so this isn't going to be instantaneous however you cut it.
How about asking CF to serve only that chunk of the file via URL params? This assumes the remote files are served by ColdFusion servers you control.
Since it is XML, I guess you can use xmlSearch() and return only the result.
As for the text files, you can pass in startLine & numOfLines parameters and return only those lines as a string.
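Something like this on the remote server - a hypothetical lines.cfm that returns only the requested line range; the file path and parameter names are illustrative:

<!--- lines.cfm: return only the requested lines of a known file --->
<cfparam name="URL.startLine" type="integer">
<cfparam name="URL.numOfLines" type="integer">

<!--- Hypothetical file; in real code, whitelist which files may be served --->
<cfset lines = listToArray(fileRead(expandPath("./data/stats.txt")), chr(10))>

<cfset result = []>
<cfloop index="i" from="#URL.startLine#" to="#min(URL.startLine + URL.numOfLines - 1, arrayLen(lines))#">
    <cfset arrayAppend(result, lines[i])>
</cfloop>

<cfoutput>#arrayToList(result, chr(10))#</cfoutput>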
I've got an app that has about 10 types of objects. There will be potentially a few thousand object instances of each type. These lists of objects need to stay synchronized between apps running on different machines. If an object is added, changed or deleted, that needs to propagate to the other machines.
This will be a star topology -- there is a central master, and the rest are clients.
I DO have the concept of a session, so can store data about each client.
Is there a good design pattern to follow for this? Even better, is there a (template based?) library that would handle asking the container what has changed since client X came by and getting that delta to send out?
Right now I'm thinking every object-type container has an update counter. When something is added/changed/removed, the update counter is incremented and the changed object(s) are tagged with that value. Each client saves the value of the update counter when it gets an update. Later it comes back and asks for any changes since its update-counter value. Finally, deletes are kept as tombstone records (although I'm not exactly sure when to clear them out).
One thing that makes this harder is that clients can come and go without the central server necessarily knowing, although I guess there could be a timeout concept (if the server hasn't heard from a client in 5 minutes, it assumes the client is gone).
Is this a well-known pattern? Any additional suggestions?
How you implement synchronization very much depends on your needs. Do the changes need to be sent to the clients, or is it sufficient that a client checks whether an object is up to date whenever it uses it? How about using the Proxy pattern? This pattern allows you to create a proxy implementation of your objects that can check whether they are up to date, update them if they are not, and then return the result. I would do this by having a lastChanged timestamp on the objects on the master and a lastUpdated timestamp on the client objects. If latency is an issue, checking whether an object is up to date on each call is probably not a good idea. Consider having a separate thread that queries the master for changed objects and marks them "dirty". This could dramatically reduce the network traffic as well.
You could also look into the Observer pattern and Publish/Subscribe.
An option that might be simple to implement and still pretty efficient is to treat the pile of objects as an opaque blob and use librsync to synchronize them. It sounds like all of the updates flow one direction, from master to clients, and there's probably some persistent representation of the objects on the clients -- a file or something. I'm assuming it's a file for the rest of this answer, though any sequence of bytes can be used.
The way it would work is that each client would generate a librsync "signature" of its local copy of the blob and send that signature to the master. The signature is about 1% of the size of the blob. The master would then use librsync to compute a delta between that signature and the current data, and send the delta to the client, which would use librsync to apply the delta to its local copy of the blob.
The librsync API is simple, and the signature/delta data transfer is relatively efficient.
If that's not workable, it may still be useful to take a more manual "delta-based" approach, to avoid having to do per-object versioning. Each time the master makes a change, it should log that change to a journal, recording what was done and to which object. Versioning is done at the whole-database level, so in effect a version number is assigned to each journal entry.
When a client connects, it should send its version of the whole object collection, and the server can then respond with the contents of the journal between the client's version and the newest entry. If updates on a given object are done by completely replacing the object contents, then you can optimize this by filtering out all but the most recent version of each object. If the master also keeps track of which versions it has sent to which client, it can know when it is safe to discard old journal entries. Even if it doesn't track that, you can still discard old journal entries according to some heuristic (probably just age) and if you receive a connection from a client whose last version is older than your oldest journal entry, then you just have to send the entire set of objects to that client.
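The question is language-agnostic, so here is a minimal sketch of that journal idea in cfscript (assuming CF9+ script components; all names are hypothetical) - the same shape ports to any language. Updates are assumed to fully replace the object, which is what makes the "keep only the most recent entry per object" optimization safe:

component {

    variables.version = 0;   // whole-database version counter
    variables.journal = [];  // ordered list of change entries

    // Record a change; "delete" entries act as tombstones
    public void function recordChange(required string objectId, required string action, any data = "") {
        variables.version++;
        arrayAppend(variables.journal, {
            version = variables.version,
            objectId = ARGUMENTS.objectId,
            action = ARGUMENTS.action,   // "add" / "update" / "delete"
            data = ARGUMENTS.data
        });
    }

    // Return every entry newer than the client's version,
    // keeping only the most recent entry per object
    public array function changesSince(required numeric clientVersion) {
        var latest = {};
        var i = 0;
        for (i = 1; i lte arrayLen(variables.journal); i++) {
            var entry = variables.journal[i];
            if (entry.version gt ARGUMENTS.clientVersion) {
                latest[entry.objectId] = entry;  // later entries overwrite earlier ones
            }
        }
        var result = [];
        for (var key in latest) {
            arrayAppend(result, latest[key]);
        }
        return result;
    }
}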