I am building an application that performs a master query with many joins. The query result is then available to the whole application in a global variable to play around with. The query is re-run on each page request, so its data only stays the same for the life of the request.
In other parts of this application, I sometimes run hundreds of QoQs (queries of queries) on this data, usually as the result of recursive function calls. However, while QoQ is a great feature, it's not very fast, and on a bad day page loads can take between 3000 and 5000 ms. It's just not fast enough.
Are there any optimisation techniques I can use to make QoQ perform faster, or perhaps an alternative method? I read an interesting article by Ben Nadel on the Duplicate() function; is there any scope for using that, and if so, how?
I would love to hear your thoughts.
Don't worry about crazy suggestions; this is a personal project, so I'm willing to take risks. I'm running this on Railo, compatible with CF8.
Many thanks,
Michael.
Without seeing the code and the complexity of the QoQs, it is hard to say for sure what the best approach is. However, one thing you can do is use a struct to index the records outside of QoQ. Much of the overhead of QoQ comes from building new query objects, and building a struct index once (only writing keys, with no comparisons) is much more efficient than, for example, looping over the original query and comparing values each time.
For example:
<!--- build up index --->
<cfset structindex = {} />
<cfset fields = "first,last,company" />
<cfloop list="#fields#" index="field">
<!--- initialize each key first (instead of using structKeyExists) --->
<cfloop query="q">
<!--- the key must be built inside the query loop so it uses the current row --->
<cfset key = "field:#field#,value:#q[field][q.currentrow]#" />
<cfset structindex[key] = "" />
</cfloop>
<cfloop query="q">
<!--- update each key with the list of matching row indexes --->
<cfset key = "field:#field#,value:#q[field][q.currentrow]#" />
<cfset structindex[key] = listappend(structindex[key], q.currentrow) />
</cfloop>
</cfloop>
<!--- save structindex to a global variable --->
<!--- output rows matching the index --->
<cfset key = "field:company,value:stackexchange" />
<cfoutput>
<!--- guard against values that never occur in the query --->
<cfif structKeyExists(structindex, key)>
<cfloop list="#structindex[key]#" index="row">
#q.last[row]#, #q.first[row]# (#q.company[row]#)<br />
</cfloop>
</cfif>
</cfoutput>
If this doesn't match your needs, provide some examples of the QoQ statements and tell us how many records are in the main query.
First, I would look at the time taken by the master query. If it can be cached for some amount of time and it accounts for a good chunk of the page load time, I would cache it.
Next, I would look at the recursive calls. If they can be made iterative, that would probably speed things up, though I realize this is not always possible. I would be surprised if this isn't your biggest time sink. Without knowing more about what you are doing, though, it's hard to help you optimize this.
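As a rough illustration of turning recursive QoQs into iteration, here is a minimal sketch. It assumes (hypothetically) that the recursion walks a parent/child tree stored in the master query as `id` and `parentId` columns, with one QoQ per node along the lines of `SELECT * FROM master WHERE parentId = ...`. Instead, it indexes every node's children in one pass over the query and then walks the tree with an array used as a stack. The function names, column names, and sample data are all made up for the example.

```cfml
<!--- One pass: map each parent id to an array of the row numbers of its children --->
<cffunction name="buildChildIndex" returntype="struct" output="false">
    <cfargument name="q" type="query" required="true">
    <cfset var children = structNew()>
    <cfset var row = 0>
    <cfset var parentKey = "">
    <cfloop from="1" to="#arguments.q.recordCount#" index="row">
        <!--- "p" prefix keeps the struct keys plain strings --->
        <cfset parentKey = "p" & arguments.q.parentId[row]>
        <cfif NOT structKeyExists(children, parentKey)>
            <cfset children[parentKey] = arrayNew(1)>
        </cfif>
        <cfset arrayAppend(children[parentKey], row)>
    </cfloop>
    <cfreturn children>
</cffunction>

<!--- Iterative depth-first walk: an array acts as the stack instead of recursive calls --->
<cffunction name="descendantIds" returntype="string" output="false">
    <cfargument name="q" type="query" required="true">
    <cfargument name="children" type="struct" required="true">
    <cfargument name="rootId" type="numeric" required="true">
    <cfset var result = "">
    <cfset var stack = arrayNew(1)>
    <cfset var current = 0>
    <cfset var key = "">
    <cfset var i = 0>
    <cfset var childRow = 0>
    <cfset arrayAppend(stack, arguments.rootId)>
    <cfloop condition="arrayLen(stack) GT 0">
        <!--- pop the most recently pushed id --->
        <cfset current = stack[arrayLen(stack)]>
        <cfset arrayDeleteAt(stack, arrayLen(stack))>
        <cfset key = "p" & current>
        <cfif structKeyExists(arguments.children, key)>
            <cfloop from="1" to="#arrayLen(arguments.children[key])#" index="i">
                <cfset childRow = arguments.children[key][i]>
                <cfset result = listAppend(result, arguments.q.id[childRow])>
                <cfset arrayAppend(stack, arguments.q.id[childRow])>
            </cfloop>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>

<!--- Hypothetical sample tree: 1 has children 2 and 3, 2 has child 4, node 5 is unrelated --->
<cfset tree = queryNew("id,parentId", "integer,integer")>
<cfset queryAddRow(tree, 5)>
<cfset querySetCell(tree, "id", 1, 1)><cfset querySetCell(tree, "parentId", 0, 1)>
<cfset querySetCell(tree, "id", 2, 2)><cfset querySetCell(tree, "parentId", 1, 2)>
<cfset querySetCell(tree, "id", 3, 3)><cfset querySetCell(tree, "parentId", 1, 3)>
<cfset querySetCell(tree, "id", 4, 4)><cfset querySetCell(tree, "parentId", 2, 4)>
<cfset querySetCell(tree, "id", 5, 5)><cfset querySetCell(tree, "parentId", 9, 5)>
<cfset treeResult = descendantIds(tree, buildChildIndex(tree), 1)>
<cfoutput>#treeResult#</cfoutput>
```

Each node's children come from a struct lookup rather than a new query object, so the master query is scanned only once. The sketch assumes the data has no cycles; a cycle would keep the loop running forever.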
I might also consider rewriting some of the recursive QoQs as stored procedures on the DB server, which is designed to handle data quickly and slice and dice it efficiently. CF is not: QoQs are very useful, but they are not speed demons (as you've noted).
Finally, I would look for straightforward filters and not use QoQ for them. Instead, I would just loop over the master query in a standard cfoutput tag and filter on the fly. That way you loop over the master query once, rather than over the master query once and then over the result query as well.
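A minimal sketch of filtering on the fly in a single cfoutput pass, using a made-up three-row query in place of the master query (the column names match the struct-index example earlier in this thread; the data is invented). Rows that fail the cfif test are simply skipped, so no intermediate query object is built.

```cfml
<!--- Hypothetical sample data standing in for the master query --->
<cfset q = queryNew("first,last,company", "varchar,varchar,varchar")>
<cfset queryAddRow(q, 3)>
<cfset querySetCell(q, "first", "Jane", 1)>
<cfset querySetCell(q, "last", "Doe", 1)>
<cfset querySetCell(q, "company", "stackexchange", 1)>
<cfset querySetCell(q, "first", "John", 2)>
<cfset querySetCell(q, "last", "Roe", 2)>
<cfset querySetCell(q, "company", "acme", 2)>
<cfset querySetCell(q, "first", "Ann", 3)>
<cfset querySetCell(q, "last", "Lee", 3)>
<cfset querySetCell(q, "company", "stackexchange", 3)>

<!--- One pass over the query; the cfif does the filtering a QoQ WHERE clause would do --->
<cfsavecontent variable="filtered"><cfoutput query="q"><cfif q.company EQ "stackexchange">#q.last#, #q.first#;</cfif></cfoutput></cfsavecontent>

<cfoutput>#filtered#</cfoutput>
```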
There are two primary solutions here. First, you could do something in CF with the records outside of QoQ; I posted my suggestion for that already. The other is to do everything in the DB. One way I've found to do this is to use a subquery as a temp table. You can even keep the SQL statement in a global variable and reference it in the same places you currently use QoQ, but run a real query against the database instead. It may sound slower than one trip to the DB followed by many QoQs, but in reality it probably isn't if the tables are indexed efficiently.
select *
from (
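<!--- preserveSingleQuotes() stops cfquery from escaping the quotes inside the stored SQL --->
#preserveSingleQuotes(sqlstring)#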
) as tmp
where company = 'stackexchange'
I have actually done this for a system with complex criteria, both for which records a user should have access to and for what they can filter on within those records. This approach means you always know the source of the inner records, instead of trying to ensure that every single query is pulling correctly.
Edit:
It is actually safer (and usually more efficient) to use cfqueryparam whenever possible. I found this can be done by putting the master SQL statement in its own file and including it inside the query:
select *
from (
<cfinclude template="master_subquery.cfm" />
) as tmp
where company = 'stackexchange'
I'm wondering if there is a better way to do what I'm doing. It works, but I feel there should be a better way. If my query returns 20K records, for example, I get "The request has exceeded the allowable time limit Tag: CFQUERY".
<cfquery name="GetMyRecords" datasource="MyDSN">
SELECT idnumber,PrefAddr,...more columns
FROM um_valid
WHERE userid = <cfqueryparam cfsqltype="cf_sql_varchar"
value="#session.userid#">
AND session_id = <cfqueryparam cfsqltype="cf_sql_numeric"
value="#session.Session_Id#">
AND status NOT IN (<cfqueryparam cfsqltype="cf_sql_varchar"
value="X,C" list="yes">)
</cfquery>
I also have an existing stored procedure that expects some values from the query to do what it's supposed to do, so I'm calling it in a loop like this:
<cfloop query="GetMyRecords">
<cfstoredproc procedure="MyProc" datasource="MyDSN">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="##id_number" value="#Trim(idnumber)#">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="##Aaddr_pref_ind" value="#Trim(PrefAddr)#">
<!--- ...still more params to be passed... --->
</cfstoredproc>
</cfloop>
Does ColdFusion have a better technique to avoid either the timeout error or a 500 error?
As another poster mentioned, reducing the number of database calls should be a priority for you. I suggest joining the data (if possible) in your first query rather than looping over your query and querying again for each row.
To fix the timeout itself, you can put cfsetting with requestTimeout in your page to override the default timeout. The value is in seconds.
<cfsetting requestTimeOut = "600">
Your current approach is to make one query that returns n records, then loop over that record set and make another database call for each record. This results in n + 1 database calls per request. As the volume of data returned by your first query increases, so does the total number of calls made to the database. In your case, you're trying to make 20,001 calls to the database in a single request.
Ideally, you need a solution that involves one call to a stored procedure with a properly optimized query that returns all of your data in a single record set. If you don't need to dump all the data onto the page at once, create a paginated query that returns x records per page. The user can then go through the results page by page, or you can provide a search form with additional filters so the user can reduce the overall number of records returned.
I'd need to see the full queries to give you an exact example, but in general this is a really useful feature in CF to help you out!
You need to look into the group attribute of cfoutput. It lets you specify a column to group your output by, which can eliminate the need for the stored procedure you are calling entirely.
It works by letting you create subset outputs based on grouping. For example, you could group by your idnumber column and output each group.
You can also nest several of these, with header and footer sections for each group, either for display purposes or just for logic.
This lets you query the entire dataset once and then manipulate it in the loop, without running a query per row, which is ultra inefficient and cringe-worthy.
This is something quite unique to ColdFusion; check it out!
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7ff6.html
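A minimal sketch of cfoutput grouping, using a made-up in-memory query (qOrders, with idnumber and item columns) in place of your real data. The outer cfoutput runs once per distinct idnumber and the inner cfoutput runs once per row in that group. The rows must already be sorted by the group column (in a real query, ORDER BY idnumber), because grouping only compares consecutive rows.

```cfml
<!--- Hypothetical data, already sorted by idnumber --->
<cfset qOrders = queryNew("idnumber,item", "varchar,varchar")>
<cfset queryAddRow(qOrders, 3)>
<cfset querySetCell(qOrders, "idnumber", "A1", 1)>
<cfset querySetCell(qOrders, "item", "pen", 1)>
<cfset querySetCell(qOrders, "idnumber", "A1", 2)>
<cfset querySetCell(qOrders, "item", "pencil", 2)>
<cfset querySetCell(qOrders, "idnumber", "B2", 3)>
<cfset querySetCell(qOrders, "item", "ink", 3)>

<!--- Outer cfoutput: once per idnumber (group header); inner cfoutput: once per row in the group --->
<cfsavecontent variable="report"><cfoutput query="qOrders" group="idnumber">[#qOrders.idnumber#:<cfoutput>#qOrders.item#;</cfoutput>]</cfoutput></cfsavecontent>

<cfoutput>#report#</cfoutput>
```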
I have a table with a column named str_condition.
Values in this column can be, for example: variables.bit_male / application.bit_male / isdefined('session.int_user_id')
The values can be arbitrarily complex.
I need to use the result of evaluating the expression stored in the column.
Currently, what I am doing is
<cfif evaluate(query.str_condition) eq true>
.....code....
</cfif>
Now I need to do this without evaluate().
TBH, I'd stick with evaluate() for this: you're in one of the few situations where it makes sense. Provided what you have in the DB field is just an expression (no tags), evaluate() will work fine.
As others have suggested, storing expressions in the DB is not ideal, but I can see how it might sometimes be the best approach. Do reconsider it, though, in case you can come up with a different approach entirely (this is situation-specific, so we can't really give you guidance on that).
The only other real option for you would be to write the code from the DB to file then include it, but that would be a worse solution than just using evaluate(), I think.
A lot of people get hung up on the dogma that evaluate() is bad without really stopping to think about why that's the case. It's unnecessary in most situations where people use it, but it's completely fine in situations where it is needed (such as yours).
This is an edited answer, since I originally misread the question.
In many cases, array notation is your friend:
<cfif queryname['fieldname'][rownumber] is true>
<!--- code for true --->
</cfif>
Note that the queryname is not quoted but the fieldname is. If you don't quote the fieldname, ColdFusion will assume it's a variable.
Also pertinent: if you are storing code in a database that you want to select and then execute, you have to select it, write it to another .cfm file, and then cfinclude that file. That's somewhat inefficient.
In your case, you are storing variable names in your database. If using evaluate is giving you the correct results, anything you change would likely be a change for the worse.
How many unique combinations exist in the database? And do new values show up without developer interaction?
If there is a reasonable number of possible values and they don't change, then use a switch statement and write the line of code that handles each possible value.
<cfswitch expression="#query.str_condition#">
<cfcase value="variables.bit_male">
<cfset passed = variables.bit_male>
</cfcase>
<cfcase value="application.bit_male">
<cfset passed = application.bit_male>
</cfcase>
<cfcase value="isdefined('session.int_user_id')">
<cfset passed = isdefined('session.int_user_id')>
</cfcase>
<cfdefaultcase>
<cfset passed = false>
</cfdefaultcase>
</cfswitch>
<cfif passed>
.....code....
</cfif>
You don't have to hand-write all of them; you can use a SQL query to generate the repetitive part of the ColdFusion code (the + string concatenation below is SQL Server syntax):
SELECT DISTINCT '<cfcase value="' + replace(table.str_condition,'"','""') + '"><cfset passed = ' + table.str_condition + '></cfcase>' as cfml
FROM table
ORDER BY len(str_condition), str_condition
If I am reading this correctly, you are not just storing variable names in the database but actual snippets of code, such as isDefined('session.int_user_id').
If this is what you are doing then you need to stop and rethink what you are trying to achieve. Storing code in your database and using evaluate to execute it is an incredibly bad idea.
It sounds to me like you are trying to create a generic code block that you can copy paste in multiple places and just set your conditional logic in the db.
The short answer is not to find a way around using evaluate but to stop storing code in your database full stop.
I develop using ColdFusion and wanted to know the best strategy for looping over a large query result set. Is there any performance difference between using cfloop and cfoutput? If not, is there any reason to prefer one over the other?
I believe there used to be a difference, but I think it has since been addressed. Your best bet is to time each one in your specific use case.
<cfset t = GetTickCount()/>
<!--- run once with cfloop, then swap in cfoutput and compare --->
<cfloop query="qry">
<!--- Do something --->
</cfloop>
<cfset dt = GetTickCount() - t/>
<cfdump var="#dt#"/>
<!---
If the differences are small, you can use java.lang.System.nanoTime() instead,
e.g. createObject("java", "java.lang.System").nanoTime()
--->
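A self-contained variation on the timing test above: it builds a synthetic 10,000-row query with queryNew(), sums one column once with cfloop and once with cfoutput, and times each pass with java.lang.System.nanoTime() through createObject. The row count, column name, and variable names are arbitrary choices for the sketch; treat the timings as a rough comparison on your own server, not a definitive benchmark.

```cfml
<!--- Synthetic test data: one integer column, 10,000 rows --->
<cfset qry = queryNew("id", "integer")>
<cfloop from="1" to="10000" index="i">
    <cfset queryAddRow(qry)>
    <cfset querySetCell(qry, "id", i)>
</cfloop>

<cfset sys = createObject("java", "java.lang.System")>

<!--- Pass 1: cfloop --->
<cfset total = 0>
<cfset t0 = sys.nanoTime()>
<cfloop query="qry"><cfset total = total + qry.id></cfloop>
<cfset loopNs = sys.nanoTime() - t0>

<!--- Pass 2: cfoutput, doing the same work --->
<cfset total2 = 0>
<cfset t0 = sys.nanoTime()>
<cfoutput query="qry"><cfset total2 = total2 + qry.id></cfoutput>
<cfset outputNs = sys.nanoTime() - t0>

<cfoutput>cfloop: #loopNs / 1000000# ms, cfoutput: #outputNs / 1000000# ms</cfoutput>
```

Both passes do identical work, so any gap in the printed times reflects the loop construct (plus noise); running it a few times gives a fairer picture.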
There are some notable differences, though. cfoutput can do grouped loops, which cfloop could not do before CF10.
<cfoutput query="qry" group="col">
<!--- Loops once for each group --->
<cfoutput>
<!--- Loops once for each record within the group --->
</cfoutput>
</cfoutput>
With cfoutput you can specify startrow and maxrows (the count) to paginate your result. With cfloop you have to specify the endrow index instead of the count.
Also, you cannot use cfoutput for a query nested within an existing cfoutput tag; you will need to end the containing cfoutput first.
One good reason to use cfloop instead of cfoutput is if you need to loop over a query inside another query's output loop: cfoutput does not support nested query output, but you can get away with it using cfloop. So this:
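A minimal sketch of the two pagination styles side by side, using a hypothetical ten-row query (qNums) and made-up page/pageSize values. cfoutput takes a start row plus a count (maxrows), while cfloop takes a start row plus an end row, so the end row has to be computed.

```cfml
<!--- Hypothetical data: rows containing 1 to 10 --->
<cfset qNums = queryNew("n", "integer")>
<cfloop from="1" to="10" index="i">
    <cfset queryAddRow(qNums)>
    <cfset querySetCell(qNums, "n", i)>
</cfloop>

<cfset pageSize = 3>
<cfset page = 2>
<cfset startRow = (page - 1) * pageSize + 1>

<!--- cfoutput: start row plus a row count --->
<cfsavecontent variable="viaOutput"><cfoutput query="qNums" startrow="#startRow#" maxrows="#pageSize#">#qNums.n#,</cfoutput></cfsavecontent>

<!--- cfloop: start row plus an explicit end row --->
<cfsavecontent variable="viaLoop"><cfoutput><cfloop query="qNums" startrow="#startRow#" endrow="#startRow + pageSize - 1#">#qNums.n#,</cfloop></cfoutput></cfsavecontent>

<cfoutput>#viaOutput# / #viaLoop#</cfoutput>
```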
<cfoutput query="test1">
#test1ID#
<cfoutput query="test2">
#test2ID#
</cfoutput>
</cfoutput>
does not work, but if you replace the cfoutputs with cfloops (inside a single cfoutput, so the #...# expressions are still rendered), it will.
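A working version of the nested example, as a sketch with two made-up queries (qOuter and qInner are hypothetical stand-ins for test1 and test2). One cfoutput wraps everything so the # expressions are rendered, and both queries are iterated with cfloop. Copying the outer value into a variable before the inner loop is a defensive habit: some older CF versions resolve qOuter.column inside a nested query loop to the first row rather than the current one.

```cfml
<!--- Hypothetical outer and inner queries --->
<cfset qOuter = queryNew("outerId", "varchar")>
<cfset queryAddRow(qOuter, 2)>
<cfset querySetCell(qOuter, "outerId", "x", 1)>
<cfset querySetCell(qOuter, "outerId", "y", 2)>
<cfset qInner = queryNew("innerId", "varchar")>
<cfset queryAddRow(qInner, 2)>
<cfset querySetCell(qInner, "innerId", "a", 1)>
<cfset querySetCell(qInner, "innerId", "b", 2)>

<!--- One cfoutput for rendering; cfloop handles both query iterations --->
<cfsavecontent variable="pairs"><cfoutput><cfloop query="qOuter"><cfset thisOuter = qOuter.outerId><cfloop query="qInner">#thisOuter#-#qInner.innerId#;</cfloop></cfloop></cfoutput></cfsavecontent>

<cfoutput>#pairs#</cfoutput>
```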
As of CF10, with the ability to group cfloops, that's the only remaining functional difference. They both perform the same.
As far as performance goes, I believe they're the same (per Ben Forta).
And the rest is pretty much personal preference as far as how you "like" to work with your loop. Keep in mind you should always scope your variables, but inside a cfoutput loop that would be especially important since the query fields "could" be referenced without referring to their scope.
One reason you may prefer the cfloop approach is if you need to "escape" cfoutput during your loop for any reason. I have run into that several times, so I generally prefer cfloop.
There wouldn't be a performance difference using either method, it depends on your coding style really. If you put a <cfoutput> at the top and bottom of every page then using <cfloop> will work great. If you use multiple <cfoutput> and only place where they are needed that works as well.
I personally put <cfoutput> only where they are necessary, but I wouldn't say that's more correct than placing them at the top and bottom of the page.
I would like to add some code to my Application.cfc onRequestEnd function that, if a certain application variable flag is on, will log query sql and execution time to a database table. That part is relatively easy, since ColdFusion returns the sql and execution time as part of the query struct.
However, this site has probably close to 1000 pages, and modifying all of them just isn't realistic. So I'd like to do this completely programmatically in the onRequestEnd function. In order to do that I need to somehow get a list of all queries that have executed on the page and that's where I'm stumped.
How can I get a list of the names of all queries that have executed on the current page? These queries appear in the template's variables scope, but there are a myriad of other variables in there too and I'm not sure how to easily loop through that and determine which is a query.
Any help would be appreciated.
Since that information is available via the debugging templates, you might take a look at those files for some pointers.
Another thing to consider is encapsulating your queries in a CFC or custom tag and having that deal with the logging (but I suspect that your queries are spread all over the site so that might be a lot of pages to modify - although that speaks to why encapsulating data access is a good idea: it's easier to maintain and enhance for exactly this sort of situation).
The relevant code from the debug templates (modernized a bit), is:
<!--- Uses undocumented Adobe ColdFusion internals (the same ones the debug templates use);
      assumes debugging is enabled in the ColdFusion Administrator --->
<cfset tempFactory = createObject("java", "coldfusion.server.ServiceFactory") />
<cfset tempCfdebugger = tempFactory.getDebuggingService() />
<cfset qEvents = tempCfdebugger.getDebugger().getData() />
<cfquery dbType="query" name="qdeb">
SELECT *, (endTime - startTime) AS executionTime
FROM qEvents WHERE type = 'SqlQuery'
</cfquery>
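For the narrower question of telling which variables are queries: a scope is just a struct, so you can walk it with cfloop collection and test each value with isQuery(). This is a sketch; findQueryNames is a hypothetical helper, and whether the page's variables scope is reachable from onRequestEnd depends on how the request is set up (for example, an onRequest method that cfincludes the target page runs that page in the Application.cfc's own variables scope, which is an assumption to verify on your server).

```cfml
<!--- Hypothetical helper: returns a comma-delimited list of the keys in a struct/scope that hold queries --->
<cffunction name="findQueryNames" returntype="string" output="false">
    <cfargument name="scope" type="struct" required="true">
    <cfset var names = "">
    <cfset var key = "">
    <cfloop collection="#arguments.scope#" item="key">
        <!--- structKeyExists skips keys whose value is null --->
        <cfif structKeyExists(arguments.scope, key) AND isQuery(arguments.scope[key])>
            <cfset names = listAppend(names, key)>
        </cfif>
    </cfloop>
    <cfreturn names>
</cffunction>

<!--- Usage: list every query currently in the variables scope --->
<cfset queryNames = findQueryNames(variables)>
```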
I would like to use a query of queries to UNION an unknown number of recordsets. However, in a query of queries, dots and brackets are not allowed in recordset names.
For example this fails:
<cfquery name="allRecs" dbtype="query">
SELECT * FROM recordset[1]
UNION
SELECT * FROM recordset[2]
</cfquery>
Using dynamic variable names such as "recordset1" works, but this is in a function and needs to be var-scoped, so I can't build up the variable names dynamically without producing memory leaks in a persisted object.
Any other ideas?
After posting the question, I came up with a couple of solutions, but there might be a better one out there:
I could write dynamically named variables to the arguments scope and then reference them without their scope in the query.
Create a function that accepts 2 recordsets as arguments and returns one combined recordset. This could be looped over to progressively add a recordset at a time. I'm sure this is very inefficient compared to doing all UNIONs in one query though.
Difficult task. I could imagine a solution with a nested loop based on GetColumnNames(), using QueryAddRow() and QuerySetCell(). It won't be the most efficient one, but it is not really slow. It depends on the size of the task, of course.
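A sketch of the QueryAddRow()/QuerySetCell() approach described above. combineAll and appendRows are hypothetical names, the inputs are assumed to share the same columns, and the column names are read from each query's columnList property rather than GetColumnNames(). Taking an array of queries fits the question's recordset[1], recordset[2] layout, and every local variable is var-scoped, so nothing leaks into a persisted object.

```cfml
<!--- Append every row of source onto target (queries are passed by reference) --->
<cffunction name="appendRows" returntype="query" output="false">
    <cfargument name="target" type="query" required="true">
    <cfargument name="source" type="query" required="true">
    <cfset var row = 0>
    <cfset var col = "">
    <cfloop from="1" to="#arguments.source.recordCount#" index="row">
        <cfset queryAddRow(arguments.target)>
        <cfloop list="#arguments.source.columnList#" index="col">
            <!--- querySetCell without a row number writes to the row just added --->
            <cfset querySetCell(arguments.target, col, arguments.source[col][row])>
        </cfloop>
    </cfloop>
    <cfreturn arguments.target>
</cffunction>

<!--- Combine an array of same-shaped queries into one new query --->
<cffunction name="combineAll" returntype="query" output="false">
    <cfargument name="queries" type="array" required="true">
    <cfset var result = queryNew(arguments.queries[1].columnList)>
    <cfset var i = 0>
    <cfloop from="1" to="#arrayLen(arguments.queries)#" index="i">
        <cfset appendRows(result, arguments.queries[i])>
    </cfloop>
    <cfreturn result>
</cffunction>

<!--- Usage with made-up one-row queries --->
<cfset a = queryNew("name")><cfset queryAddRow(a)><cfset querySetCell(a, "name", "fred")>
<cfset b = queryNew("name")><cfset queryAddRow(b)><cfset querySetCell(b, "name", "sammy")>
<cfset combined = combineAll([a, b, a])>
```

Unlike a QoQ UNION, which removes duplicate rows, this keeps every row, so it behaves more like UNION ALL.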
Your "create a function that combines two recordsets" could be made much more efficient when you create it to accept, say, ten arguments. Modify the SQL on the fly:
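<!--- assumes the function declares optional cfarguments named argument1 ... argument10 --->
<cfset var local = StructNew()>
<cfquery name="local.union" dbtype="query">
SELECT * FROM argument1
<cfloop from="2" to="#ArrayLen(arguments)#" index="local.i">
<!--- skip arguments that were not passed or are not queries --->
<cfif StructKeyExists(arguments, "argument#local.i#") AND IsQuery(arguments["argument#local.i#"])>
UNION
SELECT * FROM argument#local.i#
</cfif>
</cfloop>
</cfquery>
<cfreturn local.union>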
After a quick bit of poking around, I found this:
queryConcat at CFLib.org. It uses queryaddrow/querysetcell to concatenate two queries.
I added a quick function (with no error checking, or data validation, so I wouldn't use it as-is):
<cffunction name="concatenate">
<cfset var result = arguments[1]>
<!--- var-scope the loop index too, so it doesn't leak into a persisted object --->
<cfset var i = 0>
<cfloop from="2" to="#arraylen(arguments)#" index="i">
<cfset result=queryconcat(result, arguments[i])>
</cfloop>
<cfreturn result>
</cffunction>
As a quick test, running it on a few small queries does, in fact, give you fred/sammy/fred.
It's probably not the most efficient implementation, but you can always alter the insert/union code to make it faster if you wanted. Mostly, I was aiming to write as little code as possible by myself. :-)
All of the solutions added here should work for you, but I would also mention that, depending on how much data you are working with and the database you are using, you might be better off finding a way to do this on the database side. With very large record sets, it might be beneficial to write the records to a temporary table and select them out again; either way, if you can rewrite the queries so the database handles this in the first place, you will be better off.