Is it possible to cache a ColdFusion Query of Query?

I've got a query that returns all translations for a site. It does this by getting all translations that are in the user's desired language, then the remaining ones that are in the site's default language, then any other strings that have not been translated. I'm using cachedwithin on that query since the data doesn't change often, and I'm resetting that query's cache if translations are modified. I'm then using ColdFusion's Query of Query to get the individual record that I'm after. This has increased performance considerably.
I was wondering if it's possible to cache the Query of Query as well to further increase performance. It appears to work, as page load is 1/6 faster; however, are there any gotchas with this technique?
The Query of Query is below.
<cfquery name="qryTranslation" dbtype="query">
SELECT
TranslationString
FROM
qryGetText
WHERE
TranslationHash = <cfqueryparam value="#StringHash#" cfsqltype="cf_sql_varchar">
AND DesiredLanguageID = <cfqueryparam value="#Arguments.LanguageID#" cfsqltype="cf_sql_bigint">
</cfquery>

Is it possible to cache a ColdFusion Query of Query
Yes, it is possible.
however are there any gotchas with this technique?
Your queries will be cached based on their signature (the SQL statement plus its parameter values), so in your case on StringHash and Arguments.LanguageID. If you have a cached QofQ for every translation on a page, on many pages of your site, then you could potentially max out the "Maximum number of cached queries" setting. If this happens, other, potentially larger and more important, cached queries in the query cache could be evicted.
A suitable "Maximum number of cached queries" value can be determined by load testing and using the built-in Server Monitor to watch the number of queries in the cache.
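To make the eviction risk concrete, here is a toy model in Python of a query cache capped at a maximum number of entries and keyed on the SQL text plus parameter values. It assumes least-recently-used eviction purely for illustration; ColdFusion's actual eviction policy and internals may differ, and QueryCache is a hypothetical class.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical toy cache, not ColdFusion's real implementation."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_run(self, sql, params, run):
        # The cache key is the query "signature": SQL text plus parameter values.
        key = (sql, tuple(params))
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = run()
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            # Assumed LRU policy: drop the least recently used entry.
            self._entries.popitem(last=False)
        return result

    def contains(self, sql, params):
        return (sql, tuple(params)) in self._entries


BIG_SQL = "SELECT * FROM translations"
QOQ_SQL = "SELECT TranslationString FROM qryGetText WHERE TranslationHash = ?"

cache = QueryCache(max_entries=3)
cache.get_or_run(BIG_SQL, [], lambda: "all translations")
# Each distinct hash produces a new cache entry.
for string_hash in ["h1", "h2", "h3"]:
    cache.get_or_run(QOQ_SQL, [string_hash], lambda: "one translation")

print(cache.contains(BIG_SQL, []))  # False: the big query was evicted
```

Three small lookups were enough to push out the expensive query in this model; with one cached QoQ per translation string, a real cache limit can be reached the same way.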

There is one big gotcha with caching a query of query.
The documentation for caching a query states that:
To use cached data, the current query must use the same SQL statement, data source, query name, user name, and password.
However, a Query of Query does not have a data source, user name, or password, so you lose a lot of protection against cache collisions. As written, the query in your question will conflict with any other query on your server that has the same name and identically formatted SQL. So if you have more than one website that uses this code, the first website that is loaded will dictate the translations used on the rest of the websites.
A quick way around this is to trick the query into being more constrained.
<cfquery name="qryTranslation" dbtype="query">
SELECT
TranslationString
FROM
qryGetText
WHERE
TranslationHash = <cfqueryparam value="#StringHash#" cfsqltype="cf_sql_varchar">
AND DesiredLanguageID = <cfqueryparam value="#Arguments.LanguageID#" cfsqltype="cf_sql_bigint">
AND '#Variables.DSN#' = '#Variables.DSN#'
</cfquery>
Change Variables.DSN to be the value of the datasource attribute in the main query. If you don't trust that variable, then also make it a cfqueryparam on both sides of the operator.
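The collision, and why the DSN trick fixes it, can be modelled with a small Python sketch. It assumes, as the answer describes, that a QoQ's cache key is effectively just its SQL text plus parameter values; qoq_sql and cached_qoq are hypothetical helpers, and the DSN names are made up.

```python
def qoq_sql(dsn=None):
    sql = ("SELECT TranslationString FROM qryGetText "
           "WHERE TranslationHash = ? AND DesiredLanguageID = ?")
    if dsn is not None:
        # The trick from the answer: bake the datasource into the SQL text itself.
        sql += f" AND '{dsn}' = '{dsn}'"
    return sql


def cached_qoq(cache, sql, params, compute):
    # No datasource, user name or password: only SQL and params form the key.
    key = (sql, tuple(params))
    if key not in cache:
        cache[key] = compute()
    return cache[key]


site_a = {"h1": "Hello"}
site_b = {"h1": "Howdy"}

shared = {}
a = cached_qoq(shared, qoq_sql(), ["h1", 1], lambda: site_a["h1"])
b = cached_qoq(shared, qoq_sql(), ["h1", 1], lambda: site_b["h1"])
print(a, b)  # Hello Hello  (site B is served site A's translation)

shared = {}
a = cached_qoq(shared, qoq_sql("siteA_dsn"), ["h1", 1], lambda: site_a["h1"])
b = cached_qoq(shared, qoq_sql("siteB_dsn"), ["h1", 1], lambda: site_b["h1"])
print(a, b)  # Hello Howdy
```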

Related

Working with ColdFusion - Looping a procedure

I'm wondering if there is a better way to do what I'm doing. It works, but I feel there should be a better way. If my query returns 20K records, for example, I get "The request has exceeded the allowable time limit Tag: CFQUERY".
<cfquery name="GetMyRecords" datasource="MyDSN">
SELECT idnumber,PrefAddr,...more colums
FROM um_valid
WHERE userid = <cfqueryparam cfsqltype="cf_sql_varchar"
value="#session.userid#">
AND session_id = <cfqueryparam cfsqltype="cf_sql_numeric"
value="#session.Session_Id#">
AND status NOT IN (<cfqueryparam cfsqltype="cf_sql_varchar"
value="X,C" list="yes">)
</cfquery>
I also have an existing stored procedure that expects some values from the query to do what it's supposed to do. So I'm looping over the query like this:
<cfloop query="GetMyRecords">
<cfstoredproc procedure="MyProc" datasource="MyDSN">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="#id_number" value="#Trim(idnumber)#">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="#Aaddr_pref_ind" value="#Trim(PrefAddr)#">
<!--- still more params to be passed --->
</cfstoredproc>
</cfloop>
Does ColdFusion have a better technique to avoid either the timeout error or a 500 error?
As another poster mentioned, reducing the number of database calls should be a priority for you. I suggest joining the data (if possible) in your first query rather than looping over your query and querying again.
To fix the timeout issue, you can add a cfsetting tag with the requestTimeout attribute to your page to override the default timeout. The value is in seconds.
<cfsetting requestTimeOut = "600">
Your current approach is to make 1 query, which contains n records. Then loop over that record set, calling a query for each record. This results in your calling n + 1 queries per request. As the volume of data returned by your first query increases, so too does the volume of overall queries made to the database. In your case, you're trying to make 20,001 calls to the database in a single request.
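The n + 1 pattern is easy to see by counting round trips. The Python sketch below uses an in-memory SQLite database; the addresses table and its columns are hypothetical stand-ins for whatever your stored procedure looks up, and the call counter is manual bookkeeping for illustration.

```python
import sqlite3


def make_db(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE um_valid (idnumber TEXT PRIMARY KEY)")
    # Hypothetical detail table standing in for what the stored procedure touches.
    conn.execute("CREATE TABLE addresses (idnumber TEXT, city TEXT)")
    conn.executemany("INSERT INTO um_valid VALUES (?)", [(f"id{i:05d}",) for i in range(n)])
    conn.executemany(
        "INSERT INTO addresses VALUES (?, ?)",
        [(f"id{i:05d}", f"city{i}") for i in range(n)],
    )
    return conn


def n_plus_one(conn):
    """1 query for the list, then 1 more query per row."""
    calls = 1
    ids = conn.execute("SELECT idnumber FROM um_valid ORDER BY idnumber").fetchall()
    results = []
    for (idnumber,) in ids:
        calls += 1
        (city,) = conn.execute(
            "SELECT city FROM addresses WHERE idnumber = ?", (idnumber,)
        ).fetchone()
        results.append((idnumber, city))
    return results, calls


def single_join(conn):
    """The same data in one round trip."""
    rows = conn.execute(
        "SELECT u.idnumber, a.city FROM um_valid u "
        "JOIN addresses a ON a.idnumber = u.idnumber ORDER BY u.idnumber"
    ).fetchall()
    return rows, 1


conn = make_db(20)
print(n_plus_one(conn)[1], single_join(conn)[1])  # 21 1
```

Both functions return identical rows; only the number of database calls differs, and for the loop it grows linearly with the row count.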
Ideally, you need a solution that involves one call to a stored procedure with a properly optimized query that can return all of your data in a single record set. If you don't need to dump all the data onto the page at the same time, then create a paginated query that returns x records per page. The user can then go through the results page by page, or you can provide a search form with additional filters so the user can reduce the overall number of records returned.
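A paginated query can be sketched with LIMIT/OFFSET, shown here in Python against an in-memory SQLite table (the table contents and page size are illustrative). The exact paging syntax varies by database; SQL Server, for example, uses OFFSET ... FETCH NEXT ... ROWS ONLY.

```python
import sqlite3


def make_db(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE um_valid (idnumber TEXT PRIMARY KEY)")
    # Zero-padded ids so text ordering matches numeric ordering.
    conn.executemany("INSERT INTO um_valid VALUES (?)", [(f"id{i:05d}",) for i in range(n)])
    return conn


def fetch_page(conn, page, page_size):
    """Return the rows for a 1-based page number."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT idnumber FROM um_valid ORDER BY idnumber LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


conn = make_db(25)
print(len(fetch_page(conn, 1, 10)), len(fetch_page(conn, 3, 10)))  # 10 5
```

A stable ORDER BY is what keeps pages from overlapping. For very deep pages, filtering on the last key seen (WHERE idnumber > ?) avoids making the database skip over all the earlier rows.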
I'd need to see the full queries to give you a concrete example, but in general there is a really useful CF feature that can help you here.
You need to look into the group attribute of cfoutput. It lets you specify a column to group your output by. This could eliminate the need for the stored proc you are calling entirely.
It works by letting you create nested sub-set output for each group. So, for example, you could group by your ids and output each group. Note that the query must be ordered by the grouped column, because consecutive rows with the same value form a group.
You can also nest several of these and have header and footer sections for each one, for display purposes or just for logic manipulation.
This lets you query the entire dataset once and then manipulate it in the loop without running subqueries, which is ultra inefficient and cringe-worthy.
This is something fairly unique to ColdFusion; check it out!
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7ff6.html
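Python's itertools.groupby behaves much like cfoutput's group attribute, which makes it a handy way to illustrate the idea (this is an analogy, not CF code): it groups consecutive rows with the same key, which is why the input must already be sorted by the grouped column. The row data and column names are illustrative.

```python
from itertools import groupby
from operator import itemgetter


def grouped_output(rows, column):
    lines = []
    # groupby only merges *consecutive* rows with the same key.
    for value, group in groupby(rows, key=itemgetter(column)):
        lines.append(f"[{value}]")  # group "header"
        for row in group:
            lines.append(f"  {row['pref_addr']}")  # detail rows
    return lines


rows = [
    {"idnumber": "A1", "pref_addr": "home"},
    {"idnumber": "A1", "pref_addr": "work"},
    {"idnumber": "B2", "pref_addr": "home"},
]
print("\n".join(grouped_output(rows, "idnumber")))

# Unsorted input splits A1 into two groups, just as an unordered query would.
unsorted_rows = [rows[0], rows[2], rows[1]]
print(sum(line.startswith("[") for line in grouped_output(unsorted_rows, "idnumber")))  # 3
```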

Django model count() with caching

I have a Django application with Prometheus monitoring and a model called Sample.
I want to monitor the Sample.objects.count() metric and cache this value for a fixed time interval to avoid costly COUNT(*) queries in the database.
From this tutorial https://github.com/prometheus/client_python#custom-collectors I read that I need to write a custom collector.
What is the best approach to achieve this?
Is there any way in Django to get a cached value of Sample.objects.count() and update it after K seconds?
I also use Redis in my application. Should I store this value there?
Should I make a separate thread to update the cached Sample.objects.count() value?
The first thing to note is that you don't really need to cache the result of a COUNT(*) query.
Though different RDBMSs handle count operations differently, COUNT(*) is slow across the board for large tables. But one thing they have in common is that the RDBMS provides an alternative to SELECT COUNT(*) which is in fact a cached result. Well, sort of.
You haven't mentioned which RDBMS you use, so let's see how it works in the popular ones used with Django.
MySQL
Provided you have a primary key on your table and you are using MyISAM, SELECT COUNT(*) is really fast on MySQL and scales well. But chances are that you are using InnoDB, and that's the right storage engine for various reasons. InnoDB is transaction-aware and can't handle COUNT(*) as well as MyISAM, so the query slows down as the table grows.
The COUNT(*) query on a table with 2M records took 0.2317 seconds. The following query took 0.0015 seconds:
SELECT table_rows FROM information_schema.tables
WHERE table_name='for_count';
It reported 1997289 instead of 2 million (for InnoDB, table_rows is an estimate), but that's close enough!
So you don't need your own caching system.
SQLite
SQLite COUNT(*) queries aren't really slow, but they don't scale either: as the table grows, the count query slows down. Using a table similar to the one used for MySQL, SELECT COUNT(*) FROM for_count took 0.042 seconds to complete.
There isn't a shortcut. The sqlite_master table does not provide row counts, and neither does PRAGMA table_info.
You need your own system to cache the result of SELECT COUNT(*).
PostgreSQL
Despite being the most feature-rich open source RDBMS, PostgreSQL isn't good at handling COUNT(*): it's slow and doesn't scale very well. In other words, it's no different from its poor relations!
The count query took 0.194 seconds on PostgreSQL. On the other hand, the following query, which reads the planner's row estimate, took 0.003 seconds:
SELECT reltuples FROM pg_class WHERE relname = 'for_count'
You don't need your own caching system.
SQL Server
The COUNT query on SQL Server took 0.160 seconds on average, but it fluctuated rather wildly. For all the databases discussed here, the first COUNT(*) query was rather slow, but subsequent queries were faster because the file was cached by the operating system.
I am not an expert on SQL Server, so before answering this question I didn't know how to look up the row count using schema info. I found this Q&A helpful. One of the queries I tried from it produced the result in 0.004 seconds:
SELECT t.name, s.row_count from sys.tables t
JOIN sys.dm_db_partition_stats s
ON t.object_id = s.object_id
AND t.type_desc = 'USER_TABLE'
AND t.name ='for_count'
AND s.index_id = 1
You don't need your own caching system.
Integrate into Django
As can be seen, all the databases considered except SQLite provide a built-in 'cached' row count, so there isn't a need to create one of our own. It's a simple matter of creating a custom manager to make use of this functionality.
from django.db import connection, models


class CustomManager(models.Manager):
    def quick_count(self):
        # Read MySQL's stored row count (approximate for InnoDB)
        # instead of running SELECT COUNT(*) over the whole table.
        with connection.cursor() as cursor:
            cursor.execute(
                """SELECT table_rows FROM information_schema.tables
                   WHERE table_schema = DATABASE() AND table_name = %s""",
                [self.model._meta.db_table],
            )
            row = cursor.fetchone()
        return row[0]


class Sample(models.Model):
    # ... model fields ...
    objects = CustomManager()
The above example is for MySQL, but the same approach works for PostgreSQL or SQL Server by changing the query to one of those listed above.
Prometheus
How to plug this into django prometheus? I leave that as an exercise.
A custom collector that returns the previous value if it's not too old and fetches otherwise would be the way to go. I'd keep it all in-process.
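The "return the previous value if it's not too old" idea can be sketched in Python as a small time-based cache. It is stdlib-only so it runs on its own; in prometheus_client you would call get() from your custom collector's collect() method and yield a GaugeMetricFamily with the result. CachedCount, the sample table and the injectable clock are illustrative assumptions.

```python
import sqlite3
import threading
import time


class CachedCount:
    """Return the last fetched value while it is younger than ttl seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # scrapes may arrive on different threads
        self._value = None
        self._fetched_at = None

    def get(self):
        with self._lock:
            now = self._clock()
            if self._fetched_at is None or now - self._fetched_at >= self._ttl:
                self._value = self._fetch()  # the expensive COUNT(*)
                self._fetched_at = now
            return self._value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?)", [(i,) for i in range(3)])

fake_now = [0.0]  # controllable clock for the demo
count = CachedCount(
    lambda: conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0],
    ttl=30,
    clock=lambda: fake_now[0],
)
print(count.get())  # 3
conn.execute("INSERT INTO sample VALUES (99)")
print(count.get())  # 3, still cached
fake_now[0] = 31.0
print(count.get())  # 4, refreshed after the TTL
```

Because the refresh happens lazily on scrape, no separate thread or external store such as Redis is needed for a single process.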
If you're using MySQL you might want to look at the collectors the mysqld_exporter offers as there's some for table size that should be cheaper.

Speed up QoQ's or an alternative approach?

I am building an application that performs a master query with many joins. This query data is then available to the whole application to play around with in a global variable. The query refreshes, or gets the latest result set, on each page refresh, so it's only in the same state for the life of the request.
In other parts of this application, I sometimes run hundreds of QoQs on this data, usually as the result of recursive function calls. However, while QoQ is a great feature, it's not very fast, and page loads can take between 3000 and 5000 ms on a bad day. It's just not fast enough.
Are there any optimisation techniques I can use to make QoQ perform faster, or perhaps an alternative method? I read an interesting article by Ben Nadel on the Duplicate() function; is there any scope for using that, and if so, how?
I would love to hear your thoughts.
Don't worry about crazy suggestions, this is a personal project so I'm willing to take risks. I'm running this on Railo compatible with CF8.
Many thanks,
Michael.
Without seeing the code and complexity of the QoQs, it is hard to say for sure what the best approach is. However, one thing you can do is use a struct to index the records outside of a QoQ. Much of the overhead of QoQ comes from building new query objects, and building a struct index once (write-only) and then reading from it is much more efficient than, for example, looping over the original query and making comparisons.
For example:
<!--- build up index --->
<cfset structindex = {} />
<cfset fields = "first,last,company" />
<cfloop list="#fields#" index="field">
<!--- initialize each key (instead of using structKeyExists) --->
<cfloop query="q">
<!--- build the key inside the query loop so it uses the current row --->
<cfset key = "field:#field#,value:#q[field][q.currentRow]#" />
<cfset structindex[key] = "" />
</cfloop>
<cfloop query="q">
<!--- update each key with a list of matching row indexes --->
<cfset key = "field:#field#,value:#q[field][q.currentRow]#" />
<cfset structindex[key] = listAppend(structindex[key], q.currentRow) />
</cfloop>
</cfloop>
<!--- save structindex to a global variable --->
<!--- output rows matching the index --->
<cfset key = "field:company,value:stackexchange" />
<cfoutput>
<cfloop list="#structindex[key]#" index="row">
#q.last[row]#, #q.first[row]# (#q.company[row]#)<br />
</cfloop>
</cfoutput>
If this doesn't match your needs, provide some examples of the QoQ statements and how many records are in the main query.
First, I would look at the time taken by the master query. If it can be cached for some amount of time and it takes a good chunk of the page load time, I would cache it.
Next, I would look at the recursive calls. If they can be made iterative, that would probably speed things up, although I realize this is not always possible. I would be surprised if this isn't your biggest time sink. Without knowing more about what you are doing, though, it's hard to help you optimize this.
I might also consider rewriting some of the recursive QoQs as stored procedures on the DB server, which is designed to handle data quickly and slice and dice it efficiently. CF is not: QoQs are very useful, but not speed demons (as you've noted).
Finally, I would look for straightforward filters and not use QoQ for them. Rather, I would just loop over the master query in a standard cfoutput tag and filter on the fly. This means you loop over the master query once, rather than over the master query once and the result query once.
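For comparison, the same write-once index can be sketched in Python: a dictionary mapping (field, value) pairs to lists of row positions, built in one pass per field and then read with a single lookup. The rows here are made up.

```python
from collections import defaultdict


def build_index(rows, fields):
    """Map (field, value) -> list of row positions."""
    index = defaultdict(list)
    for field in fields:
        for position, row in enumerate(rows):
            index[(field, row[field])].append(position)
    return dict(index)


rows = [
    {"first": "Ann", "last": "Lee", "company": "stackexchange"},
    {"first": "Bob", "last": "Ray", "company": "acme"},
    {"first": "Cy", "last": "Ito", "company": "stackexchange"},
]
index = build_index(rows, ["first", "last", "company"])

# .get() returns an empty list for values that never occur.
for position in index.get(("company", "stackexchange"), []):
    row = rows[position]
    print(f"{row['last']}, {row['first']} ({row['company']})")
```

Each lookup afterwards is a single dictionary access rather than a scan, which is where the saving over repeated QoQs comes from.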
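If the recursion walks hierarchical data (an assumption; the question does not say what it recurses over), one way to make it iterative is to index children by parent once and then walk with an explicit stack. A Python sketch with hypothetical id and parent_id columns:

```python
from collections import defaultdict


def walk_tree(rows, root_parent=None):
    """Depth-first (pre-order) walk with an explicit stack instead of recursion."""
    # One pass to index children by parent, instead of one filter per node.
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    order = []
    stack = list(reversed(children.get(root_parent, [])))
    while stack:
        node = stack.pop()
        order.append(node["id"])
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(children.get(node["id"], [])))
    return order


rows = [
    {"id": 1, "parent_id": None},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": 1},
    {"id": 4, "parent_id": 2},
]
print(walk_tree(rows))  # [1, 2, 4, 3]
```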
There are two primary solutions here. First, you could do something in CF with the records outside of QoQ; I posted my suggestion on this already. The other is to do everything in the DB. One way I've found to do this is to use a subquery as a temp table. You can even keep the SQL statement in a global variable and then reference it in the same places where you currently use QoQ, but run a real query against the database. It may sound slower than one trip to the DB followed by many QoQs, but in reality it probably isn't if the tables are indexed efficiently.
select *
from (
#sqlstring#
) as tmp
where company = 'stackexchange'
I have actually done this for a system with complex criteria both for which records a user should have access to and for what they can filter on within those records. With this approach you always know the source of the inner records, instead of trying to ensure that every single query is pulling correctly.
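The subquery-as-temp-table approach can be sketched in Python with sqlite3. MASTER_SQL stands in for the global SQL string; it must be trusted, application-defined SQL, never user input, while the outer filter value is passed as a bound parameter. Table and column names are hypothetical.

```python
import sqlite3

# Hypothetical master query kept in one place (trusted, not user input).
MASTER_SQL = "SELECT name, company FROM users WHERE active = 1"


def filter_master(conn, company):
    # Wrap the master query as a derived table and filter it in the database.
    sql = f"SELECT * FROM ({MASTER_SQL}) AS tmp WHERE company = ? ORDER BY name"
    return conn.execute(sql, (company,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, company TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Ann", "stackexchange", 1), ("Bob", "acme", 1), ("Cy", "stackexchange", 0)],
)
print(filter_master(conn, "stackexchange"))  # [('Ann', 'stackexchange')]
```

Cy is excluded by the inner access rule even though the outer filter matches, which is the "always know the source of the inner records" property described above.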
Edit:
It is actually safer (and usually more efficient) to use cfqueryparam whenever possible. I found this can be done by including a file containing the SQL statement:
select *
from (
<cfinclude template="master_subquery.cfm" />
) as tmp
where company = 'stackexchange'

How to Quickly Search a Query

I have an application where almost everything is dynamic. I am creating an edit form for a user and essentially need to search a query to select a group of checkboxes.
I have a table assigning the user to programs that holds userid and programid which maps to the corresponding records in the users table and the programs table. Initially I grab one user and all the programs and I loop over the programs query to build the checkboxes.
<cfloop query="Rc.programs">
<dd><input type="checkbox" name="programs" value="#Rc.programs.id#" /> #Rc.programs.name#</dd>
</cfloop>
What I ideally want to do is pull all records in the program memberships table and do some sort of search through that. I could do a query of queries, but I was wondering if there was a faster way to essentially search a query. My query of queries would be like the following if this helps people understand.
SELECT * FROM Rc.programs WHERE programid = #Rc.programs.id#
QoQ is certainly the easiest way to do it, but don't forget your CFQUERYPARAM:
SELECT * FROM Rc.programs WHERE programid =
<cfqueryparam value="#Rc.programs.id#" cfsqltype="WHATEVER_IT_IS">
You can also reference an individual column of a query as an array and search through just that column using array functions, including arrayFind() (which may only be available in recent versions).
arrayFind( Rc.programs.programId, YOUR_ID_HERE )
If that's not fast enough you could always build some sort of data structure or index in memory, and keep it around in an Application-scope variable if such is appropriate.
But is your database really that slow? Reducing the number of queries executed by a page is almost always a good thing, but for average, uncomplicated queries you probably won't be able to beat the speed, caching, etc. of your DB server.
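For the checkbox form itself, one in-memory structure that fits is a set of the user's program IDs, built once from the membership rows and then checked while looping over the programs. A Python sketch (the CF equivalent would be a struct keyed by program ID); the data and function name are illustrative.

```python
import html


def checkbox_lines(programs, member_program_ids):
    member_ids = set(member_program_ids)  # built once, constant-time lookups
    lines = []
    for program in programs:
        checked = ' checked="checked"' if program["id"] in member_ids else ""
        lines.append(
            f'<dd><input type="checkbox" name="programs" value="{program["id"]}"{checked} /> '
            f'{html.escape(program["name"])}</dd>'
        )
    return lines


programs = [{"id": 1, "name": "Art & Design"}, {"id": 2, "name": "Math"}]
for line in checkbox_lines(programs, member_program_ids=[2]):
    print(line)
```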

How can one get a list of all queries that have run on a page in ColdFusion 9

I would like to add some code to my Application.cfc onRequestEnd function that, if a certain application variable flag is on, will log query sql and execution time to a database table. That part is relatively easy, since ColdFusion returns the sql and execution time as part of the query struct.
However, this site has probably close to 1000 pages, and modifying all of them just isn't realistic. So I'd like to do this completely programmatically in the onRequestEnd function. In order to do that I need to somehow get a list of all queries that have executed on the page and that's where I'm stumped.
How can I get a list of the names of all queries that have executed on the current page? These queries appear in the template's variables scope, but there are a myriad of other variables in there too and I'm not sure how to easily loop through that and determine which is a query.
Any help would be appreciated.
Since that information is available via the debugging templates, you might take a look at those files for some pointers.
Another thing to consider is encapsulating your queries in a CFC or custom tag and having that deal with the logging (but I suspect that your queries are spread all over the site, so that might be a lot of pages to modify; although that speaks to why encapsulating data access is a good idea: it's easier to maintain and enhance for exactly this sort of situation).
The relevant code from the debug templates (modernized a bit) is:
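Encapsulating data access could look like this Python sketch: a thin wrapper that every query goes through, recording SQL and elapsed time so one end-of-request hook can write them to a log table. LoggingConnection is a hypothetical name; in CF the equivalent would be a CFC method or custom tag wrapping cfquery.

```python
import sqlite3
import time


class LoggingConnection:
    """Hypothetical wrapper: all queries go through execute(), which records timings."""

    def __init__(self, conn, clock=time.perf_counter):
        self._conn = conn
        self._clock = clock
        self.entries = []  # (sql, seconds) pairs for the current request

    def execute(self, sql, params=()):
        start = self._clock()
        cursor = self._conn.execute(sql, params)
        # Measures the execute call only; time spent fetching rows later is not included.
        self.entries.append((sql, self._clock() - start))
        return cursor


db = LoggingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE t (x INTEGER)")
db.execute("INSERT INTO t VALUES (?)", (1,))
for sql, seconds in db.entries:  # e.g. what an onRequestEnd-style hook would log
    print(f"{seconds:.6f}s  {sql}")
```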
<cfset tempFactory = createObject("java", "coldfusion.server.ServiceFactory") />
<cfset tempCfdebugger = tempFactory.getDebuggingService() />
<cfset qEvents = tempCfdebugger.getDebugger().getData() />
<cfquery dbType="query" name="qdeb">
SELECT *, (endTime - startTime) AS executionTime
FROM qEvents WHERE type = 'SqlQuery'
</cfquery>
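What that QoQ does can be mirrored in Python: keep only events whose type is 'SqlQuery' and add an executionTime column computed as endTime - startTime. The event dicts below are made up; only the type, startTime and endTime fields come from the query above, and the name field is a hypothetical extra.

```python
def sql_events(events):
    """Equivalent of: SELECT *, (endTime - startTime) AS executionTime
    FROM qEvents WHERE type = 'SqlQuery'"""
    return [
        {**event, "executionTime": event["endTime"] - event["startTime"]}
        for event in events
        if event["type"] == "SqlQuery"
    ]


events = [
    {"type": "SqlQuery", "name": "qUsers", "startTime": 100, "endTime": 112},
    {"type": "Template", "name": "index.cfm", "startTime": 90, "endTime": 150},
    {"type": "SqlQuery", "name": "qOrders", "startTime": 120, "endTime": 125},
]
for event in sql_events(events):
    print(event["name"], event["executionTime"])
```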