How to Quickly Search a Query

How to Quickly Search a Query - coldfusion

I have an application where almost everything is dynamic. I am creating an edit form for a user and essentially need to search a query to select a group of checkboxes.
I have a table assigning the user to programs that holds userid and programid which maps to the corresponding records in the users table and the programs table. Initially I grab one user and all the programs and I loop over the programs query to build the checkboxes.
<cfloop query="Rc.programs">
<dd><input type="checkbox" name="programs" value="#Rc.programs.id#" /> #Rc.programs.name#</dd>
</cfloop>
What I ideally want to do is pull all records in the program memberships table and do some sort of search through that. I could do a query of queries, but I was wondering if there was a faster way to essentially search a query. My query of queries would be like the following if this helps people understand.
SELECT * FROM Rc.programs WHERE programid = #Rc.programs.id#

QoQ is certainly the easiest way to do it, but don't forget your CFQUERYPARAM:
SELECT * FROM Rc.programs WHERE programid =
<cfqueryparam value="#Rc.programs.id#" cfsqltype="WHATEVER_IT_IS">
You can also reference an individual column/field of a query as an array, and search through just that column using array functions, including arrayFind() (which might just be in recent versions).
arrayFind( Rc.programs.programId, YOUR_ID_HERE )
If that's not fast enough you could always build some sort of data structure or index in memory, and keep it around in an Application-scope variable if such is appropriate.
But is your database really that slow? Reducing the number of queries executed by a page is almost always a good thing, but for average, uncomplicated queries you probably won't be able to beat the speed, caching, etc of your DB server.

Related

Redshift - Redesign tables to use DIST and SORT keys (performance issue)

I'm having serious performance problems on Redshift and I've started to rethink my tables structures.
Right now, I'm identifying tables that have most significance on my dashboard. First of all, I run the following query:
SELECT * FROM admin.v_extended_table_info
WHERE table_id IN (
SELECT DISTINCT s.tbl FROM stl_scan s
JOIN pg_user u ON u.usesysid = s.userid
WHERE s.type=2 AND u.usename='looker'
)
ORDER BY SPLIT_PART("scans:rr:filt:sel:del",':',1)::int DESC,
size DESC;
Based on query result, I could identify a lot of small tables (1-1000 records) that are distributed as EVEN and it could be ALL - this tables are used in a lot of joins instructions.
Beside that, I've identified that 99% of my tables are using EVEN without sort key. I'm not using denormalized tables so I need to run plenty of joins to get data - for what I've read, EVEN is not good for joins because it could be distributed over the network.
I have 3 tables related to Ticket flow: user, ticket and ticket_history. All those tables are EVEN without sort keys and diststyle as EVEN.
For now, I would like to redesign table user: this table is used on join by condition ticket.user_id = user.id and where clauses like user.email = 'xxxx#xxxx.com' or user.email like '%#something.com%' or group by user.email.
First thing I'm planning to do is use diststyle as distribution and key as id. Does make sense use a unique value as dist key? I've read plenty of posts about dist keys and still confuse for me.
As sort keys makes sense use email as compound? I've read to avoid columns that grows like dates, timestamps or identities, that's why i'm not using it as interleaved. To avoid that like, I'm planning to create a new column to identify what is email domain.
After that, I'll change small tables to dist ALL and try my queries again.
Am I on right way? Any other tip?
This question could sound stupid but my tech background is only software development, I'm learning about Redshift and reading a lot of documentations.

The basic rule of thumb is:
Set the DISTKEY to the column that is most used in JOINs
Set the SORTKEY to the column(s) most used in WHEREs
You are correct that small tables can have a distribution of ALL, which would avoid sending data between nodes.
DISTKEY provides the most benefit when tables are join via a common column that has the same DISTKEY in both tables. This means that each row is contained on the same node and no data needs to be sent between nodes (or, more accurately, slices). However, you can only select one DISTKEY, so do it on the column that is most often used for the JOIN.
SORTKEY provides the most benefit when Redshift can skip over blocks of storage. Each block of storage contains data for one column and is marked with a MIN and MAX value. When a table is sorted on a particular column, it minimises the number of disk blocks that contain data for a given column value (since they are all located together, rather than being spread randomly throughout disk storage). Thus, use column(s) that are most frequently used in WHERE statements.
If the user.email wildcard search is slow, you can certainly create a new column with the domain. Or, for even better performance, you could consider creating a separate lookup table with just user_id and domain, having SORTKEY = domain. This will perform the fastest when searching by domain.
A tip from experience: I would advise against using an email address as a user_id because people sometimes want to change email address. It is better to use a unique number for such id columns, with email address as a changeable attribute. (I've seen software systems need major rewrites to fix such an early design decision!)

Working with ColdFusion - Looping a procedure

I'm wondering if there is a better way to do what I'm doing. It works but I feel there should be a better way. If my query result in 20K records for example, I'm getting "The request has exceeded the allowable time limit Tag: CFQUERY"
<cfquery name="GetMyRecords" datasource="MyDSN">
SELECT idnumber,PrefAddr,...more colums
FROM um_valid
WHERE userid = <cfqueryparam cfsqltype="cf_sql_varchar"
value="#session.userid#">
AND session_id = <cfqueryparam cfsqltype="cf_sql_numeric"
value="#session.Session_Id#">
AND status NOT IN (<cfqueryparam cfsqltype="cf_sql_varchar"
value="X,C">)
</cfquery>
I also have an existing store procedure that expect some values from the query to do what it's supposed to do. So I'm looping it like this:
<cfloop query="GetMyRecords">
<cfstoredproc procedure="MyProc" datasource="MyDSN">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="#id_number" value="#Trim(idnumber)#">
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"
dbvarname="#Aaddr_pref_ind" value="#Trim(PrefAddr)#">
----- still more params to be passed----
</cfstoredproc>
</cfloop>
Does ColdFusion has a better technique to avoid either time out error or 500 error?

Like another poster mentioned, reducing the number of database calls should be a priority to you. I suggest joining data (if possible) in your first query rather than looping your query and querying again.
To fix the time issue, you can put requestTimeout in your page to override the default timeout. The time is in seconds.
<cfsetting requestTimeOut = "600">
See this explanation.

Your current approach is to make 1 query, which contains n records. Then loop over that record set, calling a query for each record. This results in your calling n + 1 queries per request. As the volume of data returned by your first query increases, so too does the volume of overall queries made to the database. In your case, you're trying to make 20,001 calls to the database in a single request.
Ideally, you need a solution that involves one call to a stored procedure with a properly optimized query that can return all of your data in a single record set. If you don't need to dump all the data onto the page at the same time, then create a paginated query that will return x number of records per page. Then the user can go page by page through the query or provide a search form with additional filters to allow the user to reduce the overall size of the records returned.

Need to see full queries to give you an example, but in general, this is a really useful thing in CF to help you out!
You need to look into the attribute called group. This lets you specify a column to group your output by. This actually will eliminate the need for the stored proc you are calling entirely.
The way it works is letting you create sub-set outputs based on grouping. So for example, you could do this with your id's and output that group.
You can also have multiple of these and have header and footer sections for each one for display purposes or just logic manipulation.
This lets you query the entire dataset and then manipulate it in the loop without having subqueries which is ultra inefficient and cringe worthy.
This is something quite unique to ColdFusion, check it out!
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7ff6.html

Is it bad practice to use cfquery inside cfloop?

I am having an array of structure. I need to insert all the rows from that array to a table.
So I have simply used cfquery inside cfloop to insert into the database.
Some people suggested me not to use cfquery inside cfloop as each time it will make a new connection to the database.
But in my case Is there any way I can do this without using cfloop inside cfquery?

Its not so much about maintaining connections as hitting the server with 'n' requests to insert or update data for every iteration in the cfloop. This will seem ok with a test of a few records, but then when you throw it into production and your client pushes your application to look around a couple of hundred rows then you're going to hit the database server a couple of hundred times as well.
As Scott suggests you should see about looping around to build a single query rather than the multiple hits to the database. Looping around inside the cfquery has the benefit that you can use cfqueryparam, but if you can trust the data ie. it has already been sanatised, you might find it easier to use something like cfsavecontent to build up your query and output the string inside the cfquery at the end.

I have used both the query inside loop and loop inside query method. While having the loop inside the query is theoretically faster, it is not always the case. You have to try each method and see what works best in your situation.
Here is the syntax for loop inside query, using oracle for the sake of picking a database.
insert into table
(field1, field2, etc)
select null, null, etc
from dual
where 1 = 2
<cfloop>
union
select <cfqueryparam value="#value1#">
, <cfqueryparam value="#value2#">
etc
from dual
</cfloop>

Depending on the database, convert your array of structures to XML, then pass that as a single parameter to a stored procedure.
In the stored procedure, do an INSERT INTO SELECT, where the SELECT statement selects data from the XML packet. You could insert hundreds or thousands of records with a single INSERT statement this way.
Here's an example.

There is a limit to how many <CFQUERY><cfloop>... iterations you can do when using <cfqueryparam>. This is also vendor specific. If you do not know how many records you will be generating, it is best to remove <cfqueryparam>, if it is safe to do so. Make sure your data is coming from trusted sources & is sanitised. This approach can save huge amounts of processing time, because it is only make one call to the database server, unlike an outer loop.

Is it possible to cache a ColdFusion Query of Query

I've got query that is returning all translations for a site. It does this by getting all translations that are in the users desired language, then the remaining that are in the site default language, then any other strings that have not been translated. I'm using cachedwithin on that query since the data doesn't change often, and I'm resetting that queries cache if translations are modified. I'm then using ColdFusion's Query of Query to get the individual record that I'm after. This has increased performance considerably.
I was wondering if it's possible to further cache the Query of Query query to further increase performance. It appears to work as page load is 1/6 faster, however are there any gotchas with this technique?
The Query of Query is below.
<cfquery name="qryTranslation" dbtype="query">
SELECT
TranslationString
FROM
qryGetText
WHERE
TranslationHash = <cfqueryparam value="#StringHash#" cfsqltype="cf_sql_varchar">
AND DesiredLanguageID = <cfqueryparam value="#Arguments.LanguageID#" cfsqltype="cf_sql_bigint">
</cfquery>

Is it possible to cache a ColdFusion Query of Query
Yes, it is possible.
however are there any gotchas with this technique?
You queries will be cached based on its signature, so in your case the StringHash and Arguments.LanguageID. If you have a cached QofQ for every translation on a page, on many pages on your site, then you could potentially max out the "Maximum number of cached queries" value. If this happens other, potentially larger and more important, cached queries in the query cache could be evicted.
Calculating a suitable "Maximum number of cached queries" could be determined by load testing and using the build in server monitor to monitor the number of queries in the cache.

There is one big gotcha with caching a query of query.
The documentation for caching a query states that:
To use cached data, the current query must use the same SQL statement, data source, query name, user name, and password.
However a Query of Query does not have a data source, user name or password, so you lose a lot of "over cache" protection. The query as it stands in your question will conflict with any other queries on your server that have the same name and formatting. So if you have more than one website that uses this code then the first website that is loaded will dictate the translations used on the rest of the websites.
A quick way around this is to trick the query into being more constrained.
<cfquery name="qryTranslation" dbtype="query">
SELECT
TranslationString
FROM
qryGetText
WHERE
TranslationHash = <cfqueryparam value="#StringHash#" cfsqltype="cf_sql_varchar">
AND DesiredLanguageID = <cfqueryparam value="#Arguments.LanguageID#" cfsqltype="cf_sql_bigint">
AND '#Variables.DSN#' = '#Variables.DSN#'
</cfquery>
Change Variables.DSN to be the value of the datasource attribute in the main query. If you don't trust that variable, then also make it a cfqueryparam on both sides of the operator.

How can one get a list of all queries that have run on a page in ColdFusion 9

I would like to add some code to my Application.cfc onRequestEnd function that, if a certain application variable flag is on, will log query sql and execution time to a database table. That part is relatively easy, since ColdFusion returns the sql and execution time as part of the query struct.
However, this site has probably close to 1000 pages, and modifying all of them just isn't realistic. So I'd like to do this completely programmatically in the onRequestEnd function. In order to do that I need to somehow get a list of all queries that have executed on the page and that's where I'm stumped.
How can I get a list of the names of all queries that have executed on the current page? These queries appear in the template's variables scope, but there are a myriad of other variables in there too and I'm not sure how to easily loop through that and determine which is a query.
Any help would be appreciated.

Since that information is available via the debugging templates, you might take a look at those files for some pointers.
Another thing to consider is encapsulating your queries in a CFC or custom tag and having that deal with the logging (but I suspect that your queries are spread all over the site so that might be a lot of pages to modify - although that speaks to why encapsulating data access is a good idea: it's easier to maintain and enhance for exactly this sort of situation).
The relevant code from the debug templates (modernized a bit), is:
<cfset tempFactory = createObject("java", "coldfusion.server.ServiceFactory") />
<cfset tempCfdebugger = tempFactory.getDebuggingService() />
<cfset qEvents = tempCfdebugger.getDebugger().getData() />
<cfquery dbType="query" name="qdeb">
SELECT *, (endTime - startTime) AS executionTime
FROM qEvents WHERE type = 'SqlQuery'
</cfquery>

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js