How do I tell if a coldfusion query is caching? - coldfusion

I have set caching on a query that gets run a lot. The query itself is not all that slow, but it gets run many times for each request, so I figured caching may help. I have enabled caching, but it doesn't really seem to be making a difference. How do I tell if my query is being cached or not?
I'm setting caching with : q.setCachedWithin("#createTimespan(0, 1, 0, 0)#");
Here is my full query preparation:
q = New Query();
q.setSQL("SELECT * FROM guest_booking WHERE room_id = :roomID and check_in <= :iDate and check_out > :iDate and status != 0");
q.setName("checkAvailability");
q.setCachedWithin("#createTimespan(0, 1, 0, 0)#");
q.addParam(name="iDate", value="#createODBCDate(arguments.date)#", cfsqltype="cf_sql_date");
q.addParam(name="roomID", value="#createODBCDate(arguments.room_id)#", cfsqltype="cf_sql_integer");
qResult = q.execute().getresult();
Debug output is showing:
checkAvailability (Datasource=accom_crm, Time=16ms, Records=1) in C:\ColdFusion9\CustomTags\com\adobe\coldfusion\base.cfc # 16:15:56.056
SELECT * FROM guest_booking WHERE room_id =
?
and check_in <=
?
and check_out >
?
and status != 0
Query Parameter Value(s) -
Parameter #1(cf_sql_integer) = 56
Parameter #2(cf_sql_date) = {ts '2011-11-14 00:00:00'}
Parameter #3(cf_sql_date) = {ts '2011-11-14 00:00:00'}
Many thanks in advance..
Jason
EDIT AFTER SHAWN'S ANSWER BELOW
Have changed the following two lines of query preparation:
The query name is now different for each different query, created dynamically from the parameters passed in:
q.setName("check#arguments.room_id##DateFormat(arguments.date,'ddmmyy')#");
createTimeSpan removed from quotes, so not passed in as a string.
q.setCachedWithin(createTimespan(0, 1, 0, 0));
I have also tried sending through an unprepared query (not using addparam(), but just rendering the variables straight in the query string), but made no difference..
EDIT 2 AFTER SHAWN'S 3RD EDIT/ANSWER BELOW
Shawn.. nice pickup on edit 3!!! you have isolated where the problem is. (anyone reading this, quick, up vote Shawn's answer, he found a needle in a hay stack)
Passing the dates in as params does not cache..e.g..
q.setSQL("SELECT booking_id FROM guest_booking WHERE room_id = :roomID and check_in <= :iDate and check_out > :iDate and status != 0");
q.addParam(name="iDate", value="#createODBCDate(arguments.date)#", cfsqltype="cf_sql_date");
Just passing it in as a variable does not cache..e.g..
q.setSQL("SELECT booking_id FROM guest_booking WHERE room_id = :roomID and check_in <= #createODBCDate(arguments.date)# and check_out > #createODBCDate(arguments.date)# and status != 0");
BUT hard coding the dates DOES cache..e.g..
q.setSQL("SELECT booking_id FROM guest_booking WHERE room_id = :roomID and check_in <= {ts '2011-12-16 00:00:00'} and check_out > {ts '2011-12-16 00:00:00'} and status != 0");
This is all good, but clearly I can't hard code the dates. The dates will obviously change each day, but even when I run the same query with the same dates passed in dynamically (the query syntax being exactly the same), the query won't cache if the dates are passed in as variables, only if they are hard coded into the query. Weird. I will keep playing and see what I can find.
Thank you Shawn for pinpointing the problem !!!

If it was correctly cached, your debug output would include just a tiny bit of additional info, to the tune of:
checkAvailability (Datasource=accom_crm, Time=0ms, Records=1, Cached Query)
Something is preventing your query from being cached.
I notice your call to .setCachedWithin() has a string being passed to it, or rather, you're making it a string by qualifying it with quotes and using # signs.
Try passing the actual value returned from CreateTimeSpan(), without converting it to a string, like so:
q.setCachedWithin(createTimeSpan(0, 1, 0, 0));
-- edit --
Some other tidbits to note about query caching:
The name of the query must be the same.
The SQL statement (in all its parameterized forms) must be the same.
The datasource must be the same.
If used, the username & password must be the same.
The DBTYPE must be the same.
All these attributes must remain the same from call to call in order for ColdFusion to consider it a cache-able query. You mentioned above that you tried taking the addParam() calls out, but still had no luck...
...try using a static query name, rather than one with variables--see if you get any further
q.setName("checkTestQuery");
-- 2nd edit --
Another often overlooked issue is the ColdFusion server's clock. Make sure the CFServer's date/time is set correctly. This may sound silly, but I've seen many a "production" server whose clock was completely off, and not set to the correct timezone, let alone time...and time, of course, is of great significance within the context of caching.
-- 3rd edit --
After re-reading and reviewing everything, I'm going to recommend you take one more look at the 2nd point I made above regarding the SQL statement needing to be the same, and your WHERE clause being dependent upon a variable that is influenced by date/time, which implicitly could change on every request.
...and, since the SQL statement must remain the same in order to be cached, CF discards any attempt to cache it.
Try restructuring the SQL statement temporarily, without the WHERE clause looking for those date variables...and see what it produces.

Shawn's advice is mostly good, but the easiest way to check if the query is being cached is to update the underlying data and see if requerying gives you the previously-cached (all going well) data, or reflects the updates. If it reflects the updates, then it's not being cached...
Note that you don't need to do that horsing around with the query name (as per your update). CF will work out by itself when the parameters are different, and cache separate result sets.
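To make the matching rules above a bit more concrete, here is a rough Python sketch of a cache keyed on the query name, datasource, SQL text and parameter values. This is only a toy model for illustration, not how ColdFusion implements its query cache; ToyQueryCache and fake_db are made-up names, and the shortened SQL string is a placeholder.

```python
import time


class ToyQueryCache:
    """Toy model of a query cache; NOT ColdFusion's actual implementation."""

    def __init__(self):
        self._store = {}
        self.db_hits = 0  # how many times the "database" was really queried

    def execute(self, name, datasource, sql, params, cached_within_seconds, run_query):
        # The key is built from everything that must match for a cache hit.
        key = (name, datasource, sql, tuple(params))
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < cached_within_seconds:
            return entry[1]  # cache hit: the database is not touched
        self.db_hits += 1
        rows = run_query(sql, params)
        self._store[key] = (now, rows)
        return rows


def fake_db(sql, params):
    # Hypothetical stand-in for a real database call.
    return [{"room_id": params[0], "date": params[1]}]


cache = ToyQueryCache()
sql = "SELECT booking_id FROM guest_booking WHERE room_id = ? AND check_in <= ?"

# Same name, datasource, SQL and parameter values twice: the second call is a hit.
cache.execute("checkAvailability", "accom_crm", sql, (56, "2011-11-14"), 3600, fake_db)
cache.execute("checkAvailability", "accom_crm", sql, (56, "2011-11-14"), 3600, fake_db)
# A different parameter value gets its own cache entry, so this is a miss.
cache.execute("checkAvailability", "accom_crm", sql, (57, "2011-11-14"), 3600, fake_db)

print(cache.db_hits)  # 2
```

In this model, counting real database hits plays the same role as looking for "Cached Query" in the debug output: if the counter goes up on every call, something in the key is changing between calls.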

Related

Django ORM Issue: Same Query Set on 2 different views runs as drastically different speeds

I have Two views--one part of the admin site, and the other a publicly accessible view.
They both perform the same set of queries--literally copy and pasted code.
masterQuery = myObject.objects.filter(is_public=True)
newQuery = queriedForms.filter(ref_to_parent_form__record_reference__form_name__icontains=term['TVAL'], ref_to_parent_form__record_reference_type__pk=rtypePK)
newQuery = newQuery.filter(flagged_for_deletion=False)
term['count'] = newQuery.count()
masterQuery = (newQuery & masterQuery)
singleQueryStats['intersections'] = masterQuery.count()
Each view has this exact same code. It's not the prettiest query, but regardless: on the admin view it runs in less than a quarter of a second, while on the public Views.py view it takes 8 minutes. I cannot figure out why. The queryset.query output is the same. The variables (submitted through POST on the admin view, through GET on the public view) also match.
EDITS: I tried simplifying things further to no avail:
SELECT `maqluengine_form`.`id`, `maqluengine_form`.`form_name`, `maqluengine_form`.`form_number`, `maqluengine_form`.`form_geojson_string`, `maqluengine_form`.`hierarchy_parent_id`, `maqluengine_form`.`is_public`, `maqluengine_form`.`project_id`, `maqluengine_form`.`date_created`, `maqluengine_form`.`created_by_id`, `maqluengine_form`.`date_last_modified`, `maqluengine_form`.`modified_by_id`, `maqluengine_form`.`sort_index`, `maqluengine_form`.`form_type_id`, `maqluengine_form`.`flagged_for_deletion` FROM `maqluengine_form` WHERE (`maqluengine_form`.`form_type_id` = 319 AND `maqluengine_form`.`flagged_for_deletion` = False)
this is the query output on both views--the admin view takes <1/4 second and the public view takes about 4-8 minutes depending to perform a count() operation on this queryset
There is no logic that could be changing the time--the timer server error log prints match up until the count is performed.
Neither queryset is evaluated before the count--just built. Still at an utter loss here.
I was being an idiot--the two querysets weren't the same--there was an additional booleanfield being hit, and I was reading the logs wrong--The answer is that the querysets WERE NOT the same--so that answers this.
I submitted a new question to figure out the drastic speed difference between the two.

Total number of documents in pysolr

How can I get the total number of documents matching a given query? I have used the query below:
result = solr.search('ad_id : 20')
print(len(result))
Since the default returning value is '10', the output is only 10, but the count is 4000. How can I get the total number of counts?
The results object from pysolr has a hits property that contains the total number of hits, regardless of how many documents being returned. This is named numFound in the raw response from Solr.
Your solution isn't really suitable for anything with a larger dataset, since it requires you to retrieve all the documents, even if you don't need them or want to show their content.
The count is stored in numFound variable. Use the code below:
result = solr.search('ad_id : 20')
print(result.raw_response['response']['numFound'])
As @MatsLindh mentioned -
result = solr.search('ad_id : 20')
print(result.hits)
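The difference between len(result) and result.hits is easier to see against the shape of the raw response. The dictionary below is a hand-written stand-in for a decoded Solr response (the values are made up), so it runs without pysolr or a Solr server:

```python
# Hypothetical decoded Solr response: 4000 matches, but only the default 10 rows returned.
raw_response = {
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 4000,
        "start": 0,
        "docs": [{"ad_id": 20, "id": str(i)} for i in range(10)],
    },
}

returned = len(raw_response["response"]["docs"])  # what len(result) counts
total = raw_response["response"]["numFound"]      # what result.hits reports

print(returned, total)  # 10 4000
```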
Finally got the answer:
Added rows=1000000 at the end of the query.
result = solr.search('ad_id : 20', rows=1000000)
But if the rows are greater than this the number should be changed in the query. This might be a bad solution but works.
If anyone has a better solution please do reply.
If you just want the total number of items that satisfy your query, here is my Python3 code (using the pysolr module):
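import pysolr
SOLR_HOST = "localhost:8983"  # placeholder; use your own Solr host and port
collection = "bookindex"  # or whatever your collection is called
solr_url = f"http://{SOLR_HOST}/solr/{collection}"
solr = pysolr.Solr(url=solr_url, timeout=120, always_commit=True)

def count_matches(query="*:*"):  # helper name is just for illustration
    result = solr.search(query, rows=0)  # rows=0: only the metadata is needed
    return result.hits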
This queries for all documents ("*:*") -- 315913 in my case -- but you can narrow that to suit your requirements. For example, if I want to know how many of my book entries have title:pandas, I can search("title:pandas", rows=0) and get 41 as the number that have pandas in the title. By setting rows=0 you're letting Solr know that it need not return any documents, only the meta information, which is much more efficient than setting a high limit on rows.

Writing an activerecord statement to query a datetime column

I have a profiles table with a column name videoconfavailability which is a datetime type. I am trying to make an Ajax button_tag to search all of the videoconfavailability 1 hour from Time.now and 1 hour before Time.now.
so far I have this line here, is there a NOT clause to filter out other conditions?
Profile.where("videoconfavailability <= ? AND videoconfavailability >= ?", Time.now + 1.hour , Time.now - 1.hour )
The end goal here is to have ALL of the time available 1 hour before current time and 1 hour after current time.
Is this going to work?
It looks good to me - maybe you can explain the issue you are having with it!
In terms of using it, you might consider creating a scope on your Profile model. If you need to use it with different time ranges, it could accept arguments. If not, just hardcode the range so you can use it again and again without repeating yourself.
scope :available_within_time, ->(start_time, end_time) { where("videoconfavailability >= ? AND videoconfavailability <= ?", start_time, end_time) }
As it's written in your question perhaps consider making use of the Rails time helpers to make things even more concise.
1.hour.ago
1.hour.from_now
If you want to filter out negative matches to check some other condition you can chain a .where.not(QUERY) onto your existing query.

I'm confused about how distinct() works with Django queries

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want to know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a different way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinact__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with one entry per unique user. But this will generate N+1 queries, where N is the total number of checkins (one query each time the user attribute is used). Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
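The two questions that got mixed up in this thread (how many checkins per user versus how many unique users) correspond to two different SQL shapes. Here is a small sketch using Python's built-in sqlite3 module to show the difference. The table and column names are simplified assumptions, and the SQL is only roughly what an ORM would generate, not Django's exact output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkin (user_id INTEGER, location_id INTEGER)")
conn.executemany(
    "INSERT INTO checkin VALUES (?, ?)",
    [(15521, 1), (15521, 1), (15521, 1), (42, 1), (7, 2)],
)

# Checkins per user at location 1 (the GROUP BY form).
per_user = conn.execute(
    "SELECT user_id, COUNT(*) FROM checkin "
    "WHERE location_id = 1 GROUP BY user_id ORDER BY user_id"
).fetchall()

# Number of unique users at location 1 (the COUNT(DISTINCT ...) form).
unique_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM checkin WHERE location_id = 1"
).fetchone()[0]

print(per_user)      # [(42, 1), (15521, 3)]
print(unique_users)  # 2
```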
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

Retrieving unique results in Django queryset based on column contents

I am not sure if the title makes any sense but here is the question.
Context: I want to keep track of which students enter and leave a classroom, so that at any given time I can know who is inside the classroom. I also want to keep track, for example, how many times a student has entered the classroom. This is a hypothetical example that is quite close to what I want to achieve.
I made a table Classroom and each entry has a Student (ForeignKey), Action (enter,leave), and Date.
My question is how to get the students that are currently inside (ie. their enter actions' date is later than their leave actions' date, or don't have a leave date), and how to specify a date range to get the students that were inside the classroom at that time.
Edit: On better thought I should also add that there are more than one classrooms.
my first attempt was something like this:
students_in = Classroom.objects.filter(classroom__exact=1, action__exact='1')
students_out = Classroom.objects.filter(classroom__exact=1, action__exact='0').values_list('student', flat=True)
students_now = students_in.exclude(student__in=students_out)
where if action == 1 is in, 0 is out.
This however provides the wrong data as soon as a student leaves a classroom and re-enters. She is listed twice in the students_now queryset, as there are two 'enters' and one 'leave'. Also, I can't check upon specific date ranges to see which students have an entry date that is later than their leave date.
To check a field based on the value of another field, use the F() operator.
from django.db.models import F
students_in_classroom_now = Student.objects.filter(leave__gte=F('enter'))
To get all students in the room at a certain time:
import datetime
start_time = datetime.datetime(2010, 1, 21, 10, 0, 0) # 10am yesterday
students_in_classroom_then = Student.objects.filter(enter__lte=start_time,
leave__gte=start_time)
Django gives you the Q() and F() operators, which are very powerful and enough for most of the situations. However I don't think that it will be enough for you. Let's think about your problem at the SQL level.
We have something like a table Classroom ( action, ts, student_id ). In order to know which students are at the classroom right now, we would have to make something like:
with uber_table as ( /* temporary view with the last timestamp per student and action */
select action, max(ts) xts, student_id
from Classroom
group by action, student_id
)
select distinct a.student_id student_id
from uber_table a, uber_table b
where a.action = 'enter'
/* either he entered and never left */
and (a.student_id not in (select student_id from uber_table where action = 'leave')
/* or he left before he entered again, so he's still in */
or (a.student_id = b.student_id and b.action = 'leave' and b.xts < a.xts))
This is, I believe, standard SQL. However, if you're using an older SQLite (before 3.8.3) or MySQL (before 8.0) as the database backend, the WITH keyword for creating temporary views isn't supported and the query will have to get even more complex. There may be a simpler version but I don't really see it.
My point here is that when you get to this level of complexity, F() and Q() become inadequate tools for the job, so I'd rather recommend that you write the SQL code by hand and use Raw SQL in Django.
Should you need to use the more common data access APIs, you should probably rewrite your data model in the way @Daniel Roseman implied.
By the way, a query for getting people that were inside the classroom in the same interval is just like that one, but all you have to do is limit the last leave ts to the beginning of the interval and the last enter ts to the end of the interval.
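The idea behind the SQL above (look at each student's most recent action and check whether it is an "enter") can also be tried out in plain Python before committing to raw SQL. This is only a sketch under assumptions: the event tuples, the student names and the students_inside helper are made up for illustration, and it covers a single classroom.

```python
from datetime import datetime

# Hypothetical event log: (student, action, timestamp).
events = [
    ("alice", "enter", datetime(2010, 1, 21, 9, 0)),
    ("bob", "enter", datetime(2010, 1, 21, 9, 5)),
    ("alice", "leave", datetime(2010, 1, 21, 9, 30)),
    ("alice", "enter", datetime(2010, 1, 21, 9, 45)),
    ("bob", "leave", datetime(2010, 1, 21, 10, 15)),
]


def students_inside(events, at):
    """Return the students whose most recent action at or before `at` is 'enter'."""
    latest = {}
    for student, action, ts in sorted(events, key=lambda e: e[2]):
        if ts <= at:
            latest[student] = action  # later events overwrite earlier ones
    return sorted(s for s, action in latest.items() if action == "enter")


print(students_inside(events, datetime(2010, 1, 21, 10, 0)))  # ['alice', 'bob']
print(students_inside(events, datetime(2010, 1, 21, 9, 40)))  # ['bob']
print(students_inside(events, datetime(2010, 1, 21, 11, 0)))  # ['alice']
```

With more than one classroom, the dictionary key would need to become (student, classroom) instead of just the student.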