Need help optimizing this Django aggregate query - django

I have the following model
class Plugin(models.Model):
name = models.CharField(max_length=50)
# more fields
which represents a plugin that can be downloaded from my site. To track downloads, I have
class Download(models.Model):
plugin = models.ForiegnKey(Plugin)
timestamp = models.DateTimeField(auto_now=True)
So to build a view showing plugins sorted by downloads, I have the following query:
# pbd is plugins by download - commented here to prevent scrolling
pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total')
Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so.
I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object.
Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point?
EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.

That does seem unusually slow. There's nothing obvious in your query that would cause that slowness, though. I've done very similar queries in the past, with larger datasets, and they have executed in milliseconds.
The only suggestion I have for now is to install the Django debug toolbar, and in its SQL tab find the offending query and go to EXPLAIN to get the database to tell you exactly what it is doing when it executes. If it's doing subqueries, for example, check that they are using an index - if not, you may need to define one manually in the db. If you like, post the result of EXPLAIN here and I'll help further if possible.

Annotations are obviously slow, as they need to update every record in the db.
One direct way would be to denormalize the db field. Use a download_count field on the plugin models that is incremented on the new save of Download. Use the sort by the aggregate query on Plugins.
If you think there are going to be too many downloads to update another record of the Plugin all the time, you can update the download_count field on the Plugin via a cron.

Related

How to clean up after Sitecore template "Shared" setting was "packaged" and "installed" and items using this field are unaware

we faced very specific scenario in our Sitecore enviroment. In our Sitecore we have a item, lets call it "Promotion". Promotion was using "End date" field that was shared.
On our dev instance we "unshared" the field. Which naturally triggers the background process that changes the items to use field in unshared mode.
Similar process is described here: http://sitecoreblog.alexshyba.com/2011/10/changing-field-sharing-settings-in.html
So then we packaged and installed change of "unsharing field" on production "master" database. As I assume during installation the background process of "updating the items" has not been triggered. Which now behaves in the way, that "unshared" field on our production master database cannot be saved. Cahnges of value after clicking save are "vanishing". I am sure they are now being saved in some language agnostic mode.
Of course simple fix for that is to "share" it back and "unshare" it again. However when we tried to do this experiment on copy of our enviroment and we noticed all the values were lost. As the items from mentioned template are heavily used, we cannot really afford loosing those values.
Any ideas?
I would go "database digging". Sitecore stores these field values in their respective databases inside the "SharedFields", "VersionedFields" and "UnversionedFields" tables.
Assuming you shut off your Sitecore instances (this is important), you should be able to SELECT the data out of the wrong table, and INSERT it into the correct one.
(you need to look for items where FieldId matches the field you are having trouble with)
From what you've described, I don't believe Sitecore has removed any data on your production environment (yet).
So the solution we came up to, was to use Sitecore Rocks tool. We exported all the Items containing the fields before changing the field to "Share". The query was more or less like that:
SELECT ##ID, ##Start Date#, ##End Date# FROM //*[##templateid='{993DC54F-6724-46C3-B8D2-3EE13F15366A}']
It gave us proper values at that point, even though to items were pointing to the SharedFields table. We just simply converted the result of this query (around 9000 rows) in Excel to Sitecore Rocks update query -
UPDATE SET ##Start Date#='20120531T000000',##End Date#='20120614T000000' FROM //* [##ID='{E3FD9819-3DBD-4FAA-8DEF-FEF2A6272723}'];
After prepared this migrations script, we shared the appropriate field and apply the script of 9000 updates queries through Sitecore Rocks. We need to to exactly the same on Live database. Everything went pretty smooth.
The same approach could be easily done with the database I believe, however this solution was better for us, because of non-technical reasons (security policies etc.). Anyway Sitecore Rocks rocks!

Guidance on building simple django application for feature testing

I am currently building a simple site where users can login and write comments. Its somewhat similar to a forum but difference is that users will see image/an article and they get to comments on things based what they see. In addition they also get to rate what they see in 5 star scale or thumbs up thumbs down.
I have build this site using django now I want to add another application where I can control who gets to see the 5 star scale vs. thumbs up and thumbs down scale after they log in to the site. I want to randomly pic users as they come into the site and make sure half of the user population see one scale vs the other.
Here are my questions:
How should the random selection take place? Does django has inbuilt function or functionality to do such randomization?
Can someone provide simple example?
Here is my thought process on building such thing:
We create a model where we keep track of the userid, experimentGroup(5 Star or Thumbs up), Time
Then in each template we check if which group the user is in and based on that we adjust the users view?
I am new django so it would be great is someone provides a simple example so I can built upon that. After doing google search I found out a very heavy django application but I would rather user something simple that I can understand and control with my limited knowledge.
You can get objects in a random order by using (from django docs):
<YOUR_MODEL>.objects.order_by('?')
The exact queryset depends of how you define your models
Also django provide a template filter to random a list of items (see django docs)
Last, if above is not enough you can use python random module

What is the best way to schedule a view render as a task in django-chronograph or similar?

The use case is I want to statically render a view daily. It seems like there should be a pretty standard way to take a view/template and render static contents daily without simply saying "write a custom admin command" or a relatively simple command template that populates a static file.
The reason is to remove a large volume of database queries to make a site lightening quick, even on a lightweight vps by only touching the database daily instead of on every page view.
If there's a better way to do it, I'm open to that. It just seems like the best way to do it is rendering static views on a regular basis and cache-ing the crap out of it before it even touches django.
There are several ways I know to solve this:
1. You can use Varnish (as described in this blog post). Yet this solution takes a bit more time to get into because it's side technology you'll have to deal with. Also it takes more efforts to maintain it.
2. More "django-side" solution is to use django-celery for daily rendering your view and storing it in cache. You can move all your static view logic into task and render it there once a day. In your view you can just get rendered response from cache and return it to user.
3. Also you can use django per-view cache and create task in celery to clear cache daily.

How to I combine Page-views for a URL when they have different query strings in Google Analytics?

I am trying to do some reporting on page views on a site and the results are being listed like the following:
www.example.com/directory/ - 100 views
www.example.com/directory/?id=123456 - 10 views
www.example.com/directory/?id=987654 - 5 views
What filter do I need to create to views the results as:
www.example.com/directory/ - 100 views
www.example.com/directory/?id=* - 15 views
Thanks in advance
Yes, getting historical grouped together is going to mean using something like Google Docs, Excel, Tableau Software, Analytics Canvas, etc.
Moving forward...
One of the simplest ways of keeping things grouped in GA is to set up an advanced profile filter. You'll want to use this with a new profile; keeping a "raw" or "empty" profile is highly advisable for when you actually want to look at those individual URLs.
That said, here's a filter pattern that should work for you:
Go to Admin > Filters (under the View Column)
+ New Filter > Create new Filter > Name it
Filter Type = Custom filter > Advanced
Here's the pattern:
Field A: www\.example\.com\/directory\/\?id=.+
Output To: www\.example\.com\/directory\/\?id=\*
Another way to aggregate the same URI with multiple query strings is to change the primary dimension to 'Page Title' under Behavior > Site Content > All Pages.
The best way to do this for your historical data is unfortunately in an excel pivot table. You can get in in the UI, but only by creating a custom report and searching for very specific directories.
Check out the documentation on excluding query strings in your GA profile. Maybe create a new profile and write an advanced rule to rewrite all "id" pages to "/directory/product-page".
A totally different approach is to use custom variables or custom dimensions and to stop looking in the normal "Behavior" reports section (used to be called "Content" in GA) – custom dims are available using Google Analytics Universal Analytics only, which means starting a new web property and possibly running both code snippets concurrently (totally safe to do).
Personally I find custom dimensions a bit easier to work with than custom variables, and I generally think that it's a good idea to start exploring the new Google Analytics.
The nice thing about either of these approaches is that you can still keep the full page path date in the same profile as your custom dimension / variables information; it'll stay in the Behavior section where it belongs with all the other page paths.
Where I'm going with this...
You can create a new dimension such as "page type" and then call it "products", "posts", "articles", or whatever these id #s represent in this /directory/; then you can look at metrics across the dimension like pageviews, time on page, etc. by page type.
You can even create other dimensions to help describe them in more detail, such as breaking down blog posts or products into their different categories; i.e. hierarchical dimensions. Once you start using this kind of thing you may wonder what you ever did without it!
I think it's fair that I stop this answer now since it's not about how to set up custom variables or custom dimensions; those links should get you started (it's really not difficult).
Note: You can use php to fill in the dimension information in the GA tracking snippet dynamically based on the page that is being viewed (again, that's another question).

PL/SQL Access to Saved Report Data

Using 4.1 (latest version).
I have an Interactive Report page in my app. Users are free to create and save public and private reports setting any filter conditions they choose. What I need to do is loop through these reports and "process" some data based on a column value that matches the filter condition (something like an EMPLOYEE_ID).
What I would like to do is package this functionality into a PL/SQL procedure that is scheduled using DBMS_SCHEDULER.
Other than trying to reverse engineer this from the APEX views, I am stuck. Any help is greatly appreciated.
The bad news: there is no built-in way to get the query of an interactive report.
(I hope you can program PLSQL, otherwise you've hit a dead end.)
However, i have a package which does most of the work and is indeed processing the application metadata for IRs. It can handle both column and row filters, and also columns which have an lov laid on top of them. It doesn't handle computations or aggregates.
You'd have to take the code and adjust it somewhat though, as my goal was returning some data through json back to the browser, but you won't have to write the query-rebuilding part anymore. I'll refer you to my blog post i made about my package and why i made it, so that might clear up some of the usage of it for you. You can get the zip, and you'll need the APEX_IR package. (at time of writing, it still contains a stupid oversight in that it ignores the dis/enabled state of filters)