I have several hundred thousand SVN commit records in my Django database; each record stores the related info for one commit (such as BugID, LinesChanged, SubmitWeek, ...).
I want to summarize each field of the records and create a report grouped by the SubmitWeek field, like the following:
Currently I iterate over the records and process the relevant field values myself. Is there a more succinct way to define the query and extract the summary? Many thanks.
Your question is a bit vague.
If you are looking for a way to make your queries more specific, so that Django does more joins and fewer separate queries, have a look at:
values() and values_list() of the QuerySet API
If you want to make Django fetch related objects at once and not in separate queries, have a look at:
prefetch_related() and select_related()
If you want to update data more efficiently, have a look at:
F() https://docs.djangoproject.com/en/1.9/ref/models/expressions/#django.db.models.F
After referring to the manual, I used the following statements and they seem to work well. Thanks anyway, Risadinha :)
# Sum the LinesChanged value of all the records
SVN_Commit.objects.filter(my filter).aggregate(Sum('LinesChanged'))
# Get the unique SubmitWeek List
SVN_Commit.objects.filter(my filter).values_list('SubmitWeek', flat=True).order_by('SubmitWeek').distinct()
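For the per-week report the question asks about, grouping in the database is usually shorter than iterating. In the ORM this is typically spelled with values('SubmitWeek') followed by annotate(...), which groups by SubmitWeek. The sketch below runs the roughly equivalent GROUP BY against an in-memory SQLite database using only the standard library; the table name svn_commit and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for the SVN_Commit table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE svn_commit (BugID INTEGER, LinesChanged INTEGER, SubmitWeek TEXT)"
)
conn.executemany(
    "INSERT INTO svn_commit VALUES (?, ?, ?)",
    [(1, 10, "2016-W01"), (2, 5, "2016-W01"), (3, 7, "2016-W02")],
)

# Roughly what
#   SVN_Commit.objects.values('SubmitWeek')
#       .annotate(commits=Count('id'), lines=Sum('LinesChanged'))
#       .order_by('SubmitWeek')
# asks the database to do: one summary row per week.
weekly_report = conn.execute(
    "SELECT SubmitWeek, COUNT(*), SUM(LinesChanged) "
    "FROM svn_commit GROUP BY SubmitWeek ORDER BY SubmitWeek"
).fetchall()

for week, commits, lines in weekly_report:
    print(week, commits, lines)
```

The summary is computed by the database, so the several hundred thousand rows never have to be loaded into Python.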
Related
I am trying to build a database for my website. There are currently three entries with different attributes in my database. I have not created these entries in order, but I have assigned a 'Chapter number' attribute which indicates the order 1,2,3.
I am now trying to pass this data to a template using the context argument of the render function in my views. I am using objects.all() to add all objects to my context. I have a simple HTML file where I insert the data from the database by looping over these objects with a simple for loop.
Now the output that is being generated (naturally) is that it is following the order in which I created the database. I am not sure how I can have the loop run in such a way that I get these chapters in correct order. Thank you for patiently reading my question. Any help will be appreciated.
You can use the order_by method, which is part of Django's QuerySet API:
https://docs.djangoproject.com/en/3.0/ref/models/querysets/
If you share some more information about your specific data, I can provide an example.
For orientation purposes, sorting queried objects by date would work as follows:
most_recent = Entry.objects.order_by('-timestamp')
You can sort by any field like so:
sorted_by_field = Entry.objects.order_by('custom_field')
Is there a way to create a cumulative count by using or customizing Django database functions? This built-in query gets the number of items for each year. What if we need the number of items before that year?
items.values('year').annotate(nb=Count('id'))
The per-year counts are built into Django. You can combine order_by, values and annotate to get them sorted by year (note that this returns one count per year, not a running total):
Item.objects.order_by('year').values('year').annotate(nb=Count('id'))
For the official docs, see: aggregation. If the sample doesn't work I'll need more information about the model to give you the correct call. Please provide the full model and, if required, some sample data.
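If the goal is the number of items before each year, one option is to accumulate the per-year counts in Python once the query has run. This is a minimal sketch, assuming the query returns a list of dicts shaped like the hypothetical per_year below; the numbers are invented. Newer Django versions also offer window expressions, which may let the database compute the running total instead.

```python
from itertools import accumulate

# Hypothetical result of
#   Item.objects.order_by('year').values('year').annotate(nb=Count('id'))
per_year = [
    {"year": 2018, "nb": 3},
    {"year": 2019, "nb": 5},
    {"year": 2020, "nb": 2},
]

# Running totals that include each year's own count.
running = list(accumulate(row["nb"] for row in per_year))

# "Items before that year" is the running total minus the year's own count.
cumulative = [
    {"year": row["year"], "nb": row["nb"], "before": total - row["nb"]}
    for row, total in zip(per_year, running)
]

for row in cumulative:
    print(row)
```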
I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel
You shouldn't use an m2m relationship on App Engine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and the programmer is expected to denormalize the data structure. This is a deliberate design decision: your database will contain redundant data, but your read queries will be much simpler (no need to combine data from 3 tables), which in turn makes the design of the DB server much simpler as well. This is done for the sake of optimization and scaling.
In your case you should probably get rid of the Tag and SongTag models and just store the tag in the Song model as a string. I am assuming that the Tag model only contains id and name; if Tag in fact contains more data, you should still keep the Tag model, and the Song model should then contain both tag_id and tag_name. The idea, as explained above, is to introduce redundancy for the sake of simpler queries.
Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in=taglist)
You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.
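On a relational backend, a filter across the many-to-many relation becomes a join, and a song that carries several of the listed tags can come back once per matching tag, so adding .distinct() is often wanted. The sketch below shows the join with plain SQLite from the standard library; the table names and rows are invented, and it does not apply to the App Engine datastore from the question, which does not support joins.

```python
import sqlite3

# Hypothetical tables mirroring Song, Tag and SongTag; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE songtag (song_id INTEGER, tag_id INTEGER);
INSERT INTO song VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO tag VALUES (1, 'cool'), (2, 'great'), (3, 'dull');
INSERT INTO songtag VALUES (1, 1), (1, 2), (2, 2), (3, 3);
""")

taglist = ["cool", "great"]
placeholders = ", ".join("?" for _ in taglist)

# Roughly what Song.objects.filter(tags__name__in=taglist).distinct() does:
# a single query with joins. Without DISTINCT, song 1 would appear twice
# because it matches both 'cool' and 'great'.
query = f"""
SELECT DISTINCT song.id, song.title
FROM song
JOIN songtag ON songtag.song_id = song.id
JOIN tag ON tag.id = songtag.tag_id
WHERE tag.name IN ({placeholders})
ORDER BY song.id
"""
songs = conn.execute(query, taglist).fetchall()
print(songs)
```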
I have a 'categories' model which is used more than once on a page. Since I am obtaining all the categories at the start, I want to cut down on database queries by not fetching the same data more than once.
Since the initial query is getting ALL the categories, is there a way to store this information in the model so that when I reference the data again later, I don't have to hit the database again?
Perhaps some kind of associative array or dict which stores the categories?
Any help would be appreciated.
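Django querysets are lazy, and their results are cached on the queryset object: the database is not hit until the queryset is evaluated, and reusing the same evaluated queryset does not query again. Building a new queryset (for example by calling objects.all() again) does run a new query. You should also take a look at how queries are evaluated.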
If you could post some code, we could help you figure out an optimal way to write queries.
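One way to get the dict the question mentions is to evaluate the query once and index the results by id. This is a minimal sketch in plain Python: Category is a namedtuple standing in for the model, and load_categories is a hypothetical placeholder for something like list(Category.objects.all()).

```python
from collections import namedtuple

# Hypothetical stand-in for a Category model instance.
Category = namedtuple("Category", ["id", "name"])


def load_categories():
    # Placeholder for the single database query, e.g.
    # list(Category.objects.all()) in a real Django view.
    load_categories.calls += 1
    return [Category(1, "News"), Category(2, "Sport")]


load_categories.calls = 0

# Evaluate once, then reuse the list and the lookup dict everywhere on the page.
categories = load_categories()
categories_by_id = {category.id: category for category in categories}

# Later lookups read from the dict and do not trigger another load.
first = categories_by_id[1]
second = categories_by_id[2]
print(first.name, second.name, load_categories.calls)
```

Passing categories and categories_by_id around (for example through the template context) keeps the page at one query for this model.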
That seems simple enough, but all Django queries seem to be 'SELECT *'.
How do I build a query returning only a subset of fields?
In Django 1.1 onwards, you can use defer('col1', 'col2') to exclude columns from the query, or only('col1', 'col2') to only get a specific set of columns. See the documentation.
values() does something slightly different: it only gets the columns you specify, but it returns dictionaries rather than model instances.
Append a .values("column1", "column2", ...) to your query
The accepted answer advises defer and only, which the docs discourage in most cases:
only use defer() when you cannot, at queryset load time, determine if you will need the extra fields or not. If you are frequently loading and using a particular subset of your data, the best choice you can make is to normalize your models and put the non-loaded data into a separate model (and database table). If the columns must stay in the one table for some reason, create a model with Meta.managed = False (see the managed attribute documentation) containing just the fields you normally need to load and use that where you might otherwise call defer(). This makes your code more explicit to the reader, is slightly faster and consumes a little less memory in the Python process.