Remove (filter out) objects from queryset - django

I'd like to remove 3 objects from my queryset. This works with the help of an extra list, but I'm pretty sure there is a better way to do it with the QuerySet API. However, I haven't figured out how yet.
What I'm doing:
ranks = Rank.objects.all()
remove_ranks = ['Field Marshall', 'Military Attache', 'Mercenary Recruiter']
new_ranks = []
for rank in ranks:
    if rank.name not in remove_ranks:
        new_ranks.append(rank)
How can I do this using the Django API?

Try:
remove_ranks = ['Field Marshall', 'Military Attache', 'Mercenary Recruiter']
Rank.objects.exclude(name__in=remove_ranks)
What does it do?
.exclude() is the opposite of .filter().
name__in is the equivalent of an IN clause in SQL.
This should produce a SQL query along the lines of:
SELECT * FROM rank WHERE name NOT IN ('Field Marshall', 'Military Attache', 'Mercenary Recruiter')
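To see the NOT IN behaviour without a Django project, here is a minimal sketch that runs roughly the same SQL against an in-memory SQLite database. The table name ranks and the sample rows are assumptions made for illustration; the real table name depends on your app, and Django builds and binds the query for you.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Rank model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranks (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO ranks (name) VALUES (?)",
    [("Private",), ("Field Marshall",), ("Sergeant",), ("Military Attache",)],
)

remove_ranks = ["Field Marshall", "Military Attache", "Mercenary Recruiter"]

# One "?" placeholder per excluded name; the driver binds the values.
placeholders = ", ".join("?" for _ in remove_ranks)
sql = "SELECT name FROM ranks WHERE name NOT IN ({}) ORDER BY id".format(placeholders)

remaining = [row[0] for row in conn.execute(sql, remove_ranks)]
print(remaining)
```

Names that are in the exclusion list but not in the table (here 'Mercenary Recruiter') are simply ignored, which matches what exclude(name__in=...) does.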

Related

Raw query with rank over subquery / params not quoted

My Goal
I need PostgreSQL's rank() window function applied to an annotated queryset from Django's ORM. Django's SQL query has to be a subquery in order to apply the window function, and this is what I'm doing so far:
queryset = Item.objects.annotate(…)
queryset_with_rank = Item.objects.raw("""
    select rank() over (order by points), *
    from (%(subquery)s)""", {'subquery': queryset.query}
)
The problem
Unfortunately, the query returned by queryset.query does not quote the parameters used for annotation correctly although the query itself is executed perfectly fine.
Example of returned query
The query returned by queryset_with_rank.query or queryset.query contains the following:
"participation"."category" = )
"participation"."category" = amateur)
whereas I expected:
"participation"."category" = '')
"participation"."category" = 'amateur')
Question
I noticed that the Django documentation states the following about Query.__str__():
Parameter values won't necessarily be quoted correctly, since that is done by the database interface at execution time.
As long as I fix the quotation manually and pass it to Postgres myself, everything works as expected. Is there a way to receive the needed subquery with correct quotation? Or is there an alternative and better approach to applying a window function to a Django ORM queryset altogether?
As Django core developer Aymeric Augustin said, there's no way to get the exact query that is executed by the database backend beforehand.
I still managed to build the query the way I hoped to, although it is a bit cumbersome:
# Obtain query and parameters separately
query, params = item_queryset.query.sql_with_params()
# Put additional quotes around strings. I guess this is what
# the database adapter does as well.
params = [
    '\'{}\''.format(p)
    if isinstance(p, str) else p  # use basestring on Python 2
    for p in params
]
# Cast the list of parameters to a tuple: %-formatting treats a
# list as a single argument and fails otherwise.
params = tuple(params)
participations = Item.objects.raw("""
    select *,
    rank() over (order by points DESC) as rank
    from ({subquery}) as subquery
    """.format(subquery=query % params), []
)
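Quoting parameters by hand breaks as soon as a value contains a quote character. A possible alternative is to splice only the SQL text of the subquery into the outer query and keep handing the parameters to the database driver. The sketch below shows the idea with the standard library's sqlite3 module; the participation table, its rows and the "?" placeholder style are assumptions for illustration (psycopg2 uses "%s"). In Django this would presumably mean passing the params from sql_with_params() as the second argument of raw() instead of formatting them into the string.

```python
import sqlite3

# Hypothetical table; only the parameter handling matters here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participation (id INTEGER PRIMARY KEY, category TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO participation (category, points) VALUES (?, ?)",
    [("amateur", 10), ("amateur", 30), ("pro", 50), ("o'neil cup", 20)],
)

# Inner query and its parameters are kept separate, similar to what
# sql_with_params() returns.
inner_sql = "SELECT id, points FROM participation WHERE category = ?"
inner_params = ("amateur",)

# Only the SQL text goes into the outer query; values stay parameters.
outer_sql = "SELECT id, points FROM ({}) AS sub ORDER BY points DESC".format(inner_sql)
rows = conn.execute(outer_sql, inner_params).fetchall()
print(rows)

# A value containing a quote needs no manual escaping.
quoted = conn.execute(outer_sql, ("o'neil cup",)).fetchall()
print(quoted)
```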

Nested SQL queries in Django

I've got a working SQL query that I'm trying to write in Django (without resorting to RAW) and was hoping you might be able to help.
Broadly, I'm looking to nest two queries: the first calculates a COUNT, and then I want to calculate an AVERAGE of the COUNTs. (This will give you the average number of items on a ticket, per location.)
The SQL that works is:
SELECT location_name, Avg(subq.num_tickets) FROM (
    SELECT Count(ticketitem.id) AS num_tickets, location.name AS location_name
    FROM ticketitem
    JOIN ticket ON ticket.id = ticketitem.ticket_id
    JOIN location ON location.id = ticket.location_id
    JOIN location ON location.id = location.app_location_id
    GROUP BY ticket_id, location.name) AS subq
GROUP BY subq.location_name;
For my Django code, I'm trying something like this:
# Get the first count
qs = TicketItem.objects.filter(<my complicated filter>).\
    values('ticket__location__app_location__name', 'posticket').\
    annotate(num_tickets=Count('id'))
# Now get the average of the count
qs2 = qs.values('ticket__location__app_location__name').\
    annotate(Avg('num_tickets')).\
    order_by('location__app_location__name')
but that fails because num_tickets doesn't exist ... Anyway - suspect I'm being slow. Would love someone to enlighten me!
Check out the section on aggregating annotations from the Django docs. Their example takes an average of a count.
I was playing around with this a bit in a manage.py shell, and I think the Django ORM might not be able to do that kind of annotation. Honestly, you're probably going to have to resort to a raw query, or bind in something like https://github.com/Deepwalker/aldjemy, which would let you do that via SQLAlchemy.
When I was playing with this, I tried:
(my_model.objects.filter(...)
    .values('parent_id', 'parent__name', 'thing')
    .annotate(Count('thing'))
    .values('name', 'thing__count')
    .annotate(Avg('thing__count')))
This gave a lovely traceback about FieldError: Cannot compute Avg('thing__count'): 'thing__count' is an aggregate, which makes sense, since I doubt the ORM tries to convert that first GROUP BY into a nested query.
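If you do fall back to raw SQL, the nested aggregate itself is small. The sketch below runs an average-of-counts query against an in-memory SQLite database; the simplified two-table schema (ticket with a location_name column, ticketitem with a ticket_id) and the sample rows are assumptions for illustration and do not match the schema in the question exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, location_name TEXT);
    CREATE TABLE ticketitem (id INTEGER PRIMARY KEY, ticket_id INTEGER);
    INSERT INTO ticket (id, location_name) VALUES (1, 'north'), (2, 'north'), (3, 'south');
    INSERT INTO ticketitem (ticket_id) VALUES (1), (1), (2), (2), (2), (2), (3);
""")

# Inner query: number of items per ticket.
# Outer query: average of those counts per location.
sql = """
    SELECT subq.location_name, AVG(subq.num_items)
    FROM (
        SELECT ticket.location_name AS location_name,
               COUNT(ticketitem.id) AS num_items
        FROM ticketitem
        JOIN ticket ON ticket.id = ticketitem.ticket_id
        GROUP BY ticket.id, ticket.location_name
    ) AS subq
    GROUP BY subq.location_name
    ORDER BY subq.location_name
"""
averages = conn.execute(sql).fetchall()
print(averages)
```

In the sample data the 'north' tickets hold 2 and 4 items and the single 'south' ticket holds 1, so the averages come out as 3.0 and 1.0.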

Django distinct foreign keys for use in another model

class Log:
    project = ForeignKey(Project)
    msg = CharField(...)
    date = DateField(...)
I want to select the four most recent Log entries, where each entry must have a unique project foreign key. I've tried the solutions I found through Google, but none of them work, and the Django documentation isn't very good for lookups.
I tried:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
id_list is empty unless I remove the order_by() but then it's not in the correct order.
entries = Log.objects.filter(id__in=id_list)
The objects in entries are not in the same order as in id_list, because MySQL's IN() does not sort the result by the input order ... How can I do this in Django?
It looks like it is impossible to achieve what you want with the Django ORM. The documentation states that it is not possible to use order_by along with distinct.
However there might be another way to solve it. Maybe you could select Project objects, and annotate them with latest log entries.
Here's a single-query solution (but it will probably be too slow):
Log.objects.filter(project__log__date__gte=F('date')).annotate(c=Count('project__log')).filter(c__lte=4).order_by('project', 'c')
I think that Skirmantas is right and you have to do it in a more complex way:
from django.db.models import Max
projects = Project.objects.annotate(last_logged=Max('log__date')).order_by('-last_logged')[:4]
log_entries = [proj.log_set.order_by('-date')[0] for proj in projects]
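The logic of that last approach (newest entry per project, then the four most recently active projects) can be sketched in plain Python. The tuples below are hypothetical stand-ins for Log rows, and latest_per_project is an illustrative helper, not part of Django.

```python
from datetime import date

# Hypothetical (project, date, msg) rows standing in for Log entries.
logs = [
    ("alpha", date(2020, 1, 1), "a1"),
    ("alpha", date(2020, 3, 1), "a2"),
    ("beta", date(2020, 2, 1), "b1"),
    ("gamma", date(2020, 4, 1), "g1"),
    ("delta", date(2020, 1, 15), "d1"),
    ("epsilon", date(2019, 12, 1), "e1"),
]


def latest_per_project(entries, limit=4):
    """Return the newest entry of each project, most recent projects first."""
    newest = {}
    for entry in entries:
        project, logged_on, _msg = entry
        # Keep only the most recent entry seen so far for this project.
        if project not in newest or logged_on > newest[project][1]:
            newest[project] = entry
    ranked = sorted(newest.values(), key=lambda e: e[1], reverse=True)
    return ranked[:limit]


recent = latest_per_project(logs)
print([msg for _project, _logged_on, msg in recent])
```

The ORM version above does the same thing in 1 + 4 queries: one for the projects annotated with Max('log__date'), then one per project for its newest entry.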

fast lookup for the last element in a Django QuerySet?

I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor for the robot r. Valor.objects.filter(robot=r).count() is about 200000, and getting the last item takes about 4 seconds on my PC.
How can I speed it up? Am I querying the wrong way?
The optimal MySQL syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The Django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution uses Django's slicing to limit the queryset before the list of objects is compiled.
If none of the earlier suggestions work, I'd suggest taking Django out of the equation and running this raw SQL against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the queryset provided by Peter Rowell):
# Slice with [:1] rather than [0]: indexing returns a model instance, which has no .query
qs = Valor.objects.filter(robot=r).order_by('-id')[:1]
print(qs.query)
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
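As a rough illustration of asking the database for its plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN on a hypothetical valor table with an index on robot_id. The table layout, the index name idx_valor_robot and the sample data are assumptions; MySQL and PostgreSQL use EXPLAIN with a different output format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE valor (id INTEGER PRIMARY KEY, robot_id INTEGER, reading REAL)"
)
conn.execute("CREATE INDEX idx_valor_robot ON valor (robot_id)")
conn.executemany(
    "INSERT INTO valor (robot_id, reading) VALUES (?, ?)",
    [(i % 3, float(i)) for i in range(30)],
)

sql = (
    "SELECT id, robot_id, reading FROM valor "
    "WHERE robot_id = ? ORDER BY id DESC LIMIT 1"
)

# The newest row for robot 1.
last_row = conn.execute(sql, (1,)).fetchone()
print(last_row)

# The last column of each plan row is the human-readable detail.
# A SCAN step means the whole table is read; a SEARCH step that
# names an index means the index is used.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]
for step in plan:
    print(step)
```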
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
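class Robot(models.Model):
    # ...
    # on_delete is required since Django 2.0; related_name='+' disables the
    # reverse relation, which would otherwise clash with Valor.robot
    last_valor = models.ForeignKey('Valor', null=True, blank=True,
                                   on_delete=models.SET_NULL, related_name='+')
And then use a post_save signal to make the update:
from django.db.models.signals import post_save
def record_last_valor(sender, **kwargs):
    if kwargs.get('created', False):
        instance = kwargs.get('instance')
        instance.robot.last_valor = instance
        instance.robot.save()  # persist the new pointer
post_save.connect(record_last_valor, sender=Valor)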
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
Django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
This should also be quite fast:
qs = Valor.objects.filter(robot=r)  # <-- doesn't hit the database
count = qs.count()                  # <-- first hit: compute a count
last_item = qs[count - 1]           # <-- second hit: fetch the row at that offset
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()  # get the first element
Model_Name.objects.last()   # get the last element
In my case, last() did not work because there was only one row in the database.
Maybe this is helpful for you too :)
Is there a limit clause in Django? That way you can have the database simply return a single record.
MySQL:
select * from table where x = y limit 1
SQL Server:
select top 1 * from table where x = y
Oracle:
select * from table where x = y and rownum = 1
I realize this isn't translated into Django, but someone can come back and clean this up.
The correct way of doing this is to use the built-in QuerySet method latest(), passing it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as @Aaron's suggestion.
def latest(self, field_name=None):
    """
    Returns the latest object, according to the model's 'get_latest_by'
    option or optional given field_name.
    """
    latest_by = field_name or self.model._meta.get_latest_by
    assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
    assert self.query.can_filter(), \
        "Cannot change a query once a slice has been taken."
    obj = self._clone()
    obj.query.set_limits(high=1)
    obj.query.clear_ordering()
    obj.query.add_ordering('-%s' % latest_by)
    return obj.get()

How do I get the related objects In an extra().values() call in Django?

Thanks to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
    country = Country.objects.get(name=country_name)
    coin_values = Collectible.objects.filter(country=country.id, type=1).extra(select={'count': 'count(1)'},
        order_by=['-count']).values('count', 'face_value', 'currency_type')
    coin_values.query.group_by = ['currency_type_id', 'face_value']
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id,
                                         type=1).extra(
    select={'count': 'count(1)'},
    order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
from django.db import connection
def coins_by_country(request, country_name):
    country = get_object_or_404(Country, name=country_name)
    cursor = connection.cursor()
    # The country id is passed as a query parameter rather than formatted into the SQL
    cursor.execute('SELECT count(*), face_value, collection_currency.name FROM collection_collectible, collection_currency WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 group by face_value, collection_currency.name', [country.id])
    coin_values = cursor.fetchall()
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
Have you tried select_related()? http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot and it seems to work well. Then you can access currency_type.name on each object in coin_values.
Also, I don't think you need country=country.id in your filter; country=country should work too, though I'm not sure it makes any difference other than less typing.