Django find all rows matching 2 columns criteria - django

Imagine the model Event like this:
name | email
A    | u1#example.org
B    | u1#example.org
B    | u1#example.org
C    | u2#example.org
B    | u3#example.org
B    | u3#example.org
A    | u4#example.org
B    | u4#example.org
I would like to find all emails that have both name A and name B. In my example that is ["u1#example.org", "u4#example.org"].
Today I'm doing:
from django.db.models import Count

emails = [
    e["email"]
    for e in models.Event.objects.filter(name__in=["A", "B"])
    .values("email")
    .annotate(count=Count("id"))
    .order_by()
    .filter(count__gt=1)
]
This doesn't work because I also get emails that have duplicate rows for only one name (like u3#example.org).

After trying different approaches, I found the solution:
from django.db.models import Count

events = ["A", "B"]
emails = [
    e["email"]
    for e in models.Event.objects.filter(name__in=events)
    .values("email")
    .annotate(count_name=Count("name", distinct=True))
    .order_by()
    .filter(count_name=len(events))
]
I need to group by email, count the number of distinct names per email, and keep only the emails whose count equals the number of events I am looking for.
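The same grouping idea can be checked outside Django with plain SQL. This is only a minimal sketch, assuming an in-memory SQLite table named event (a hypothetical name; Django would normally generate something like <app>_event) filled with the rows from the question. The helper emails_with_all_names is made up for illustration.

```python
import sqlite3

# Sample rows from the question; the table name "event" is an assumption.
ROWS = [
    ("A", "u1#example.org"), ("B", "u1#example.org"), ("B", "u1#example.org"),
    ("C", "u2#example.org"),
    ("B", "u3#example.org"), ("B", "u3#example.org"),
    ("A", "u4#example.org"), ("B", "u4#example.org"),
]


def emails_with_all_names(conn, names):
    """Return emails that have at least one row for every name in `names`.

    `names` is expected to contain no duplicates.
    """
    placeholders = ", ".join("?" for _ in names)
    sql = (
        "SELECT email FROM event "
        f"WHERE name IN ({placeholders}) "
        "GROUP BY email "
        "HAVING COUNT(DISTINCT name) = ? "
        "ORDER BY email"
    )
    return [row[0] for row in conn.execute(sql, [*names, len(names)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO event (name, email) VALUES (?, ?)", ROWS)

print(emails_with_all_names(conn, ["A", "B"]))
```

COUNT(DISTINCT name) is what keeps u3#example.org out: it has two rows, but both carry the same name.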

If you don't need the model, you can run raw SQL directly. Note that this query returns emails that have name A or name B, not necessarily both:
from django.db import connection
from django.shortcuts import render

def get_random_events(request):
    with connection.cursor() as cursor:
        cursor.execute("SELECT DISTINCT email FROM event WHERE name = 'A' OR name = 'B'")
        for row in cursor:
            print(row[0])
    return render(request, 'blank.html')
As for the ORM, the problem is the last part of the query: it does not seem possible to properly build the WHERE clause. My best attempt used Q lookups, but it has the same problem:
RandomEvent.objects.values('email').distinct().filter(Q(name='B') | Q(name='A'))
# Query Structure
SELECT DISTINCT email FROM random_event WHERE (name = 'B' OR name = 'A')

Related

Django Query to get count of all distinct values for column of ArrayField

What is the most efficient way to count all distinct values for a column that is an ArrayField?
Let's suppose I have a model with the name MyModel and cities field which is postgres.ArrayField.
# models.py
class MyModel(models.Model):
    ....
    cities = ArrayField(models.TextField(blank=True), blank=True, null=True, default=list)  # e.g. ['mumbai', 'london']
And let's suppose MyModel has the following 3 objects, with these values in the cities field:
1. ['london','newyork']
2. ['mumbai']
3. ['london','chennai','mumbai']
Counting distinct values for the cities field groups by the entire list instead of by each element.
## Query
MyModel.objects.values('cities').annotate(Count('id')).order_by().filter(id__count__gt=0)
Here I would like to count distinct values for each element of the lists in the cities field, which should give the following final output:
[{'london':2},{'newyork':1},{'chennai':1},{'mumbai':2}]
Perform the GROUP BY operation at the database level itself:
from django.db import connection
cursor = connection.cursor()
raw_query = """
select unnest(subquery_alias.cities) as distinct_cities, count(*) as cities_group_by_count
from (select cities from sample_mymodel) as subquery_alias group by distinct_cities;
"""
cursor.execute(raw_query)
result = [{"city": row[0], "count": row[1]} for row in cursor]
print(result)
References
unnest(), PostgreSQL array function
Django: Executing custom SQL directly
An inefficient way, done in Python outside the Django ORM:
import itertools

# data is assumed to be a MyModel queryset
unique_cities = list(data.values_list('cities', flat=True))
unique_cities_compiled = list(itertools.chain.from_iterable(unique_cities))
unique_cities_final = {i: unique_cities_compiled.count(i) for i in unique_cities_compiled}
print(unique_cities_final)
{'london': 2, 'newyork': 1, 'mumbai': 2, 'chennai': 1}
If anyone knows a more efficient way, please post an answer with an improved version of the solution.
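A single-pass variant of the Python approach is possible with collections.Counter, which avoids calling list.count() once per element. This is a sketch only: city_lists stands in for list(MyModel.objects.values_list('cities', flat=True)), and count_cities is a hypothetical helper name.

```python
from collections import Counter
from itertools import chain

# Stand-in for list(MyModel.objects.values_list("cities", flat=True));
# the three lists mirror the sample objects in the question.
city_lists = [
    ["london", "newyork"],
    ["mumbai"],
    ["london", "chennai", "mumbai"],
]


def count_cities(rows):
    """Count how often each city occurs across all lists, skipping NULL/empty rows."""
    return Counter(chain.from_iterable(row for row in rows if row))


counts = count_cities(city_lists)
print(dict(counts))
```

This still pulls every row into Python, so the database-level GROUP BY remains the better fit for large tables.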

Django query ForeignKey Count() zero

I have 3 tables:
Truck with the fields: id, name....
Menu with the fields: id, itemname, id_foodtype, id_truck...
Foodtype with the fields: id, type...
I want to get a summary like:
id | name            | total
10 | Alcoholic drink | 0
5  | Appetizer       | 11
My problem is to return the results with 0 elements.
I tried an SQL query like this:
SELECT
    ft.id, ft.name, COUNT(me.id) total
FROM
    foodtype ft
    LEFT JOIN menu me
        ON ft.id = me.id_foodtype
    LEFT JOIN truck tr
        ON tr.id = me.id_truck AND tr.id = 3
GROUP BY ft.id, ft.name
ORDER BY ft.name
or a query in Django
Menu.objects.filter(id_truck=3).values("id_foodtype").annotate(cnt=Count("id_foodtype"))
But neither shows the food types with zero elements.
When I convert this query to Python code, none of my queries return the result I expect.
How can I return results from the LEFT JOIN that include the food types with zero elements in the menu?
The direction of the LEFT JOIN depends on the model where you start the query. If it starts on Menu, you will never see a FoodType that is unused by the selected Menu items. So start on FoodType, and filter (by Truck in your case) in a way that also allows a null Menu.id, so that a count of 0 is possible.
from django.db.models import Count, Q

qs = (
    FoodType.objects
    .filter(Q(menu_set__id_truck=3) | Q(menu_set__id__isnull=True))
    .values()  # not necessary, but useful if you want a dict, not a Model object
    .annotate(cnt=Count("menu_set__id"))
)
Verify:
>>> print(str(qs.query))
SELECT foodtype.id, foodtype..., COUNT(menu.id) AS cnt
FROM foodtype
LEFT OUTER JOIN menu ON (foodtype.id = menu.id_foodtype)
WHERE (menu.id_truck = 3 OR menu.id IS NULL)
GROUP BY foodtype.id
It works with both the newest and the oldest Django versions I tested, 2.0b1 and 1.8.
The query is the same with or without the line .values(). The results are dictionaries or FoodType objects with a cnt attribute.
Footnotes:
The name menu_set should be replaced by the real related_name of the foreign key id_foodtype if you have defined one.
class Menu(models.Model):
    id_foodtype = models.ForeignKey('FoodType', on_delete=models.DO_NOTHING,
                                    db_column='id_foodtype', related_name='menu_set')
    ...
If you start a new project, I recommend naming the foreign key without "id" (for example foodtype) and keeping "id" only in db_column. Then menu_item.foodtype is a FoodType object and menu_item.foodtype_id is its id.
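The LEFT JOIN behaviour discussed here can be tried in plain SQL. The sketch below assumes simplified foodtype and menu tables in an in-memory SQLite database, with made-up sample rows. Note that it moves the truck condition into the JOIN's ON clause rather than the WHERE clause shown in the ORM query, which also keeps food types whose only menu items belong to other trucks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foodtype (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE menu (
        id INTEGER PRIMARY KEY,
        itemname TEXT,
        id_foodtype INTEGER REFERENCES foodtype (id),
        id_truck INTEGER
    );
    -- Hypothetical sample data
    INSERT INTO foodtype (id, name) VALUES
        (5, 'Appetizer'), (10, 'Alcoholic drink'), (12, 'Dessert');
    INSERT INTO menu (itemname, id_foodtype, id_truck) VALUES
        ('Nachos', 5, 3), ('Wings', 5, 3), ('Ice cream', 12, 7);
    """
)


def foodtype_totals(conn, truck_id):
    """Return (id, name, total) per food type, including totals of zero."""
    sql = """
        SELECT ft.id, ft.name, COUNT(me.id) AS total
        FROM foodtype ft
        LEFT JOIN menu me
            ON me.id_foodtype = ft.id AND me.id_truck = ?
        GROUP BY ft.id, ft.name
        ORDER BY ft.name
    """
    return conn.execute(sql, (truck_id,)).fetchall()


for row in foodtype_totals(conn, 3):
    print(row)
```

COUNT(me.id) ignores the NULL produced by an unmatched LEFT JOIN row, which is why unused food types come back with a total of 0 instead of disappearing.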

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
    some fields
User_Activity:
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    activity = models.ForeignKey(Base_Activity)
    rating = models.IntegerField(default=0)  # Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
    up_votes = Count('user_activity__rating'=1),
).order_by(
    'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
from django.db.models import Avg, Count

activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
    '-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the items with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
    '-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)  # execute() is not guaranteed to return the cursor
rows = cursor.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database; I'm not sure the SUM expression there works in all DBMS. By the way, I had a perfect Django site to test on, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, Stack Overflow style. The site is open-source, if you're interested.
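For databases where summing a boolean such as SUM(v.rating > 0) is not accepted, a CASE WHEN expression is the standard SQL spelling. The sketch below is an illustration only: it assumes simplified base_activity and user_activity tables (hypothetical names and sample rows) in an in-memory SQLite database, and uses a LEFT JOIN so that activities without any votes still appear with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE base_activity (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_activity (
        id INTEGER PRIMARY KEY,
        activity_id INTEGER REFERENCES base_activity (id),
        rating INTEGER DEFAULT 0
    );
    -- Hypothetical sample data
    INSERT INTO base_activity (id, title) VALUES (1, 'hike'), (2, 'swim'), (3, 'read');
    INSERT INTO user_activity (activity_id, rating) VALUES
        (1, 1), (1, -1), (2, 1), (2, 1), (2, 0);
    """
)

# CASE WHEN is standard SQL, unlike summing a boolean expression.
SQL = """
    SELECT a.id, SUM(CASE WHEN u.rating = 1 THEN 1 ELSE 0 END) AS up_votes
    FROM base_activity a
    LEFT JOIN user_activity u ON u.activity_id = a.id
    GROUP BY a.id
    ORDER BY up_votes DESC, a.id
"""

rows = conn.execute(SQL).fetchall()
print(rows)
```

Downvotes and neutral ratings contribute 0 to the sum, so they neither raise nor lower an activity's position.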

django annotate count filter

I am trying to count daily records for some model, but I would like the count to include only records where some FK field equals a given value. The result should be a list of the days on which any new record was created, where some days may have a count of 0.
class SomeModel(models.Model):
    place = models.ForeignKey(Place)
    note = models.TextField()
    time_added = models.DateTimeField()
Say there's a Place with name="NewYork"
data = SomeModel.objects.extra({'created': "date(time_added)"}).values('created').annotate(placed_in_ny_count=Count('id'))
This works, but it counts all records for all places.
Tried with filtering, but it does not return days, where there was no record with place.name="NewYork". That's not what I need.
It looks as though you want to know, for each day on which any object was added, how many of the objects created on that day have a place whose name is New York. (Let me know if I've misunderstood.) In SQL that needs an outer join:
SELECT m.id, date(m.time_added) AS created, count(p.id) AS count
FROM myapp_somemodel AS m
LEFT OUTER JOIN myapp_place AS p
    ON m.place_id = p.id
    AND p.name = 'New York'
GROUP BY created
So you can always express this in Django using a raw SQL query:
for o in SomeModel.objects.raw('SELECT ...'):  # query as above
    print('On {0}, {1} objects were added in New York'.format(o.created, o.count))
Notes:
I haven't tried to work out if this is expressible in Django's query language; it may be, but as the developers say, the database API is "a shortcut but not necessarily an end-all-be-all."
The m.id is superfluous in the SQL query, but Django requires that "the primary key ... must always be included in a raw query".
You probably don't want to write the literal 'New York' into your query, so pass a parameter instead: raw('SELECT ... AND p.name = %s ...', [placename]).
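The outer join and the parameter passing from the notes can be tried together in a small sketch. It assumes simplified place and somemodel tables in an in-memory SQLite database with made-up dates; daily_counts is a hypothetical helper. The sqlite3 module uses ? as its placeholder, whereas Django's raw() uses %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE somemodel (
        id INTEGER PRIMARY KEY,
        place_id INTEGER REFERENCES place (id),
        time_added TEXT
    );
    -- Hypothetical sample data
    INSERT INTO place (id, name) VALUES (1, 'New York'), (2, 'Boston');
    INSERT INTO somemodel (place_id, time_added) VALUES
        (1, '2024-01-01 09:00:00'),
        (1, '2024-01-01 17:30:00'),
        (2, '2024-01-01 18:00:00'),
        (2, '2024-01-02 08:15:00');
    """
)


def daily_counts(conn, place_name):
    """Return (day, count) for every day with records; the count may be 0."""
    sql = """
        SELECT date(m.time_added) AS created, COUNT(p.id) AS n
        FROM somemodel AS m
        LEFT OUTER JOIN place AS p
            ON m.place_id = p.id AND p.name = ?
        GROUP BY created
        ORDER BY created
    """
    return conn.execute(sql, (place_name,)).fetchall()


print(daily_counts(conn, "New York"))
```

Because the place name is tested in the ON clause, a day with records only for other places still produces a row, with a count of 0.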

Django: filter a RawQuerySet

I've got a weird query, so I have to execute raw SQL. The thing is that this query is getting bigger and bigger, with lots of optional filters (ordering, column criteria, etc.).
So, given this query:
SELECT DISTINCT c.* FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
This is roughly the Python code:
def get_cameras(features):
    query = "SELECT DISTINCT c.* FROM Camera c"
    for i, _ in enumerate(features, start=1):
        alias_name = "fc%s" % i
        # Leading space keeps each JOIN separated from the text before it
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name, alias_name, alias_name)
        query += "%s"  # placeholder filled by raw() from the params tuple
    return Camera.objects.raw(query, tuple(features))
This is working great, but I need to add more filters and ordering. For example, suppose I need to filter by color and order by price; it starts to grow:
# extra_filters is a list of tuples like:
# [('price', '=', '12'), ('color', '=', 'blue'), ('brand', 'like', 'lum%')]
def get_cameras_big(features, extra_filters=None, order=None):
    query = "SELECT DISTINCT c.* FROM Camera c"
    for i, _ in enumerate(features, start=1):
        alias_name = "fc%s" % i
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name, alias_name, alias_name)
        query += "%s"
    if extra_filters:
        # not very safe, refactoring needed
        query += " WHERE " + " AND ".join("%s %s %s" % ef for ef in extra_filters)
    if order:
        query += " ORDER BY %s" % order
    return Camera.objects.raw(query, tuple(features))
I don't like how it has started to grow. I know Model.objects.raw() returns a RawQuerySet, so I'd like to do something like this:
queryset = get_cameras( ... )
queryset.filter(...)
queryset.order_by(...)
But this doesn't work. Of course I could just perform the raw query and after that get an actual QuerySet with the data, but then I will perform two queries. Like:
raw_query_set = get_cameras( ... )
Camera.objects.filter(id__in=[c.id for c in raw_query_set])  # don't know if it works, but you get the idea
I'm thinking that something with the QuerySet init or the cache may do the trick, but haven't been able to do it.
.raw() is an end-point. Django can't do anything with the queryset because that would require being able to somehow parse your SQL back into the DBAPI it uses to create SQL in the first place. If you use .raw() it is entirely on you to construct the exact SQL you need.
If you can somehow reduce your query into something that .extra() can handle, you could construct the query with Django's API and then tack on the additional SQL with .extra(), but that's going to be your only way around.
There's another option: turn the RawQuerySet into a list, then you can do your sorting like this...
results_list.sort(key=lambda item:item.some_numeric_field, reverse=True)
and your filtering like this...
filtered_results = [i for i in results_list if i.some_field == 'something']
...all programmatically. I've been doing this a ton to minimize db requests. Works great!
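Put together, the list-based approach might look like the sketch below. The Camera dataclass is only a hypothetical stand-in for the model instances that list(Camera.objects.raw(...)) would return, and the field names and sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical stand-in for a model instance returned by .raw()."""
    name: str
    color: str
    price: float


# In Django this would be: results_list = list(Camera.objects.raw(...))
results_list = [
    Camera("alpha", "blue", 120.0),
    Camera("beta", "red", 80.0),
    Camera("gamma", "blue", 95.5),
]

# Filter first so the sort touches fewer items, then order by price.
blue_cameras = [c for c in results_list if c.color == "blue"]
blue_by_price = sorted(blue_cameras, key=lambda c: c.price)

print([c.name for c in blue_by_price])
```

This trades database work for memory: every row of the raw query is loaded before any filtering happens, so it suits result sets that are known to be small.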
I implemented a Django raw queryset which supports filter(), order_by(), values() and values_list(). It will not work for every raw query, but for a typical SELECT with some INNER JOIN or a LEFT JOIN it should work.
The FilteredRawQuerySet is implemented as a combination of Django model QuerySet and RawQuerySet, where the base (left part) of the SQL query is generated via RawQuerySet, while the WHERE and ORDER BY directives are generated by QuerySet:
https://github.com/Dmitri-Sintsov/django-jinja-knockout/blob/master/django_jinja_knockout/query.py
It works with Django 1.8 .. 1.11.
It also has a ListQuerySet implementation for Prefetch object result lists of model instances as well, so these can be processed the same way as ordinary querysets.
Here is the example of usage:
https://github.com/Dmitri-Sintsov/djk-sample/search?l=Python&q=filteredrawqueryset&type=&utf8=%E2%9C%93
Another thing you can do, if you are unable to convert it to a regular QuerySet, is to create a view in your database backend. The database executes the view's query whenever you access it. In Django, you would then create an unmanaged model mapped to the view. With that model, you can apply filter() as if it were a regular model. For its foreign keys, set the on_delete argument to models.DO_NOTHING.
More information about unmanaged models:
https://docs.djangoproject.com/en/2.0/ref/models/options/#managed
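The database side of the view approach can be sketched without Django. The example below assumes simplified camera and cameras_features tables in an in-memory SQLite database; the view name camera_with_features_1_2 and the sample rows are made up for illustration. In a Django project, an unmanaged model would point at the view instead of the plain SELECT used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE camera (id INTEGER PRIMARY KEY, color TEXT, price REAL);
    CREATE TABLE cameras_features (camera_id INTEGER, feature_id INTEGER);
    INSERT INTO camera (id, color, price) VALUES
        (1, 'blue', 120.0), (2, 'red', 80.0), (3, 'blue', 95.5);
    INSERT INTO cameras_features (camera_id, feature_id) VALUES
        (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);

    -- The complex part of the query lives in the view.
    CREATE VIEW camera_with_features_1_2 AS
    SELECT DISTINCT c.*
    FROM camera c
    INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
    INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2;
    """
)

# Filtering and ordering are applied on top of the view as on a table.
rows = conn.execute(
    "SELECT id, price FROM camera_with_features_1_2 WHERE color = ? ORDER BY price",
    ("blue",),
).fetchall()
print(rows)
```

One limitation of this sketch is that the feature ids are fixed inside the view, so it fits queries whose join structure does not change per request.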