Is there a way to filter and get only specific columns?
For example, get all entries with the column first_name.
QuerySet.values() or QuerySet.values_list(), e.g.:
Entry.objects.values('first_name')
If you want a list of only the values, use:
Entry.objects.values_list('first_name', flat=True)
To only get a column's values from the table but still return an object of that model, use only:
record = Entry.objects.only('first_name')
This will defer all other columns from the model but you can still access them all normally.
record.first_name # already retrieved
record.last_name # retrieved on call
Related
I get a queryset object every time i want some data from models.
So when i say,
"items = Items.object.get(value=value)"
I get --
"<QuerySet [<Item-name>]>"
I have to iterate through the queryset object to get the data and I do that with
"items[0]"
Is there any way I can avoid this?
Edit: I meant "items = Items.object.filter(value=value)"
first of all items = Items.objects.get(value=value) does not return a queryset,
rather it returns an object of <Items: Items object (1)>
To get the first(or just one result) or last date from the object, do this Items.objects.first() or Items.objects.last()
To get the desired data without using its index position, then you can filter it like this Items.objects.filter(value=value)
You are mistaken. items = Items.object.get(value=value) will not give you a queryset, but an object. items = Items.object.filter(value=value)
would give you a queryset.
Filter method will always give you a queryset, because; in order to minimize the need of database hits, django considers you might add additional filters through your code. So if you not execute that queryset, e.g. by using list(your_queryset) django never hits the database.
# when you are using 'get' in your query, you don't need to iterate, directly get an access to the field values
try:
items = Items.object.get(value=value)
except Items.DoesNotExist:
items = None
if items:
print(items.value)
Consider I have model called DemoModel which consists of field named demo_field of type Char Field.
Now i have to query on the demo_field which has null values and 'demo'.
Example:
Class DemoModel(models.Model):
demo_field = models.CharField(max_lenght=20)
I have tried to query like
DemoModel.objects.filter(demo_field__in=[None,"demo"])
but i am able to get only records which having value as "demo".
I want to query it in single line is it possible? or anyother ways
from django.db.models import Q
DemoModel.objects.filter(Q(demo__isnull=True) | Q(demo="demo"))
I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print len(q)
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?
The way django-taggit implements tagging is essentially through a ManytoMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model which is your model and you have two models Tag and TaggedItem provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")) it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has the name="one".
Trying to match for two tag names would translate to looking up up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". You obviously never have that as you only have one value in a row, it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManytoMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag evaluating the results each time, as it is suggested in the answers from others. This might be okay for two tags, but will not be good when you need to look for objects that have 10 tags set on them. Here would be one way to do this that would result in two queries and get you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
# create django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using a subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering on sub-queries requires django 3.0).
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table instead directly and aggregate the results in a way that give you the object IDs.
# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
.values('object_id')
.annotate(occurence=Count('object_id'))
.filter(occurence=len(tag_names))
.values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless if you need 2 tags or 200.
Please review this and let me know if anything needs clarification.
first of all, this three are same:
Result.objects.filter(tags__name="one", tags__name="two")
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
Result.objects.filter(tags__name_in=["one"]).filter(tags__name_in=["two"])
i think the name field is CharField and no record could be equal to "one" and "two" at same time.
in python code the query looks like this(always false, and why you are geting no result):
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
we use Q object for implement OR or complex queries
Into the example that works you do an end on two python objects (query sets). That gets applied to any record not necessarily to the same record that has one AND two as tag.
ps: Why do you use the in filter ?
q = Result.objects.filter(tags_name_in=["one"]).filter(tags_name_in=["two"])
add .distinct() to remove duplicates if expecting more than one unique object
I have a query like this:
file_s = Share.objects.filter(shared_user_id=log_id)
Now, I want to get the files_id attribute from file_s in Django view. How can I do that?
Use values() to get particular attribute which will return you list of dicts, like
file_s = Share.objects.filter(shared_user_id=log_id).values('files_id')
EDIT: If you want only one attribute then you can use flat=True to suggest to return just list of values. However, make sure in what order the list will be.
file_s = Share.objects.filter(shared_user_id=log_id).values_list('files_id', flat=True).order_by('id')
Your Share.objects.filter() call returns a Djagno QuerySet object which is not a single record, but an iterable of filtered objects from the database with each item being a Share instance. It's possible that your filter call will return more than one item.
You can iterate over the QuerySet using a loop such as:
for share in files_s:
print share.files_id
If you know that your query is only expected to return a single item, you could do:
share = Share.objects.get(shared_user_id=log_id)
which will return a single Share instance from which you can access the files_id attribute. An exception will be raised if the query would return anything other than 1 result.
If you want to get only the value from a single value QuerySet you can do:
file_s = Share.objects.filter(shared_user_id=log_id).values_list('files_id', flat=True).first()
I have a model that has four fields. How do I remove duplicate objects from my database?
Daniel Roseman's answer to this question seems appropriate, but I'm not sure how to extend this to situation where there are four fields to compare per object.
Thanks,
W.
def remove_duplicated_records(model, fields):
"""
Removes records from `model` duplicated on `fields`
while leaving the most recent one (biggest `id`).
"""
duplicates = model.objects.values(*fields)
# override any model specific ordering (for `.annotate()`)
duplicates = duplicates.order_by()
# group by same values of `fields`; count how many rows are the same
duplicates = duplicates.annotate(
max_id=models.Max("id"), count_id=models.Count("id")
)
# leave out only the ones which are actually duplicated
duplicates = duplicates.filter(count_id__gt=1)
for duplicate in duplicates:
to_delete = model.objects.filter(**{x: duplicate[x] for x in fields})
# leave out the latest duplicated record
# you can use `Min` if you wish to leave out the first record
to_delete = to_delete.exclude(id=duplicate["max_id"])
to_delete.delete()
You shouldn't do it often. Use unique_together constraints on database instead.
This leaves the record with the biggest id in the DB. If you want to keep the original record (first one), modify the code a bit with models.Min. You can also use completely different field, like creation date or something.
Underlying SQL
When annotating django ORM uses GROUP BY statement on all model fields used in the query. Thus the use of .values() method. GROUP BY will group all records having those values identical. The duplicated ones (more than one id for unique_fields) are later filtered out in HAVING statement generated by .filter() on annotated QuerySet.
SELECT
field_1,
…
field_n,
MAX(id) as max_id,
COUNT(id) as count_id
FROM
app_mymodel
GROUP BY
field_1,
…
field_n
HAVING
count_id > 1
The duplicated records are later deleted in the for loop with an exception to the most frequent one for each group.
Empty .order_by()
Just to be sure, it's always wise to add an empty .order_by() call before aggregating a QuerySet.
The fields used for ordering the QuerySet are also included in GROUP BY statement. Empty .order_by() overrides columns declared in model's Meta and in result they're not included in the SQL query (e.g. default sorting by date can ruin the results).
You might not need to override it at the current moment, but someone might add default ordering later and therefore ruin your precious delete-duplicates code not even knowing that. Yes, I'm sure you have 100% test coverage…
Just add empty .order_by() to be safe. ;-)
https://docs.djangoproject.com/en/3.2/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Transaction
Of course you should consider doing it all in a single transaction.
https://docs.djangoproject.com/en/3.2/topics/db/transactions/#django.db.transaction.atomic
If you want to delete duplicates on single or multiple columns, you don't need to iterate over millions of records.
Fetch all unique columns (don't forget to include the primary key column)
fetch = Model.objects.all().values("id", "skuid", "review", "date_time")
Read the result using pandas (I did using pandas instead ORM query)
import pandas as pd
df = pd.DataFrame.from_dict(fetch)
Drop duplicates on unique columns
uniq_df = df.drop_duplicates(subset=["skuid", "review", "date_time"])
## Dont add primary key in subset you dumb
Now, you'll get the unique records from where you can pick the primary key
primary_keys = uniq_df["id"].tolist()
Finally, it's show time (exclude those id's from records and delete rest of the data)
records = Model.objects.all().exclude(pk__in=primary_keys).delete()