How to get all the duplicated records in Django?

I have a model which contains various fields like name, start_time, start_date. I want to create an API that reads my model and displays all the duplicated records.

The other answer works, but I want to post my code as well so that other newcomers won't have to look for other answers.
from django.db.models import Count, Q
from rest_framework import status
from rest_framework.response import Response
def find_duplicates(request):
    if request.method == 'GET':
        # Group by the three fields; any group with more than one row is a duplicate.
        duplicates = list(
            Model.objects.values('field1', 'field2', 'field3')
            .annotate(row_count=Count('id'))
            .filter(row_count__gt=1)
        )
        # Match on the full combination. Separate field1__in / field2__in /
        # field3__in filters would also return rows that mix values taken
        # from different duplicate groups.
        condition = Q()
        for item in duplicates:
            condition |= Q(field1=item['field1'], field2=item['field2'], field3=item['field3'])
        duplicate_objects = Model.objects.filter(condition) if duplicates else Model.objects.none()
        serializer = ModelSerializer(duplicate_objects, many=True)
        return Response(serializer.data, status=status.HTTP_200_OK)
P.S. field1 is a foreign key therefore I am extracting the id here.
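When several fields together define a duplicate, the match has to happen on the combination of values. The following is a plain-Python sketch, not Django code, using hypothetical rows and only two fields. It compares matching on the whole tuple with matching each field on its own, which is what separate `__in`-style checks do.

```python
from collections import Counter

# Hypothetical rows: ids 1 and 2 duplicate each other, as do ids 3 and 4.
rows = [
    {"id": 1, "field1": "a", "field2": "x"},
    {"id": 2, "field1": "a", "field2": "x"},
    {"id": 3, "field1": "b", "field2": "y"},
    {"id": 4, "field1": "b", "field2": "y"},
    {"id": 5, "field1": "a", "field2": "y"},  # unique combination
]

# Count each (field1, field2) combination and keep those seen more than once.
counts = Counter((r["field1"], r["field2"]) for r in rows)
dup_keys = {key for key, n in counts.items() if n > 1}

# Matching on the full tuple returns only real duplicates.
by_combination = [r["id"] for r in rows if (r["field1"], r["field2"]) in dup_keys]

# Matching each field independently also lets id 5 through.
field1_values = {key[0] for key in dup_keys}
field2_values = {key[1] for key in dup_keys}
per_field = [
    r["id"] for r in rows
    if r["field1"] in field1_values and r["field2"] in field2_values
]

print(by_combination)  # prints [1, 2, 3, 4]
print(per_field)       # prints [1, 2, 3, 4, 5]
```

Row 5 is not a duplicate, but each of its values appears in some duplicate group, so the per-field check accepts it.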

As far as I know, this needs to be done as a two-step process. You find the field values that have duplicates in one query, and then gather all the objects with those values in a second query.
from django.db.models import Count
duplicates = MyModel.objects.values('name') \
    .annotate(name_count=Count('id')) \
    .filter(name_count__gt=1)
duplicate_objects = MyModel.objects.filter(name__in=[item['name'] for item in duplicates])
duplicates will contain the names that have more than one occurrence, while duplicate_objects will contain all duplicate named objects.
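To make the two steps concrete, here is a plain-Python analogue that runs without a database. The rows and the helper name `find_duplicate_rows` are made up for illustration; the first step mirrors the values/annotate/filter query and the second mirrors the `name__in` query.

```python
from collections import Counter

# Hypothetical rows standing in for MyModel records.
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "alpha"},
    {"id": 4, "name": "gamma"},
    {"id": 5, "name": "beta"},
]

def find_duplicate_rows(rows, key):
    """Return every row whose `key` value occurs more than once."""
    # Step 1: count occurrences per value and keep the repeated ones.
    counts = Counter(row[key] for row in rows)
    duplicated_values = {value for value, n in counts.items() if n > 1}
    # Step 2: collect all rows carrying one of those values.
    return [row for row in rows if row[key] in duplicated_values]

duplicate_rows = find_duplicate_rows(rows, "name")
print([row["id"] for row in duplicate_rows])  # prints [1, 2, 3, 5]
```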

Related

How to filter in django with empty fields when using ChoiceField

I have a programme where users should be able to filter different types of technologies by their attributes. My question is, how would I filter the technologies when there are potential conflicts and empty values in the parameters I use to filter?
forms.py:
class FilterDataForm(forms.ModelForm):
    ASSESSMENT = (('', ''), ('Yes', 'Yes'), ('No', 'No'),)
    q01_suitability_for_task_x = forms.ChoiceField(
        label='Is the technology suitable for x?',
        choices=ASSESSMENT, help_text='Please select yes or no', required=False,)
    q02_suitability_for_environment_y = forms.ChoiceField(
        label='Is the technology suitable for environment Y?',
        choices=ASSESSMENT, help_text='Please select yes or no', required=False)
There are many fields in my model like the ones above.
views.py
class TechListView(ListView):
    model = MiningTech
    def get_queryset(self):
        q1 = self.request.GET.get('q01_suitability_for_task_x', '')
        q2 = self.request.GET.get('q02_suitability_for_environment_y', '')
        object_list = MiningTech.objects.filter(q01_suitability_for_task_x=q1).filter(
            q02_suitability_for_environment_y=q2)
        return object_list
The difficulty is that not all technology db entries will have data. So in my current setup there are times when I filter out objects that have one attribute but not another.
For instance if my db has:
pk1: q01_suitability_for_task_x=Yes; q02_suitability_for_environment_y=Yes;
pk2: q01_suitability_for_task_x=Yes; q02_suitability_for_environment_y='';
In the form, if I don't select any value for q01_suitability_for_task_x, and select Yes for q02_suitability_for_environment_y, I get nothing back in the queryset because there are no q01_suitability_for_task_x empty fields.
Any help would be appreciated. I'm also ok with restructuring everything if need be.
The problem is that your self.request.GET.get(...) code defaults to an empty string if there is no value found, so your model .filter() is looking for matches where the string is ''.
I would restructure the first part of get_queryset() to build a dictionary that can be unpacked into your filter. If the value doesn't exist then it doesn't get added to the filter dictionary:
filters = {}
q1 = self.request.GET.get('q01_suitability_for_task_x', None)
q2 = self.request.GET.get('q02_suitability_for_environment_y', None)
if q1:  # skips both a missing key and an empty selection ('')
    filters['q01_suitability_for_task_x'] = q1
... etc ...
object_list = MiningTech.objects.filter(**filters)
If you have a lot of q1, q2, etc. items then consider putting them in a list, looping through and inserting into the dictionary if .get(...) returns anything.
Edit: Because there are indeed a lot of possible filters, the final solution looks as follows:
def get_queryset(self):
    filters = {}
    for key, value in self.request.GET.items():
        if value != '':
            filters[key] = value
    object_list = Tech.objects.filter(**filters)
    return object_list
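One caveat when looping over every GET item: any unrelated query parameter (a pagination `page` key, for example) would also be passed to `filter()`, and a keyword that is not a model field raises a FieldError. A possible refinement, sketched here in plain Python with a hypothetical whitelist and helper name, is to keep only known field names with non-empty values:

```python
def build_filters(params, allowed_fields):
    """Keep only whitelisted, non-empty parameters for use as filter kwargs."""
    return {
        key: value
        for key, value in params.items()
        if key in allowed_fields and value != ""
    }

# Hypothetical whitelist and query-string values.
ALLOWED = {"q01_suitability_for_task_x", "q02_suitability_for_environment_y"}
params = {
    "q01_suitability_for_task_x": "",          # nothing selected in the form
    "q02_suitability_for_environment_y": "Yes",
    "page": "2",                               # not a model field
}

filters = build_filters(params, ALLOWED)
print(filters)  # prints {'q02_suitability_for_environment_y': 'Yes'}
```

In the view, the resulting dictionary would be unpacked in the same way as before, with `params` replaced by `self.request.GET`.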

How to create a customized filter search function in Django?

I am trying to create a filter search bar that I can customize. For example, if I type a value into a search bar, then it will query a model and retrieve a list of instances that match the value. For example, here is a view:
class StudentListView(FilterView):
    template_name = "leads/student_list.html"
    context_object_name = "leads"
    filterset_class = StudentFilter
    def get_queryset(self):
        return Lead.objects.all()
and here is my filters.py:
class StudentFilter(django_filters.FilterSet):
    class Meta:
        model = Lead
        fields = {
            'first_name': ['icontains'],
            'email': ['exact'],
        }
So far, I can only create a filter search bar that provides a list of instances that match first_name or email (which are fields in the Lead model). However, this does not allow me to do more complicated tasks. Let's say I added time to the filter fields, and I would like to retrieve not only the Lead instances with the time value I submitted, but also other Lead instances whose time value is near the one I submitted. Basically, I want something like the def form_valid() used in views, where I can query, calculate, and even alter the values submitted.
Moreover, if possible, I would like to create a filter field that is not necessarily an actual field in a model. Then, I would like to use the submitted value to do some calculations as I filter for the list of instances. If you have any questions, please ask me in the comments. Thank you.
You can do just about anything by defining a method on the filterset to map the user's input onto a queryset. Here's one I did earlier. Code much cut down ...
The filter coat_info_contains is defined as a CharFilter, but it is further parsed by the method, which splits it into a set of substrings separated by commas. Each substring generates a Q element that matches a model if the substring is contained in any of the three model fields coating_1, coating_2 and coating_3 (OR logic). Because all the Q elements are passed to a single filter() call, a model must match every substring.
This filter is not implicitly connected to any particular model field. The connection is through the method= specification of the filter to the filterset's method, which can return absolutely any queryset on the model that can be programmed.
Hope I haven't cut out anything vital.
import django_filters as FD
from django.db.models import Q
class MemFilter(FD.FilterSet):
    class Meta:
        model = MyModel
        # fields = [fieldname, ... ]  # default filters created for these. Not required if all declarative.
        # fields = { fieldname: [lookup_expr_1, ...], ...}  # for specifying possibly multiple lookup expressions
        fields = {
            'ft': ['gte', 'lte', 'exact'], 'mt': ['gte', 'lte', 'exact'],
            ...
        }
    # declarative filters. Lots and lots of
    ...
    coat_info_contains = FD.CharFilter(field_name='coating_1',
        label='Coatings contain',
        method='filter_coatings_contains'
    )
    ...
    def filter_coatings_contains(self, qs, name, value):
        values = value.split(',')
        qlist = []
        for v in values:
            # One Q per substring: it may appear in any of the three fields.
            qlist.append(
                Q(coating_1__icontains=v) |
                Q(coating_2__icontains=v) |
                Q(coating_3__icontains=v))
        # Positional Q objects are ANDed together by filter().
        return qs.filter(*qlist)
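The matching rule that the method builds can be checked without a database. The sketch below is a plain-Python analogue with made-up records: each comma-separated term must appear, case-insensitively, in at least one of the three fields. Unlike the method above, it also strips whitespace around terms, which is an assumption about the desired behaviour.

```python
def matches_coatings(record, value, fields=("coating_1", "coating_2", "coating_3")):
    """True if every comma-separated term appears in at least one field."""
    terms = [term.strip().lower() for term in value.split(",") if term.strip()]
    return all(
        any(term in (record.get(field) or "").lower() for field in fields)
        for term in terms
    )

# Hypothetical records standing in for model instances.
records = [
    {"id": 1, "coating_1": "Zinc", "coating_2": "Epoxy primer", "coating_3": ""},
    {"id": 2, "coating_1": "Zinc", "coating_2": "", "coating_3": None},
    {"id": 3, "coating_1": "Chrome", "coating_2": "epoxy", "coating_3": ""},
]

print([r["id"] for r in records if matches_coatings(r, "zinc, epoxy")])  # prints [1]
print([r["id"] for r in records if matches_coatings(r, "epoxy")])        # prints [1, 3]
```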

Django - annotate against a prefetch QuerySet?

Is it possible to annotate/count against a prefetched query?
My initial query, below, is based on circuits. I then realised that if a site does not have any circuits, I won't have a 'None' category, which would show a site as down.
conn_data = Circuits.objects.all() \
    .values('circuit_type__circuit_type') \
    .exclude(active_link=False) \
    .annotate(total=Count('circuit_type__circuit_type')) \
    .order_by('circuit_type__monitor_priority')
So I changed to querying sites and using prefetch, which now has an empty circuits_set for any site that does not have an active link. Is there a Django way of creating the new totals against that circuits_set within conn_data? I was going to loop through all the sites manually and add the totals that way but wanted to know if there was a way to do this within the QuerySet instead?
My end result should look something like:
[
    {'circuit_type__circuit_type': 'Fibre', 'total': 63},
    {'circuit_type__circuit_type': 'DSL', 'total': 29},
    {'circuit_type__circuit_type': 'None', 'total': 2}
]
Prefetch query:
conn_data = SiteData.objects.prefetch_related(
    Prefetch(
        'circuits_set',
        queryset=Circuits.objects.exclude(
            active_link=False).select_related('circuit_type'),
    )
)
I don't think this will work. It's debatable whether it should work. Let's refer to what prefetch_related does:
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
So what happens here is that two queries are dispatched and two lists are realized. These lists are then partitioned in memory and grouped to the correct parent records.
Count() and annotate() are directives to the DBMS that resolve to SQL
Select Count(id) from conn_data
Because of the way annotate and prefetch_related work, I think it's unlikely they will play nicely together. prefetch_related is just a convenience, though. From a practical perspective, running two separate ORM queries and assigning the results to SiteData records yourself is effectively the same thing. So something like ...
# Gets all active Circuits counted and grouped by SiteData (the foreign key column name sitedata_id is assumed)
Circuits.objects.exclude(active_link=False).values('sitedata_id').annotate(total=Count('sitedata_id'))
Then you just loop over your SiteData records and assign the counts.
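As a rough illustration of that final loop, here is a plain-Python sketch with invented data. It assumes the site ids and the active circuit rows have already been fetched in two separate queries, and it tallies one circuit type per site, falling back to 'None' for sites with no active circuit.

```python
from collections import Counter

# Hypothetical data: site ids, and (site_id, circuit_type, active) circuit rows.
site_ids = [1, 2, 3, 4]
circuits = [
    (1, "Fibre", True),
    (2, "DSL", True),
    (3, "Fibre", False),  # inactive, so site 3 has no active circuit
]

# First active circuit type per site; sites without one are absent from the dict.
active_type_by_site = {}
for site_id, circuit_type, active in circuits:
    if active:
        active_type_by_site.setdefault(site_id, circuit_type)

# Loop over every site so that sites with no circuits still get counted.
totals = Counter(active_type_by_site.get(site_id, "None") for site_id in site_ids)
result = [{"circuit_type": name, "total": total} for name, total in totals.items()]
print(result)
```

Sites 3 and 4 both end up in the 'None' bucket, which is the category the circuit-based query could not produce.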
OK, I got what I wanted with this. There is probably a better way of doing it, but it works nevertheless:
from collections import Counter
import operator
from django.db.models import Prefetch, Q
class ConnData(object):
    def __init__(self, priority='', c_type='', count=0):
        self.priority = priority
        self.c_type = c_type
        self.count = count
    def __repr__(self):
        return '{} {}'.format(self.__class__.__name__, self.c_type)
# get all the site data
conn_data = SiteData.objects.exclude(Q(site_type__site_type='Data Centre') | Q(site_type__site_type='Factory')) \
    .prefetch_related(
        Prefetch(
            'circuits_set',
            queryset=Circuits.objects.exclude(active_link=False).select_related('circuit_type'),
        )
    )
# create a list for the conns
conns = []
# add items to list of dictionaries with all required fields
for conn in conn_data:
    try:
        conn_type = conn.circuits_set.all()[0].circuit_type.circuit_type
        priority = conn.circuits_set.all()[0].circuit_type.monitor_priority
        conns.append({'circuit_type': conn_type, 'priority': priority})
    except (IndexError, AttributeError):
        # no active circuit (or no circuit type): create category for down sites
        conns.append({'circuit_type': 'Down', 'priority': 10})
# create new list for class data
conn_counts = []
# create counter data
conn_count_data = Counter(((d['circuit_type'], d['priority']) for d in conns))
# loop through counter data and add classes to list
for val, count in conn_count_data.items():
    cc = ConnData()
    cc.priority = val[1]
    cc.c_type = val[0]
    cc.count = count
    conn_counts.append(cc)
# sort the classes by priority
conn_counts = sorted(conn_counts, key=operator.attrgetter('priority'))

Django, general version of prefetch_related()?

Of course, I don't mean to do what prefetch_related does already.
I'd like to mimic what it does.
What I'd like to do is the following.
I have a list of MyModel instances.
A user either follows or doesn't follow each instance.
my_models = MyModel.objects.filter(**kwargs)
for my_model in my_models:
    my_model.is_following = Follow.objects.filter(user=user, target_id=my_model.id, target_content_type=MY_MODEL_CTYPE)
Here I have an N+1 query problem, and I think I can borrow what prefetch_related does. Its description says that it performs the query for all objects up front and, when the related attribute is required, reads it from the pre-performed queryset.
That's exactly what I'm after: perform the is_following query once for all the objects I'm interested in, and use that result instead of N individual queries.
One additional aspect is that I'd like to attach a queryset rather than the actual value, so that I can defer evaluation until pagination.
If that statement is too ambiguous: I'd like to pass the my_models queryset, with the is_following information attached, to another function (a DRF serializer, for instance).
How does prefetch_related accomplish something like above?
A solution where you can get only the is_following bit is possible with a subquery via .extra.
class MyModelQuerySet(models.QuerySet):
    def annotate_is_following(self, user):
        return self.extra(
            select={'is_following': 'EXISTS( '
                'SELECT `id` FROM `follow` '
                'WHERE `follow`.`target_id` = `mymodel`.id '
                'AND `follow`.`user_id` = %s)'
            },
            # Pass the id as a parameter instead of interpolating it into the SQL.
            select_params=[user.id],
        )
class MyModel(models.Model):
    objects = MyModelQuerySet.as_manager()
usage:
my_models = MyModel.objects.filter(**kwargs).annotate_is_following(request.user)
Now another solution where you can get a whole list of following objects.
Because you have a GFK in the Follow class you need to manually create a reverse relation via GenericRelation. Something like:
class MyModelQuerySet(models.QuerySet):
    def with_user_following(self, user):
        return self.prefetch_related(
            Prefetch(
                'following',
                queryset=Follow.objects.filter(user=user) \
                    .select_related('user'),
                to_attr='following_user'
            )
        )
class MyModel(models.Model):
    following = GenericRelation(Follow,
        content_type_field='target_content_type',
        object_id_field='target_id',
        related_query_name='mymodels'
    )
    objects = MyModelQuerySet.as_manager()
    def get_first_following_object(self):
        if hasattr(self, 'following_user') and len(self.following_user) > 0:
            return self.following_user[0]
        return None
usage:
my_models = MyModel.objects.filter(**kwargs).with_user_following(request.user)
Now you have access to following_user attribute - a list with all follow objects per mymodel, or you can use a method like get_first_following_object.
Not sure if this is the best approach, and I doubt this is what prefetch_related does because I'm joining here.
I found there's a way to select extra columns in your query.
extra_select = """
    EXISTS(SELECT * FROM follow_follow
    WHERE follow_follow.target_object_id = myapp_mymodel.id AND
    follow_follow.target_content_type_id = %s AND
    follow_follow.user_id = %s)
"""
qs = self.extra(
    select={'is_following': extra_select},
    select_params=[CONTENT_TYPE_ID, user.id]
)
So you can do this with a join.
The prefetch_related way of doing it would be to run a separate query and look up the attribute in its results.
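The separate-query idea can be sketched in plain Python. This is only an analogue of the pattern, not how prefetch_related is implemented: the page of objects and the Follow rows are invented dictionaries, and `attach_is_following` is a hypothetical helper. One batched lookup collects the followed ids for the page, and each object then gets its flag from that set.

```python
def attach_is_following(items, followed_ids):
    """Set `is_following` on each item from one pre-fetched set of ids."""
    for item in items:
        item["is_following"] = item["id"] in followed_ids
    return items

# Hypothetical stand-ins: a page of model rows and the user's Follow rows.
page = [{"id": 10}, {"id": 11}, {"id": 12}]
follow_rows = [{"user_id": 7, "target_id": 11}, {"user_id": 7, "target_id": 99}]

# One batched lookup restricted to the ids on this page (the `__in` query).
page_ids = {item["id"] for item in page}
followed_ids = {row["target_id"] for row in follow_rows if row["target_id"] in page_ids}

attach_is_following(page, followed_ids)
print([(item["id"], item["is_following"]) for item in page])
# prints [(10, False), (11, True), (12, False)]
```

Running the lookup after pagination keeps it to a single extra query per page, whatever the page size.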

Filter on a distinct field with TastyPie

Suppose I have a Person model that has a first name field and a last name field. There will be many people who have the same first name. I want to write a TastyPie resource that allows me to get a list of the unique first names (without duplicates).
Using the Django model directly, you can do this easily by saying something like Person.objects.values("first_name").distinct(). How do I achieve the same thing with TastyPie?
Update
I've adapted the apply_filters method linked below to use the values before making the distinct call.
def apply_filters(self, request, applicable_filters):
    qs = self.get_object_list(request).filter(**applicable_filters)
    # ''.split(',') returns [''], so drop empty names before calling values().
    values = [v for v in request.GET.get('values', '').split(',') if v]
    if values:
        qs = qs.values(*values)
    distinct = request.GET.get('distinct', False) == 'True'
    if distinct:
        qs = qs.distinct()
    return qs
values returns dictionaries instead of model objects, so I don't think you need to override alter_list_data_to_serialize.
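A detail worth keeping in mind when parsing a comma-separated parameter such as values: splitting an empty string yields a list containing one empty string, which is truthy, so a plain `if values:` check does not detect a missing parameter. A small helper (the name `parse_csv_param` is made up here) shows the difference:

```python
def parse_csv_param(raw):
    """Split a comma-separated query parameter, dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]

print("".split(","))                            # prints ['']
print(parse_csv_param(""))                      # prints []
print(parse_csv_param("first_name,last_name"))  # prints ['first_name', 'last_name']
```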
Original response
There is a nice solution to the distinct part of the problem here involving a light override of apply_filters.
I'm surprised I'm not seeing a slick way to filter which fields are returned, but you could implement that by overriding alter_list_data_to_serialize and deleting unwanted fields off the objects just before serialization.
def alter_list_data_to_serialize(self, request, data):
    data = super(PersonResource, self).alter_list_data_to_serialize(request, data)
    fields = request.GET.get('fields', None)
    if fields is not None:
        fields = fields.split(',')
        # Data might be a bundle here. If so, operate on data.objects instead.
        data = [
            dict((k, v) for k, v in d.items() if k in fields)
            for d in data
        ]
    return data
Combine those two to use something like /api/v1/person/?distinct=True&values=first_name to get what you're after. That would work generally and would still work with additional filtering (&last_name=Jones).