Django Many to Many Data Duplication? - django

Background
I'm storing data about researchers, e.g. researcher profiles, metrics for each researcher, the journals they published in, the papers they have written, etc.
The Problem
My current database design is this:
Each Researcher has many journals (the ones they published in). Each Journal row holds information about that journal.
Likewise for Subject Areas.
But currently, this leads to massive data duplication. E.g. the same journal can appear many times in the Journal table, each time linked to a different researcher.
Is there a better way to tackle this problem? Right now I have over 5000 rows in the Journal table but only about 1000 distinct journals.
Thank you!
EDIT: This is likely due to the way I'm saving the models for new data (shown below). Could anyone show the proper way to loop over hashes (dicts) and save them to models?
Model - Researcher
class Researcher(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    scopus_id = models.BigIntegerField(db_index=True)  # Index to make searches quicker
    academic_rank = models.CharField(max_length=100)
    title = models.CharField(max_length=200, default=None, blank=True, null=True)
    salutation = models.CharField(max_length=200, default=None, blank=True, null=True)
    scopus_first_name = models.CharField(max_length=100)
    scopus_last_name = models.CharField(max_length=100)
    affiliation = models.CharField(default=None, blank=True, null=True, max_length=255)
    department = models.CharField(default=None, blank=True, null=True, max_length=255)
    email = models.EmailField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)

    def __str__(self):
        return "{} {}, Scopus ID {}".format(self.scopus_first_name, self.scopus_last_name, self.scopus_id)
Model - Journal
class Journal(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    researchers = models.ManyToManyField(Researcher)
    title = models.TextField()
    journal_type = models.CharField(max_length=40, default=None, blank=True, null=True)
    abbreviation = models.TextField(default=None, blank=True, null=True)
    issn = models.CharField(max_length=50, default=None, blank=True, null=True)
    journal_rank = models.IntegerField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)

    def __str__(self):
        return self.title
How I'm currently saving them:
db_model_fields = {'abbreviation': 'Front. Artif. Intell. Appl.',
'issn': '09226389',
'journal_type': 'k',
'researchers': <Researcher: x, Scopus ID f>,
'title': 'Frontiers in Artificial Intelligence and Applications'}
# remove researchers or else create will fail (some id need to exist error)
researcher = db_model_fields["researchers"]
del db_model_fields["researchers"]
model_obj = Journal(**db_model_fields)
model_obj.save()
model_obj.researchers.add(researcher)
model_obj.save()

Here is how it works:
class Journal(models.Model):
    # some fields
class Researcher(models.Model):
    # some fields
    journal = models.ManyToManyField(Journal)
Django will create a relation table. As the documentation puts it:
Behind the scenes, Django creates an intermediary join table to represent the many-to-many relationship.
So you'll have many rows in that join table, which is expected, but each journal and each researcher should appear only once in its own table.
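Your error probably comes from how you save. Instead of:
model_obj = Journal(**db_model_fields)
model_obj.save()
try get_or_create, looking the journal up by a field that identifies it (here issn, assuming it is filled in and unique per journal). Remove "researchers" from the dict first, as you already do:
model_obj, created = Journal.objects.get_or_create(
    issn=db_model_fields["issn"], defaults=db_model_fields
)
model_obj.researchers.add(researcher)
This way you'll get the existing journal if there is one, and a new row is created only when there is none. Note that get_or_create returns a tuple of the object and a created flag. Because none of your fields are unique, your current code creates a new Journal row on every save; Django raises no error because it generates a fresh primary key each time.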
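To make the join-table idea concrete, here is a minimal sketch using the standard-library sqlite3 module instead of Django. The table and column names are hypothetical stand-ins for what an ORM would generate, and get_or_create_journal is an illustrative helper, not Django's implementation. It only shows the shape of the data: one row per journal, one link row per researcher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE researcher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE journal (id INTEGER PRIMARY KEY, issn TEXT UNIQUE, title TEXT);
    -- Hypothetical join table: one row per (journal, researcher) pair.
    CREATE TABLE journal_researchers (
        journal_id INTEGER REFERENCES journal(id),
        researcher_id INTEGER REFERENCES researcher(id),
        UNIQUE (journal_id, researcher_id)
    );
""")


def get_or_create_journal(issn, title):
    """Return the id of the journal with this ISSN, inserting it only if missing."""
    row = conn.execute("SELECT id FROM journal WHERE issn = ?", (issn,)).fetchone()
    if row:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO journal (issn, title) VALUES (?, ?)", (issn, title)
    )
    return cursor.lastrowid


def link(journal_id, researcher_id):
    """Record that a researcher published in a journal (duplicates are ignored)."""
    conn.execute(
        "INSERT OR IGNORE INTO journal_researchers VALUES (?, ?)",
        (journal_id, researcher_id),
    )


# Two researchers who published in the same journal.
for researcher_id, name in [(1, "Researcher A"), (2, "Researcher B")]:
    conn.execute("INSERT INTO researcher VALUES (?, ?)", (researcher_id, name))
    journal_id = get_or_create_journal(
        "09226389", "Frontiers in Artificial Intelligence and Applications"
    )
    link(journal_id, researcher_id)

journal_rows = conn.execute("SELECT COUNT(*) FROM journal").fetchone()[0]
link_rows = conn.execute("SELECT COUNT(*) FROM journal_researchers").fetchone()[0]
print(journal_rows, link_rows)  # 1 journal row, 2 link rows
```

The duplication belongs in the link table only. If the journal table itself grows with every researcher, the lookup step is being skipped.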

Related

Optimizing Django with prefetch and filters in large table

I had a database in PHP/HTML using MySQL and am transferring this to a Django project.
I have all the functionality working, but loading a table of the data I want is immensely slow because of the relations with other tables.
After searching for days I know that I probably have to use a models.Manager with prefetch_related. However, I am now stuck on how to call this from my template.
I have the following models(simplified):
class OrganisationManager(models.Manager):
    def get_queryset_director(self):
        person_query = Position.objects.select_related('person').filter(
            position_current=True,
            position_type="director",
        )
        return super().get_queryset().prefetch_related(
            Prefetch('position_set', queryset=person_query, to_attr="position_list")
        )

    def get_queryset_president(self):
        person_query = Position.objects.select_related('person').filter(
            position_current=True,
            position_type="president",
        )
        return super().get_queryset().prefetch_related(
            Prefetch('position_set', queryset=person_query, to_attr="position_list")
        )
class Person(models.Model):
    full_name = models.CharField(max_length=255, blank=True, null=True)
    country = models.ForeignKey(Country, models.CASCADE, blank=True, null=True)
    birth_date = models.DateField(blank=True, null=True)

class Organisation(models.Model):
    organisation_name = models.CharField(max_length=255, blank=True, null=True)
    positions = models.ManyToManyField(Person, through='Position')
    # Positions are dynamic: even though there should be only one director and one
    # president at any given time, a OneToOne model wouldn't work in this scenario.
    objects = OrganisationManager()

    # The following methods are currently used to show the names and start dates
    # of the director and president in the detail view and list view.
    def director(self):
        return self.position_set.filter(position_current=True, position_type="director").last()

    def president(self):
        # "president" matches the stored value in POSITION_TYPES (the original used "P")
        return self.position_set.filter(position_current=True, position_type="president").last()

class Position(models.Model):
    POSITION_TYPES = (
        ('president', 'President'),
        ('director', 'Director'),
    )
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    organisation = models.ForeignKey(Organisation, on_delete=models.CASCADE)
    position_type = models.CharField(max_length=255, choices=POSITION_TYPES, blank=True, null=True)
    position_current = models.BooleanField(default=False)
    position_start = models.CharField(max_length=10, blank=True, null=True)
I want my table to look like this:
Organisation Name | President | President Start Date | Director | Director Start Date
Organisation 1 | President of org 1 | 2013 | Director of org 1 | 2015
Organisation 2 | President of org 2 | 2018 | Director of org 2 | 2017
With the code I currently have, it all works. But because it has to query the database for every row, it is slow enough to make Heroku time out.
I don't understand how to use the prefetch query from the models.Manager in the table (ListView) template. Thanks!
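One approach to achieve the results you want is to use subqueries, so that everything is fetched in a single query. Something like:
from django.db.models import OuterRef, Subquery

# Keep these as unevaluated querysets. Calling .last() here would run the query
# immediately, and OuterRef can only be resolved inside a Subquery.
president_subquery = Position.objects.filter(
    organisation=OuterRef('pk'), position_type='president', position_current=True
).order_by('-pk')
director_subquery = Position.objects.filter(
    organisation=OuterRef('pk'), position_type='director', position_current=True
).order_by('-pk')
# [:1] limits each subquery to one row: the matching position with the highest pk.
Organisation.objects.annotate(
    president=Subquery(president_subquery.values('person__full_name')[:1]),
    president_start_date=Subquery(president_subquery.values('position_start')[:1]),
    director=Subquery(director_subquery.values('person__full_name')[:1]),
    director_start_date=Subquery(director_subquery.values('position_start')[:1]),
)
Each Organisation in the resulting queryset then carries president, president_start_date, director and director_start_date as plain values that the template can print directly.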

How to copy a object data to another object in Django?

I am trying to create an E-Commerce Website and I am at the Final Step i.e. Placing the Order. So, I am trying to add all the Cart Items into my Shipment model. But I am getting this error.
'QuerySet' object has no attribute 'product'
Here are my models
class Product(models.Model):
    productId = models.AutoField(primary_key=True)
    productName = models.CharField(max_length=200)
    productDescription = models.CharField(max_length=500)
    productRealPrice = models.IntegerField()
    productDiscountedPrice = models.IntegerField()
    productImage = models.ImageField()
    productInformation = RichTextField()
    productTotalQty = models.IntegerField()
    alias = models.CharField(max_length=200)
    url = models.CharField(max_length=200, blank=True, null=True)

class Customer(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=100, null=True, blank=True)
    email = models.EmailField(max_length=100)
    profileImage = models.ImageField(blank=True, null=True, default='profile.png')
    phoneNumber = models.CharField(max_length=10, blank=True, null=True)
    address = models.CharField(max_length=500, blank=True, null=True)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.SET_NULL, blank=True, null=True)
    dateOrdered = models.DateTimeField(auto_now_add=True)
    orderCompleted = models.BooleanField(default=False)
    transactionId = models.AutoField(primary_key=True)

class Cart(models.Model):
    product = models.ForeignKey(Product, on_delete=models.SET_NULL, blank=True, null=True)
    order = models.ForeignKey(Order, on_delete=models.SET_NULL, blank=True, null=True)
    quantity = models.IntegerField(default=0, blank=True, null=True)
    dateAdded = models.DateTimeField(auto_now_add=True)

class Shipment(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.SET_NULL, blank=True, null=True)
    orderId = models.CharField(max_length=100)
    products = models.ManyToManyField(Product)
    orderDate = models.CharField(max_length=100)
    address = models.CharField(max_length=200)
    phoneNumber = models.CharField(max_length=13)
I just removed additional functions i.e. __str__ and others.
Here is the views.py
def orderSuccessful(request):
    number = Customer.objects.filter(user=request.user).values('phoneNumber')
    fullAddress = Customer.objects.filter(user=request.user).values('address')
    timeIn = time.time() * 1000  # convert current time to milliseconds
    if request.method == 'POST':
        order = Shipment.objects.create(customer=request.user.customer, orderId=timeIn,
                                        orderDate=datetime.datetime.now(), address=fullAddress,
                                        phoneNumber=number)
        user = Customer.objects.get(user=request.user)
        preOrder = Order.objects.filter(customer=user)
        orders = Order.objects.get(customer=request.user.customer, orderCompleted=False)
        items = orders.cart_set.all()  # Here is all the items of cart
        for product in items:
            product = Product.objects.filter(productId=items.product.productId)  # error is on this line
            order.products.add(product)
        Cart.objects.filter(order=preOrder).delete()
        preOrder.delete()
        order.save()
    else:
        return HttpResponse("Problem in Placing the Order")
    context = {
        'shipment': Shipment.objects.get(customer=request.user.customer)
    }
    return render(request, "Amazon/order_success.html", context)
How can I resolve this error and add all the cart items to the products field of the Shipment model?
Your model is not really consistent. Your Cart object acts as an m:n (ManyToMany) relationship between Product and Order. Usually you would have a 1:n between Cart and Product (a cart contains one or more products). One Cart might correspond to one Order (unless you allow more than one cart per order), and a shipment is usually 1:1 with an order. I do not see any of these relationships in your model.
Sketch your models and the relations between them first, asking for each one whether it should be 1:1, 1:n or m:n. The latter can be realized with a "through" model, which is necessary if you need extra attributes such as quantities.
In this example, one or more customers place an order by filling a cart with several products in different quantities. The order will also need a shipment fee.
By the way, bear in mind that filter() returns a QuerySet, not a single object. If you are filtering on user, which is a one-to-one to a unique User instance, you are better off using get(), as it returns a single instance.
Wrapping it in try/except or using get_object_or_404() makes it more robust.
product = Product.objects.filter(productId=items.product.productId)
should be something like this (the loop variable named product is actually a Cart instance):
product = product.product
at which point the extra lookup becomes redundant anyway.
It looks like you represent a cart as multiple Cart instances, one per product. The problem is that you access the wrong variable: items is the whole QuerySet, not the current item. You also don't need to query for the product again when you already have the instance. Make the following changes:
carts = orders.cart_set.all()  # Renamed items to carts for clarity
for cart in carts:
    product = cart.product
    # The name "order" is misleading: it is a Shipment instance, not an Order
    order.products.add(product)
As the comment above says, your variable names are somewhat misleading; give each variable a name that reflects what it holds.

Django Forms multiple foreignkey

I have four models. Three of them have 'independent' fields, but the fourth model has ForeignKey links to the other three.
class PreCheck(models.Model):
    name = models.CharField(max_length=120)
    time_in = models.DateTimeField(auto_now_add=True)
    is_insured = models.BooleanField()

class MainCheck(models.Model):
    height = models.FloatField()
    weight = models.IntegerField()

class PostCheck(models.Model):
    sickness = models.CharField(max_length=30)
    medication = models.CharField(max_length=30)

class MedicalRecord(models.Model):
    patient = models.ForeignKey(User)
    next_check_date = models.DateTimeField()
    payment_amount = models.IntegerField()
    initial_check = models.ForeignKey(PreCheck)
    main_check = models.ForeignKey(MainCheck)
    post_check = models.ForeignKey(PostCheck)
Assume a patient goes into a room, a precheck is done and saved, then the other checks are done, and finally a final record is set.
Ideally, I would like to fill in the forms for the different models at different times, possibly on different pages/tabs.
The admin has popups for the MedicalRecord model, but in the frontend it's hard to write JavaScript for that.
Another option would be to fill in the model forms separately, define a __str__ method on each model, and then select the saved objects from dropdowns in the MedicalRecord form (which I'm trying to avoid).
Just add blank=True, null=True to each ForeignKey field:
initial_check = models.ForeignKey(PreCheck, blank=True, null=True)
main_check = models.ForeignKey(MainCheck, blank=True, null=True)
post_check = models.ForeignKey(PostCheck, blank=True, null=True)
At the initial check, you can create the MedicalRecord with a MedicalRecord model form, leaving main_check and post_check blank.
After the main check, you can update the MedicalRecord with the main_check details, leaving post_check blank for now, and keep updating the MedicalRecord on different pages/tabs as the details become available.

Django: Converting a GenericForeignKey Relationship to a ForeignKey Relationship

I'm working to remove an existing GenericForeignKey relationship from some models. I'd like to change it to the Reformatted Model below. Do migrations provide a way to convert the existing content_type and object_id values to the respective new ForeignKeys (to keep the existing data)? I'm basically brand new at programming, so pardon me if I'm asking a stupid question.
class Donation(models.Model):
    amount_id = models.CharField(max_length=12, unique=True, editable=False)
    date_issued = models.DateField(auto_now_add=True)
    description = models.TextField(blank=True, null=True)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')

class Individual(BaseModel):
    first_name = models.CharField(max_length=50)
    middle_name = models.CharField(max_length=50, blank=True, null=True)
    last_name = models.CharField(max_length=50)
    suffix = models.CharField(max_length=50, blank=True, null=True)
    contributor = generic.GenericRelation(Donation, related_query_name='individual')

class Organization(models.Model):
    name = models.CharField(max_length=100)
    contributor = generic.GenericRelation(Donation, related_query_name='organization')
Reformatted Model
class Donation(models.Model):
    amount_id = models.CharField(max_length=12, unique=True, editable=False)
    date_issued = models.DateField(auto_now_add=True)
    description = models.TextField(blank=True, null=True)
    contributor_group = models.ForeignKey(Organization, null=True, blank=True, on_delete=models.CASCADE)
    contributor_individual = models.ForeignKey(Individual, null=True, blank=True, on_delete=models
Based on your definition of the Donation model, one of the fields contributor_group and contributor_individual will always be NULL after the migration.
I hope you have taken that into consideration.
To be safe, do this in two phases:
1. Keep content_type and object_id, and add the two new fields.
2. Once the new fields are populated, remove the generic fields.
There are two ways to populate the new fields:
Django migrations let you populate new fields with values as part of a migration (a data migration). I haven't done that myself, so you will need to look it up.
For more control, and some learning as well, you can populate them with a script. Set up the django-extensions package in your project and write a script that does the population for you. A sample script would look like this:
from django.contrib.contenttypes.models import ContentType
from myproject.models import Donation, Individual, Organization


def run():
    organization_content_type = ContentType.objects.get_for_model(Organization)
    individual_content_type = ContentType.objects.get_for_model(Individual)
    for donation in Donation.objects.all():
        if donation.content_type_id == organization_content_type.id:
            # object_id is a raw primary key, so assign it to the *_id attribute
            donation.contributor_group_id = donation.object_id
        elif donation.content_type_id == individual_content_type.id:
            donation.contributor_individual_id = donation.object_id
        else:
            print("Can't identify content type for donation id {}".format(donation.id))
        donation.save()
Check the values are correct and then remove the generic fields.
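As a rough illustration of the backfill and the check that follows it, here is a sketch of the same logic in plain SQL, run through the standard-library sqlite3 module rather than Django. The donation table layout and the content type ids ORG_CT and IND_CT are hypothetical; in a real project the ids come from the ContentType table.

```python
import sqlite3

ORG_CT, IND_CT = 1, 2  # hypothetical ContentType ids for Organization and Individual

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE donation (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER,
        contributor_group_id INTEGER,
        contributor_individual_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO donation (id, content_type_id, object_id) VALUES (?, ?, ?)",
    [(1, ORG_CT, 10), (2, IND_CT, 20), (3, 99, 30)],  # row 3 has an unknown type
)


def backfill(conn):
    """Copy object_id into the matching new column; return ids left unpopulated."""
    conn.execute(
        "UPDATE donation SET contributor_group_id = object_id "
        "WHERE content_type_id = ?",
        (ORG_CT,),
    )
    conn.execute(
        "UPDATE donation SET contributor_individual_id = object_id "
        "WHERE content_type_id = ?",
        (IND_CT,),
    )
    leftover = conn.execute(
        "SELECT id FROM donation "
        "WHERE contributor_group_id IS NULL AND contributor_individual_id IS NULL "
        "ORDER BY id"
    )
    return [row[0] for row in leftover]


unmatched = backfill(conn)
rows = conn.execute(
    "SELECT id, contributor_group_id, contributor_individual_id "
    "FROM donation ORDER BY id"
).fetchall()
print(rows)       # [(1, 10, None), (2, None, 20), (3, None, None)]
print(unmatched)  # [3]
```

An empty unmatched list is the signal that it is safe to move on to the second phase and drop the generic fields.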

Django: Get distinct values from a foreign key model

Django newbie here, so I apologize if this is super straightforward.
I am attempting to get a list of distinct "Name" values from the "Activity" records for a given "Person".
The models are set up as below:
class Activity(models.Model):
    Visit = models.ForeignKey(Visit)
    Person = models.ForeignKey(Person)
    Provider = models.ForeignKey(Provider)
    ActivityType = models.ForeignKey(ActivityType)
    Time_Spent = models.IntegerField(blank=True, null=True)
    Repetitions = models.CharField(max_length=20, blank=True, null=True)
    Weight_Resistance = models.CharField(max_length=50, blank=True, null=True)
    Notes = models.CharField(max_length=500, blank=True, null=True)

class ActivityType(models.Model):
    Name = models.CharField(max_length=100)
    Activity_Category = models.CharField(max_length=40, choices=Activity_Category_Choices)
    Location_Category = models.CharField(max_length=30, blank=True, null=True, choices=Location_Category_Choices)
I can get a listing of all activities done with a given Person
person = Person.objects.get(id=person_id)
activity_list = person.activity_set.all()
I get a list of all activities for that person, no problem.
What I can't sort out is how to generate a list of distinct/unique Activity_Types found in person.activity_set.all()
person.activity_set.values('ActivityType').distinct()
only returns a dictionary with
{'ActivityType':<activitytype.id>}
I can't sort out how to get straight to the Name attribute on ActivityType.
This is pretty straightforward in plain ol' SQL, so I know my lack of grokking the ORM is to blame.
Thanks.
Update: I have this working, sort of, but this CAN'T be the right way(tm) to do this..
distinct_activities = person.activity_set.values('ActivityType').distinct()
uniquelist = []
for x in distinct_activities:
    valuetofind = x['ActivityType']
    activitytype = ActivityType.objects.get(id=valuetofind)
    name = activitytype.Name
    uniquelist.append((valuetofind, name))
And then iterate over that uniquelist...
This has to be wrong...
unique_names = ActivityType.objects.filter(
    id__in=Activity.objects.filter(Person=your_person).values('ActivityType')
).values_list('Name', flat=True).distinct()
This should do the trick. It also keeps the number of database hits low, because the inner queryset is used as a subquery rather than being fetched separately.
I wrote this on my phone, so watch out for typos.
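Since the question mentions that this is easy in plain SQL, here is a small sketch of the kind of statement the ORM query above corresponds to, run with the standard-library sqlite3 module. The table names, column names and sample rows are made up for illustration and are not what Django would generate for these models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activitytype (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY,
        person_id INTEGER,
        activitytype_id INTEGER REFERENCES activitytype(id)
    );
    INSERT INTO activitytype VALUES (1, 'Walking'), (2, 'Stretching'), (3, 'Cycling');
    -- Person 7 walked twice and stretched once; person 8 cycled once.
    INSERT INTO activity (person_id, activitytype_id) VALUES
        (7, 1), (7, 1), (7, 2), (8, 3);
""")


def distinct_activity_names(person_id):
    """Names of the activity types a person has at least one activity for."""
    rows = conn.execute(
        """
        SELECT DISTINCT name FROM activitytype
        WHERE id IN (SELECT activitytype_id FROM activity WHERE person_id = ?)
        ORDER BY name
        """,
        (person_id,),
    )
    return [name for (name,) in rows]


print(distinct_activity_names(7))  # ['Stretching', 'Walking']
```

The inner SELECT plays the role of the Activity queryset passed to id__in, and the outer SELECT DISTINCT plays the role of values_list('Name', flat=True).distinct().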