Deciding how to model this data in django - django

Imagine it is for translating vocabulary to another language. I'm only dealing with a limited number of words (~2000).
Language1 and Language2 each have their own list of ~2000 words. Each word might have multiple equivalents in the other language, which may or may not be on that language's list of ~2000 words.
Using a many-to-many relationship initially appealed to me, but I can't quite see through the mist to see what would work best.
My other thought was just making a json dump for each word. Something like....
{1: {'Lang1': [word1, word2], 'Lang2': [word1, word2]}}
but I am not sure that managing everything this way is wise. It would be cumbersome to edit from the admin section (I think I would be editing one long line of JSON text), and it doesn't take advantage of much of what Django offers.
Maybe there is another way that I haven't thought of?
Given my scenario, how would you go about defining this?

from django.db import models

class Language(models.Model):
    name = models.CharField(max_length=100)

class Word(models.Model):
    name = models.CharField(max_length=100)
    language = models.ForeignKey(Language, related_name='language_words')
    #...

class Translation(models.Model):
    word = models.ForeignKey(Word, related_name='word_translations')
    translation = models.CharField(max_length=100)
    from_language = models.ForeignKey(Language, related_name='language_translations')
    in_language = models.CharField(max_length=100)

# example usage
english_language = Language(name='english')
english_language.save()
word = english_language.language_words.create(name='Flower')
german_translation = word.word_translations.create(translation='Blumen',
                                                   from_language=english_language,
                                                   in_language='German')
word.name  # 'Flower'
german_translation.translation  # 'Blumen'
This might not be optimal yet (I'm writing it on the train right now), but hopefully it is a good way to start.
If you then register these models in the admin, you can easily manage (add/delete) translations.
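For a sense of how these models behave at the database level, here is a rough sketch using Python's built-in sqlite3 module instead of Django. The table and column names are assumptions chosen to mirror the models above (Django's generated names would include the app label), and the rows are sample data. Because a translation is stored as plain text, it does not have to be one of the ~2000 words on the other language's list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables mirroring Language, Word and Translation.
conn.executescript("""
    CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE word (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        language_id INTEGER NOT NULL REFERENCES language(id)
    );
    CREATE TABLE translation (
        id INTEGER PRIMARY KEY,
        word_id INTEGER NOT NULL REFERENCES word(id),
        translation TEXT NOT NULL,
        from_language_id INTEGER NOT NULL REFERENCES language(id),
        in_language TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO language (id, name) VALUES (1, 'english')")
conn.execute("INSERT INTO word (id, name, language_id) VALUES (1, 'Flower', 1)")
# One word can carry several equivalents in the other language.
conn.executemany(
    "INSERT INTO translation (word_id, translation, from_language_id, in_language) "
    "VALUES (?, ?, ?, ?)",
    [(1, "Blume", 1, "German"), (1, "Blumen", 1, "German")],
)

def translations_for(conn, word_name, in_language):
    """Return every stored translation of word_name into in_language."""
    rows = conn.execute(
        """
        SELECT t.translation
        FROM translation AS t
        JOIN word AS w ON w.id = t.word_id
        WHERE w.name = ? AND t.in_language = ?
        ORDER BY t.id
        """,
        (word_name, in_language),
    )
    return [row[0] for row in rows]

german = translations_for(conn, "Flower", "German")
```

With the Django models, the comparable lookup would go through the related manager, along the lines of word.word_translations.filter(in_language='German').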

Related

Django efficient lookups: Related manager vs whole queryset

Let's say I have the following Django models:
class X(models.Model):
    some_field = models.FloatField()

class Y(models.Model):
    x = models.ForeignKey(X)
    another_field = models.DateField()
Let's say I'm looking for a particular instance of y, with a certain date (lookup_date), belonging to a certain x. Which option would be a more efficient lookup, if any?:
1. Y.objects.get(x=x, another_field=lookup_date)
or using the related manager:
2. x.y_set.get(another_field=lookup_date)
You'll probably find that they produce the same query. You can check this by printing the .query attribute of a queryset, which shows the resulting SQL. Note that get() returns a model instance rather than a queryset, so use filter() for the check:
print(Y.objects.filter(x=x, another_field=lookup_date).query)
print(x.y_set.filter(another_field=lookup_date).query)
But either way this is a micro optimization and you may find it interesting to read Eric Lippert's performance rant.
Is one of them considered more pythonic?
Not really. I tend to use the second, since it can make it slightly easier to conform to PEP 8's line-length limit.

django subquery with a join in it

I'm using Django 1.8.5 and Python 3.4.3, and I'm trying to create a subquery that constrains my main data set, but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)

class Lot_Country(models.Model):
    lot = models.ForeignKey(Lot)
    country = CountryField()

class Discrete(models.Model):
    discrete_id = models.CharField(max_length=200, unique=True)
    master_id = models.ForeignKey(Inventory_Master)
    location = models.ForeignKey(Location)
    lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot and over to Lot_Country, meaning: "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US)."
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. As currently defined, one Lot can have multiple Lot_Country rows, and each Lot_Country row can only be associated with one Lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.
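As a rough illustration of what that double-underscore lookup asks the database to do, the sketch below runs a comparable join with Python's built-in sqlite3 module. The table and column names are assumptions modelled on the classes in the question (Django's real table names include the app label, and the SQL it generates may differ in detail), and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tables for Lot, Lot_Country and Discrete.
conn.executescript("""
    CREATE TABLE lot (id INTEGER PRIMARY KEY, lot_id TEXT UNIQUE);
    CREATE TABLE lot_country (
        id INTEGER PRIMARY KEY,
        lot_id INTEGER REFERENCES lot(id),
        country TEXT
    );
    CREATE TABLE discrete (
        id INTEGER PRIMARY KEY,
        discrete_id TEXT UNIQUE,
        lot_id INTEGER REFERENCES lot(id)
    );
    INSERT INTO lot (id, lot_id) VALUES (1, 'LOT-A'), (2, 'LOT-B');
    INSERT INTO lot_country (lot_id, country)
        VALUES (1, 'US'), (1, 'CA'), (2, 'DE');
    INSERT INTO discrete (discrete_id, lot_id)
        VALUES ('D-1', 1), ('D-2', 1), ('D-3', 2);
""")

# Go "up" from discrete to lot, then "down" to lot_country.
rows = conn.execute(
    """
    SELECT d.discrete_id
    FROM discrete AS d
    JOIN lot AS l ON l.id = d.lot_id
    JOIN lot_country AS lc ON lc.lot_id = l.id
    WHERE lc.country = ?
    ORDER BY d.discrete_id
    """,
    ("US",),
)
discretes_in_us = [row[0] for row in rows]
```

Only the Discrete rows whose Lot has a 'US' entry come back; no separate list of "ok lots" has to be built in Python first.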

Database methods to get a list of non duplicated instances?

In my models I have a class like the following:
class Contact(models.Model):
    group = models.CharField(max_length=200, blank=True)
    name = models.CharField(max_length=100)
I'd like to find a better way of getting a list of all the groups. So far I have two solutions:
groups = []
for contact in Contact.objects.all():
    if contact.group not in groups:
        groups.append(contact.group)
and the second one:
groups=set(contact.group for contact in Contact.objects.all())
I think that the second one is much better because it uses a generator expression, but I'd like to know if there is some database method, like filter, exclude, etc., that could allow me to do this.
The point of doing this is to optimize for the case where a user has a lot of contacts but just a few groups. (In that case maybe making a Group class would be better, but I'd really like to avoid that.)
The best way is to use distinct():
groups = Contact.objects.values_list('group', flat=True).distinct()
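In SQL terms this amounts to a SELECT DISTINCT on the one column, so the de-duplication happens in the database rather than in Python. Here is a small sketch with the built-in sqlite3 module, assuming a hypothetical contact table shaped like the model above and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is an SQL keyword, so the column name is quoted.
conn.execute(
    'CREATE TABLE contact (id INTEGER PRIMARY KEY, "group" TEXT, name TEXT)'
)
conn.executemany(
    'INSERT INTO contact ("group", name) VALUES (?, ?)',
    [("family", "Ann"), ("work", "Bob"), ("family", "Cid"), ("work", "Dee")],
)

# The database returns each group once, however many contacts share it.
rows = conn.execute('SELECT DISTINCT "group" FROM contact ORDER BY "group"')
groups = [row[0] for row in rows]
```

Since the field is declared with blank=True, contacts without a group would show up as a single empty-string entry; chaining .exclude(group='') before distinct() is one way to leave it out.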

Django: efficient semi-random order_by for user-friendly results?

I have a Django search app with a Postgres back-end that is populated with cars. My scripts load on a brand-by-brand basis: let's say a typical mix is 50 Chryslers, 100 Chevys, and 1500 Fords, loaded in that order.
The default ordering is by creation date:
class Car(models.Model):
    name = models.CharField(max_length=500)
    brand = models.ForeignKey(Brand, null=True, blank=True)
    transmission = models.CharField(max_length=50, choices=TRANSMISSIONS)
    created = models.DateField(auto_now_add=True)

    class Meta:
        ordering = ['-created']
My problem is this: when the user does a search, say for a red automatic, that returns perhaps 10% of all cars:
results = Car.objects.filter(transmission="automatic", color="red")
the user typically gets hundreds of Fords before any other brand (because the results are ordered by creation date), which is not a good experience.
I'd like to make sure the brands are as evenly distributed as possible among the early results, without big "runs" of one brand. Any clever suggestions for how to achieve this?
The only idea I have is to use the ? operator with order_by:
results = Car.objects.filter(transmission="automatic", color="red").order_by("?")
This isn't ideal. It's expensive. And it doesn't guarantee a good mix of results, if some brands are much more common than others - so here where Chrysler and Chevy are in the minority, the user is still likely to see lots of Fords first. Ideally I'd show all the Chryslers and Chevys in the first 50 results, nicely mixed in with Fords.
Any ideas on how to achieve a user-friendly ordering? I'm stuck.
What I ended up doing was adding a priority field on the model, and assigning a semi-random integer to it.
By semi-random, I mean random within a particular range: so 1-100 for Chevy and Chrysler, and 1-500 for Ford.
Then I used that field to order_by.
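A minimal sketch of that idea in plain Python, using the ranges mentioned above. The function name, the PRIORITY_CAPS mapping, and the in-memory list of cars are illustrative assumptions; in the real app the value would be stored in the priority field and used in order_by.

```python
import random

# Hypothetical caps: rarer brands draw from a narrower range, so they
# tend to land earlier when sorting by priority in ascending order.
PRIORITY_CAPS = {"Chrysler": 100, "Chevy": 100, "Ford": 500}

def assign_priority(brand, rng=random):
    """Return a semi-random integer between 1 and the brand's cap."""
    cap = PRIORITY_CAPS.get(brand, 500)
    return rng.randint(1, cap)

rng = random.Random(42)  # seeded only to make the sketch repeatable

# Stand-in for the inventory: 50 Chryslers, 100 Chevys, 1500 Fords.
cars = (
    [{"brand": "Chrysler"} for _ in range(50)]
    + [{"brand": "Chevy"} for _ in range(100)]
    + [{"brand": "Ford"} for _ in range(1500)]
)
for car in cars:
    car["priority"] = assign_priority(car["brand"], rng)

# Comparable in spirit to Car.objects.filter(...).order_by("priority").
ordered = sorted(cars, key=lambda car: car["priority"])
first_page = [car["brand"] for car in ordered[:50]]
```

Because the priority is computed once when a car is loaded, ordering by it is an ordinary sort on a column (which can be indexed), unlike order_by("?").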

Foreign Key Relationships

I have two models
class Subject(models.Model):
    name = models.CharField(max_length=100, choices=COURSE_CHOICES)
    created = models.DateTimeField('created', auto_now_add=True)
    modified = models.DateTimeField('modified', auto_now=True)
    syllabus = models.FileField(upload_to='syllabus')

    def __unicode__(self):
        return self.name
and
class Pastquestion(models.Model):
    subject = models.ForeignKey(Subject)
    year = models.PositiveIntegerField()
    questions = models.FileField(upload_to='pastquestions')

    def __unicode__(self):
        return str(self.year)
Each Subject can have one or more past questions but a past question can have only one subject. I want to get a subject, and get its related past questions of a particular year. I was thinking of fetching a subject and getting its related past question.
Currently I implement this by fetching the past question whose subject and year match the specified subject, like this:
this_subject=Subject.objects.get(name=the_subject)
thepastQ=Pastquestion.objects.get(year=2000,subject=this_subject)
I was thinking there might be a better way to do this. Or is this already a good way? Please do tell.
I think what you want is the related_name property of the ForeignKey field. This creates a link back to the Subject object and provides a manager you can use to query the set.
So to use this functionality, change the foreignkey line to:
subject=models.ForeignKey(Subject, related_name='questions')
Then with an instance of Subject we'll call subj, you can:
subj.questions.filter(year=2000)
I don't think this performs much differently from the technique you have used. Roughly speaking, SQL performance boils down to a) whether there's an index and b) how many queries you're issuing, so you need to think about both.
One way to find out what SQL your model usage is generating is to use SqlLogMiddleware; alternatively, play with the options in "How to show the SQL Django is running".
Once you get going, it can be tempting to start issuing queries across relationships, e.g.
q = Question.objects.get(year=2000, subject__name=SUBJ_MATHS)
but unless you keep a close eye on these types of queries, you can and will kill your app's performance, badly.
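To make the point about indexes concrete, here is a small sketch using Python's built-in sqlite3 module rather than Django. It asks SQLite for its query plan before and after adding an index. The table layout and the index name idx_pq_subject_year are assumptions for illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the Pastquestion model.
conn.execute(
    "CREATE TABLE pastquestion ("
    "id INTEGER PRIMARY KEY, subject_id INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO pastquestion (subject_id, year) VALUES (?, ?)",
    [(s, y) for s in range(1, 6) for y in range(1990, 2010)],
)

QUERY = "SELECT id FROM pastquestion WHERE subject_id = ? AND year = ?"

def plan(conn):
    """Return SQLite's plan description for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 2000)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan(conn)  # no usable index yet: a full table scan
conn.execute(
    "CREATE INDEX idx_pq_subject_year ON pastquestion (subject_id, year)"
)
plan_with_index = plan(conn)  # the plan should now name the new index

print(plan_without_index)
print(plan_with_index)
```

Django creates an index on ForeignKey columns by default, so the subject lookup is already covered; a filter on year alone would rely on whatever index you add yourself.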
Django's query syntax allows you to 'reach into' related objects.
past_questions = Pastquestion.objects.filter(year=2000, subject__name=subject_name)