Using a Textfield with JSON instead of a ForeignKey relationship?

Using a Textfield with JSON instead of a ForeignKey relationship? - django

I am working on a project where users can roll dice pools. A dice pool is, for example, a throw with 3 red dice, 2 blue and 1 green. A pool is composed of several dice rolls and modifiers.
I have 3 models connected this way:
class DicePool(models.Model):
# some relevant fields
class DiceRoll(models.Model):
pool = models.ForeignKey(DicePool, on_delete=models.CASCADE)
# plus a few more information fields with the type of die used, result, etc
class Modifier(models.Model):
pool = models.ForeignKey(DicePool, on_delete=models.CASCADE)
# plus about 4 more information fields
Now, when I load the DicePool history, I need to prefetch both the DiceRoll and Modifier.
I am now considering replacing the model Modifier with a textfield containing some JSON in DicePool. Just to reduce the number of database queries.
Is it common to use a json textfield instead of a database relationship?
Or am I thinking this wrong and it's completely normal to do additional queries to prefetch_related everytime I load my pools?
I personally find using a ForeignKey cleaner and it would let me do db-wise changes to data if needed. But my code is making too many db queries and I am trying to see where I can improve it.
FYI: I am using MySQL

Is it common to use a JSON text field instead of a database relationship?
I don't think so. Also, I don't believe it's advisable because (especially using MySQL that doesn't support things like JSONField) you'll end up with a text that you'd then need to parse somehow to a dict and then look up the things you want.
Personally (and I would assume that most people) would stick to FK relationships. Also, by doing prefetch_related or select_related you're already avoiding unnecessary queries.

Related

Should I use JSONField over ForeignKey to store data?

I'm facing a dilemma, I'm creating a new product and I would not like to mess up the way I organise the informations in my database.
I have these two choices for my models, the first one would be to use foreign keys to link my them together.
Class Page(models.Model):
data = JsonField()
Class Image(models.Model):
page = models.ForeignKey(Page)
data = JsonField()
Class Video(models.Model):
page = models.ForeignKey(Page)
data = JsonField()
etc...
The second is to keep everything in Page's JSONField:
Class Page(models.Model):
data = JsonField() # videos and pictures, etc... are stored here
Is one better than the other and why? This would be a huge help on the way I would organize my databases in the futur.
I thought maybe the second option could be slower since everytime something changes all the json would be overridden, but does it make a huge difference or is what I am saying false?

A JSONField obfuscates the underlying data, making it difficult to write readable code and fully use Django's built-in ORM, validations and other niceties (ModelForms for example). While it gives flexibility to save anything you want to the db (e.g. no need to migrate the db when adding new fields), it takes away the clarity of explicit fields and makes it easy to introduce errors later on.
For example, if you start saving a new key in your data and then try to access that key in your code, older objects won't have it and you might find your app crashing depending on which object you're accessing. That can't happen if you use a separate field.
I would always try to avoid it unless there's no other way.
Typically I use a JSONField in two cases:
To save a response from 3rd party APIs (e.g. as an audit trail)
To save references to archived objects (e.g. when the live products in my db change but I still have orders referencing the product).
If you use PostgreSQL, as a relational database, it's optimised to be super-performant on JOINs so using ForeignKeys is actually a good thing. Use select_related and prefetch_related in your code to optimise the number of queries made, but the queries themselves will scale well even for millions of entries.

Performance: Store likes in PostgreSQL ArrayField (Django example)

I have 2 models: Post and Comment, each can be liked by User.
For sure, total likes should be rendered somewhere near each Post or Comment
But also each User should have a page with all liked content.
So, the most obvious way is just do it with m2m field, which seems will lead to lots of problems in some future.
And what about this?
Post and Comment models should have some
users_liked_ids = ArrayField(models.IntegerField())
User model should also have such fields:
posts_liked_ids = ArrayField(models.IntegerField())
comments_liked_ids = ArrayField(models.IntegerField())
And each time User likes something, two actions are performed:
User's id adds to Post's/Comment's users_liked_ids field
Post's/Comment's id adds to User's posts_liked_ids/comments_liked_ids field
The questions are:
Is it a good plan?
Will it be efficient to make lookups in such approach to get "Is that Post/Comment` was liked but current user
Will it be better to store likes in some separate table, rather then in liked model, but also in ArrayField
Probably better stay with obvious m2m?

1) No.
2) Definitely not.
3) Absolutely, incredibly not. Don't split your data up even further.
4) Yes.
Here are some of the problems:
no referential integrity, since you can't create foreign keys on array elements, meaning you could easily have garbage values in an ID array
data duplication with posts having user ids and users having post ids means it's possible for information to get out of sync (what happens when a user or post is deleted?)
inefficient lookups in match arrays (your #2)
Don't, under any circumstances, do this. You may want to combine your "post" and "comment" models to simplify the relationship, but this is what junction tables are for. Arrays are good for use cases that don't involve foreign keys or the potential for extreme length.

Django - How to add an entry in Ascending order in the database efficiently?

I am working on making an application in Django that can manage my GRE words and other details of each word. So whenever I add a new word to it that I have learnt, it should insert the word and its details in the database alphabetically. Also while retrieving, I want the details of the particular word I want to be extracted from the database.
Efficiency is the main issue.
Should I use SQLite? Should I use a file? Should I use a JSON object to store the data?
If using a file is the most efficient, what data structure should I implement?
Are there any functions in Django to efficiently do this?
Each word will have - meaning, sentence, picture, roots. How should I store all this information?
It's fine if the answer is not Django specific and talks about the algorithm or the type of database.

I'm going to answer from the data perspective since this is not totally related to django.
From your question it appears you have a fixed identifier for each "row": the word, which is a string, and a fixed set of attributes.
I would recommend using any enterprise level RDBMS. In the case of django, the most popular for the python ecosystem is PostgreSQL.
As for the ordering, just create the table with an index on the word name (this will be automatically done for you if you use the word as primary key), and retrieve your records using order_by in django.
Here's some info on django field options (check primary_key=True)
And here's the info for the order_by order_by method
Keep in mind you can also set the ordering in the Meta class of the model.
For your search case, you'll have to implement an endpoint that is capable of querying your database with startswith. You can check an example here
Example model:
class Word(models.Model):
word = models.CharField(max_length=255, primary_key=True)
roots = ...
picture = ...
On your second question: "Is this costly?"
It really depends. With 4000 words I'll say: NO
You probably want to add a delay in the client to do the query anyways (for example "if the user has typed in and 500ms have passed w/o further input")
If I'm to give 1 good advice to any starting developer, it's don't optimize prematurely

Django model for sparse data

I am developing a django app that contains a number of forms which will be used to enter clinical data on some cancer tissue samples (10-20 fields per form, mostly CharField, FloatField and some multiple choice text dropdowns).
My challenge is that I need a form that can display different fields based on a diagnosis, for 150+ diagnoses. I can programmatically read the list of diagnoses, the fields required for each diagnosis and corresponding field types. Also, the set of all unique fields across all diagnoses is large (much larger than the number of fields needed for any specific diagnosis).
e.g.
disease_specific_fields field_type
diagnosis
B-lymphoblastic leukemia/lymphoma NOS EBV-positive Pull down: Yes/No
B-lymphoblastic leukemia/lymphoma with recurrent genetic abnormalities(TCF3-PBX1) EBV-positive Pull down: Yes/No
Monoclonal B lymphocytosis(CLL/SLL spectrum) EBV-positive Pull down: Yes/No
Peripheral T cell lymphoma NOS EBV-positive Pull down: Yes/No
AML with recurrent cytogenetic abnormalities(t(6;9) DEK-NUP214) EBV-positive Pull down: Yes/No
So far, I thought of the following approaches:
Create a single huge model that will contain mostly sparse data, and handle irrelevant data using django forms. CONS: inefficient storage and a lot of overhead code tied to forms.
Create a model for each diagnosis. CONS: complicates migrations and maintenance, I think.
Create one small model for all diagnoses that contains several 'generic' fields of each type ('CharField', 'FloatField', etc), and render respective field names dynamically in forms / views.
I am looking for any constructive suggestions on how to implement a model/models capturing the above data. Efficiency and storage are secondary concerns, mostly I want a clean and intuitive solution. Any answers tailored for django will be especially helpful.

A few options I'd consider-
Use Django-Polymorphic to create inheritance-based model types
Django-Polymorphic allows you to use inheritance for differentiating between types of models.
from polymorphic.models import PolymorphicModel
class Animal(PolymorphicModel):
kingdom = models.CharField(default="Animalia")
class Lizard(Animal):
class = models.CharField(default="Reptilia")
class Iguana(Lizard):
favorite_tree = models.Charfield()
While polymorphic uses a single db table for any model in an inheritance scheme, types are stored. As such, if you know the specific fields you want to capture hard-code it. Plus, you can filter by level (So, you could run a query on all Animal instances or all Iguana instances in the example above). There's no relations created by a polymorphic model, so performance is extremely good.
Use Django-Mutant if dynamic field creation is needed
Django-Mutant allows for dynamic creation of fields per model, allowing you top define data as needed on the fly. However, intermediary tables are required to do this. You gain a lot of flexibility while losing performance.
Use the postgres-specific JsonField to store data
Django 1.9 introduced native support for field type JsonField, allowing you to write Json structures to a db field as well as query them relatively quickly. You get amazing flexibility with decent performance but may struggle in providing user friendly forms to create, update, and verify the data. However, it has been done in many projects and there are libraries out there to assist with it.
from django.contrib.postgres.fields import JSONField
from django.db import models
class SomeModel(models.Model):
attributes = JsonField()
>>> some_attributes = {'color':'red', 'cell_count':150, 'enzymes':['xyzyss','xyxzxxyx']}
>>> a = SomeModel.objects.create(attributes=some_attributes)
>>> SomeModel.objects.filter(attributes__color='red')
(<<< will return a queryset with instance 'a' in it >>>)

Filter on a list of tags

I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel

You shouldn't use m2m relationship with AppEngine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and programmer is supposed to denormalize the data structure. This is a deliberate design desicion: while your database will contain redundant data, your read queries will be much simpler (no need to combine data from 3 tables), which in turn makes the design of DB server much simpler as well (of course this is made for the sake of optimization and scaling)
In your case you should probably get rid of Tag and SongTag models, and just store the tag in the Song model as a string. I of course assume that Tag model only contains id and name, if Tag in fact contains more data, you should still have Tag model. Song model in that case should contain both tag_id and tag_name. The idea, as I explained above, is to introduce redundancy for the sake of simpler queries

Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in = taglist)

You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js