How to implement JSONField in Django with PostgreSQL as the backend?

I want to implement a JSONField in my Django application running PostgreSQL. Can I also have indexing on that JSON field so that I get Mongo-like features? Do I have to use SQLAlchemy for this, or is Django's built-in ORM suitable for the purpose?
Thanks.

You can easily install django-jsonfield
pip install jsonfield
and use it in your model (with the import added):
from django.db import models
from jsonfield import JSONField

class MyModel(models.Model):
    my_json_field = JSONField()
It's essentially a TextField that serializes a Python dictionary to JSON on save and deserializes it on load, so no, you can't put a useful index on it, nor can you query against the contents of the JSON field.

If you are using Postgres (and don't care about compatibility with other DB engines), you should consider Django's postgres-specific fields:
https://docs.djangoproject.com/en/1.9/ref/contrib/postgres/fields/
These should perform much better than the plain-text jsonfield package, because the data is stored as jsonb and can therefore be indexed and queried.
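A minimal sketch of what that looks like, assuming Django 1.11+ for GinIndex and Meta.indexes (the model and field names are illustrative):
from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.indexes import GinIndex
from django.db import models

class Document(models.Model):
    data = JSONField(default=dict)

    class Meta:
        # a GIN index speeds up containment lookups on the jsonb column
        indexes = [GinIndex(fields=["data"])]

# containment queries can then use the index:
# Document.objects.filter(data__contains={"status": "published"})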
If DB compatibility is an issue and/or you want the field to be readable/editable through the Django admin, you might want to consider KeyValueField instead: https://github.com/rewardz/django_model_helpers
Data is stored in the DB like this:
Name = abc
Age = 123
but returned to you like this:
{"Name": "abc", "Age": "123"}
So if you set db_index=True you can do field__contains="Age = 123", but even with the index this is not foolproof, because Age = 1234 would also match that query; besides, indexing a text field is usually not recommended.

Related

Do data conversion after Django model object is fetched

I want to save the python dictionary inside the Django model as JSON and I want to convert that JSON back into a python dictionary when that data is fetched.
I know I can do this inside the view, but I want to implement it in the model so that it returns a dictionary object when queried.
Is there any signal or post_fetch method that I can use to achieve this? I couldn't find anything by googling...
You might want to look at simply using the Python json package, unless you are using Postgres as your database, in which case JSONField is the way to go. The json package is explained here, and if I understand what you are saying, you could use it in a model like so:
import json
from django.db import models

class MyModel(models.Model):
    json_field = models.TextField()  # or use JSONField if you are on Postgres

    @property
    def get_json_field_as_dictionary(self):
        # deserialize the stored JSON string into a Python dictionary
        return json.loads(self.json_field)

    def set_json_field(self, mydict):
        # serialize the dictionary and store it as JSON text
        self.json_field = json.dumps(mydict)

    @classmethod
    def get_json_from_dictionary(cls, mydict):
        return json.dumps(mydict)
When saving to the database, you can call json.dumps(myDictionary) yourself, or call MyModelObject.set_json_field(myDictionary) to convert the Python dictionary to JSON before it is stored in the model's json_field. To retrieve the JSON data as a dictionary, you simply call MyModel.objects.last().get_json_field_as_dictionary (or whatever you prefer to call it; json_dictionary, perhaps, so it would be MyModel.objects.last().json_dictionary) and it returns the value of that property as if it were a field on the model, without having to do the conversion each time.
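For instance, a quick usage sketch with the helpers above (the dictionary contents are just an example):
obj = MyModel()
obj.set_json_field({"Name": "abc", "Age": 123})
obj.save()
obj.get_json_field_as_dictionary  # -> {"Name": "abc", "Age": 123}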
Or, if you are using Postgres as your backend, this is a lot easier with JSONField:
class MyModel(models.Model):
    json_field = models.JSONField(null=True)
And to save:
myObject = MyModel.objects.create(json_field=myDictionary)
If you clarify I can update my answer to explain better.
You may want to use JSONField. It lets you store data encoded as JSON and retrieve it as the corresponding Python data type, including dictionaries.
JSONField only works on PostgreSQL with Django < 3.1, but works on any database with Django >= 3.1.
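As a rough sketch of what querying looks like with the model above (the key names are illustrative, and key lookups require a database with JSON support):
obj = MyModel.objects.create(json_field={"name": "abc", "age": 123})
obj.json_field["name"]                             # "abc", already a Python dict
MyModel.objects.filter(json_field__name="abc")     # key lookup inside the JSON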

When using natural_keys in Django, how can I distinguish between creates & updates?

In Django, I often copy model fixtures from one database to another. My models use natural keys (though I'm not sure that's relevant) during serialization. How can I ensure that instances which have been updated in one database are not inserted as duplicates into the other database?
Consider the following code:
models.py:
class AuthorManager(models.Manager):
    def get_by_natural_key(self, name):
        return self.get(name=name)

class Author(models.Model):
    objects = AuthorManager()
    name = models.CharField(max_length=100)

    def natural_key(self):
        return (self.name,)
Now if I create an author called "William Shakespeare" and dump it to a fixture via python manage.py dumpdata --natural_keys, I will wind up with the following sort of file:
[
  {
    "model": "myapp.author",
    "fields": {
      "name": "William Shakespeare"
    }
  }
]
I can load that into another db and it will create a new Author named "William Shakespeare".
But if I rename that author to "Bill Shakespeare" in the original database, recreate the fixture, and load it into the other database, it will create another new Author named "Bill Shakespeare" instead of updating the existing Author's name.
Any ideas on how to approach this?
You're using fixtures for something they're not made for: synchronizing databases. They're made for populating empty databases. In particular, a deletion cannot be expressed in a fixture, and an update based on natural keys would have to be expressed as an insertion plus a deletion of the old row, so the deletion half is impossible.
You can work around this by simply not using natural keys, but then the primary keys must be identical between databases. If the target database also receives inserts from another source, that is a problem, because updates will be applied to the wrong objects.
In short: use synchronization/replication tools to synchronize databases, and use fixtures for migrations and tests. Trying to use fixtures for synchronization is error-prone.

Should I use ArrayField or ManyToManyField for tags

I am trying to add tags to a model for a Postgres DB in Django, and I found two solutions:
Using a many-to-many relation to a Tag model:
class Post(models.Model):
    tags = models.ManyToManyField('Tag')
    ...

class Tag(models.Model):
    name = models.CharField(max_length=140)
Using an array field:
from django.contrib.postgres.fields import ArrayField

class Post(models.Model):
    tags = ArrayField(models.CharField(max_length=140))
    ...
Assuming that I don't care about supporting other database backends in my code, what is the recommended solution?
If you use an ArrayField:
The size of each row in your DB is going to be a bit larger, so Postgres is going to use more TOAST storage.
Every time you fetch the row, unless you specifically defer the field or otherwise exclude it from the query via only(), values() or similar, you pay the cost of loading all those values. If that's what you need, so be it.
Filtering on values in that array, while possible, isn't going to be as nice, and the Django ORM doesn't make it as obvious as it does for M2M tables (see the sketch after this list).
If you use a ManyToManyField:
You can filter more easily on those related values.
The related rows are not loaded by default; you can use prefetch_related if you need them, and then get fancy if you only want a subset of those values loaded.
Total storage in the DB is going to be slightly higher with M2M because of the join table and the extra id fields.
The cost of the joins in this case is negligible, because the key columns are indexed.
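To make the filtering difference concrete, a minimal sketch against the Post/Tag models from the question (the tag value is just an example):
# ArrayField version: containment lookup on the array column
Post.objects.filter(tags__contains=["django"])

# M2M version: filter through the related Tag table, prefetching to avoid N+1 queries
Post.objects.filter(tags__name="django").prefetch_related("tags")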
That said, the above answer isn't originally mine. A while ago I stumbled upon the same dilemma when I was learning Django, and found the answer in this question: Django Postgres ArrayField vs One-to-Many relationship.
Hope you get what you were looking for.
If you want the Tag model itself to carry information (for example: how many tags exist, how many posts use a particular tag, etc.), go for the first option, as you can add more fields to the model, which adds richness to the app.
On the other hand, if you just want an array of strings for display or minimal processing, go for the ArrayField option.
But if you wish to save time and still add richness to the app, you can use django-taggit:
https://github.com/alex/django-taggit
It is as simple as this to initialise:
from django.db import models
from taggit.managers import TaggableManager

class Food(models.Model):
    # ... fields here
    tags = TaggableManager()
and it can be used in the following way:
>>> apple = Food.objects.create(name="apple")
>>> apple.tags.add("red", "green", "delicious")
>>> apple.tags.all()
[<Tag: red>, <Tag: green>, <Tag: delicious>]

How to improve 2 million data query speed in Django RESTful APIs

I have scientific research publication data with 2 million records. I used Django REST framework to write APIs that search the data by title and abstract. A search takes about 12 seconds with Postgres as the DB, but drops to about 6 seconds if I use MongoDB instead.
But even 6 seconds feels like a long wait for the user. I indexed the title and abstract, but the abstract index failed because some of the abstract texts are too long.
Here is the Django model using MongoDB (MongoEngine as the ODM):
class Journal(Document):
    title = StringField()
    journal_title = StringField()
    abstract = StringField()
    full_text = StringField()
    pub_year = IntField()
    pub_date = DateTimeField()
    pmid = IntField()
    link = StringField()
How do I improve the query performance? What stack makes search and retrieval faster?
Some pointers about optimisation for the Django ORM with Postgres:
Use db_index=True on fields that will be searched often and have some degree of repetition between entries, like "title".
Use values() and values_list() to select only the columns you want from a QuerySet.
If you're doing full text search in any of those columns (like a contains query), bear in mind that Django has support for full text search directly on a Postgres database (see the sketch after this list).
Use print(queryset.query) to check what kind of SQL query is going to your database and whether it can be improved upon.
Many Postgres optimisation techniques rely on custom SQL queries, which can be made in Django by using RawSQL expressions.
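For the full text search point, a rough sketch, assuming the title/abstract data lives in a regular Django model backed by Postgres (for 2 million rows you would likely want a stored SearchVectorField with a GIN index rather than annotating on the fly):
from django.contrib.postgres.search import SearchVector

results = (
    Journal.objects
    .annotate(search=SearchVector("title", "abstract"))
    .filter(search="machine learning")  # example search term
)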
Remember that there are many, many ways to search for data in a database, be it relational or not-relational in nature. In your case, MongoDB is not "faster" than Postgres, it's just doing a better job at querying what you really want.

Django ListView with subquery gives This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

I have the following model:
class Credit(models.Model):
    user = models.ForeignKey(User)
    ...
and a ListView:
class CreditListView(ListView):
    paginate_by = 10
    model = Credit
    ...
If I want to filter the credits by users inside CreditListView:
def get_queryset(self):
    users = User.objects.filter(...)[:10]
    credits = Credit.objects.filter(user__in=users)
    return credits
I will get a NotSupportedError exception:
(1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
Try this :
users = User.objects.filter(...)[:10]
users = list(users)
credits = Credits.objects.filter(user__in=users)
User.objects.filter(...) returns a QuerySet, which is not an ordinary Python list; when you pass it to user__in, Django nests it as a subquery, and MySQL doesn't allow a LIMIT inside such a subquery. Calling list() evaluates the queryset first, so the outer query receives a plain list of values instead of a subquery.
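A variant of the same workaround, sketched against the CreditListView from the question (the is_active filter is only a placeholder for whatever filter you actually use):
def get_queryset(self):
    # evaluating the inner queryset up front keeps the LIMIT out of the subquery
    user_ids = list(
        User.objects.filter(is_active=True)  # placeholder filter
        .values_list("pk", flat=True)[:10]
    )
    return Credit.objects.filter(user_id__in=user_ids)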
The problem was this line:
users = User.objects.filter(...)[:10]
MySQL doesn't accept the LIMIT (the [:10] slice) inside the subquery. I thought I had already tried removing it; possibly the Django dev server hadn't restarted properly.
(1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
Pagination is responsible for the LIMIT and your queryset for the IN. You might be able to get away with rewriting the query using filter()/exclude(), or by using index ranges instead of pagination. If not, use raw SQL to get your queryset.
In general, using MySQL (with MyISAM in particular) is a very painful idea, because apart from its problems as a database, it does not support 20-25% of Django's ORM. If you cannot switch to PostgreSQL, at least try using InnoDB.
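If you do stay on MySQL, one way to make new tables default to InnoDB is an init_command in the database settings; a sketch with placeholder credentials, and note that the exact variable name (storage_engine vs default_storage_engine) depends on your MySQL version:
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",        # placeholder
        "USER": "myuser",      # placeholder
        "PASSWORD": "...",
        "OPTIONS": {
            # make newly created tables use InnoDB instead of MyISAM
            "init_command": "SET default_storage_engine=INNODB",
        },
    }
}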