I know I can fetch all authors of a paper like:
paper.authors.all()
This works fine, but just returns me a QuerySet of Authors.
But I want the many-to-many through objects themselves, because I want to sort by their IDs:
(id (BigAutoField), paper, author)
Is there a faster way to do it than:
Paper.authors.through.objects.filter(paper=paper)
Because my database is really large (~200 million entries), the command above is not feasible.
My Model looks like:
class Paper(models.Model, ILiterature):
    authors = models.ManyToManyField(Author, blank=True)
    (...)
You can try to select in bulk:
papers = Paper.authors.through.objects.in_bulk(ids)
Django's bulk commands are faster and designed for massive databases like yours. You can check https://levelup.gitconnected.com/optimizing-django-queries-28e96ad204de for details.
Let's use these 4 simple models for example. A city can have multiple shops, and a shop can have multiple products, and a product can have multiple images.
models.py
class City(models.Model):
    name = models.CharField(max_length=300)

class Shop(models.Model):
    name = models.CharField(max_length=300)
    city = models.ForeignKey(City, related_name='related_city', on_delete=models.CASCADE)

class Product(models.Model):
    name = models.CharField(max_length=300)
    description = models.CharField(max_length=5000)
    shop = models.ForeignKey(Shop, related_name='related_shop', on_delete=models.CASCADE)

class Image(models.Model):
    image = models.ImageField(null=True)
    product = models.ForeignKey(Product, related_name='related_product', on_delete=models.CASCADE)
For an eCommerce website, users will type keywords and I filter on the product names to get the matching results. I also want to fetch the related data of shops, cities and images that is relevant to the products I will be showing.
To achieve that I am using .select_related() to retrieve the objects behind the ForeignKeys as well.
Now, my question is, what is the best way to send that to the client?
One way is to make a single serializer that groups the data from all 4 tables into a single JSON. That JSON will look like a 1NF table, since it will have many repetitions: there will be a new row for every image and for every shop the product can be found in, and the 10,000-character-long description will be repeated in each row, so this is not such a good idea. More specifically, the fields are: (product_id, product_name, product_description, product_image_filepath, product_in_shop_id, shop_in_city_id)
The second approach would use Django queryset caching, with which I have no experience at all; maybe you can give me advice on how to make it efficient.
This second way would be to run Product.objects.filter(<input keywords>).select_related(), cache the list of IDs of the matching products, and return that queryset from the viewset.
Then, from the client's side, I make another GET request just for the images, and I don't know how to reuse the list of product IDs that I queried earlier in the previous viewset / queryset.
How do I fetch only the images that I need, for the products that matched the user's keywords, without querying the product IDs again?
What will the code look like, exactly, for caching this in one viewset and reusing it again in another viewset?
Then I also need one more GET request to get the list of shops and cities where the product is available, so it would be fantastic if I could reuse the list of IDs from the first queryset there too.
Is the second approach a good idea? If yes, how exactly do I implement it?
Or should I stick with the first approach, which I am sceptical is the right way to do this?
I am using a PostgreSQL database, and I will probably end up using Elasticsearch as my search engine to quickly find the matching keywords.
I need to allow users to create and store filters for one of my models. The only decent idea I came up with is something like this:
class MyModel(models.Model):
    field1 = models.CharField()
    field2 = models.CharField()

class MyModelFilter(models.Model):
    owner = models.ForeignKey('User', on_delete=models.CASCADE, verbose_name=_('Filter owner'))
    filter = models.TextField(_('JSON-defined filter'), blank=False)
So the filter field stores a string like:
{"field1": "value1", "field2": "value2"}.
Then, somewhere in code:
filters = MyModelFilter.objects.filter(owner_id=owner_id)
querysets = [MyModel.objects.filter(**json.loads(filter)) for filter in filters]
result_queryset = reduce(lambda x, y: x|y, querysets)
This is not safe, and I need to control the available filter keys somehow. On the other hand, it exposes the full power of Django's queryset filters; for example, with this code I can filter on related models.
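One way to control the keys could be a whitelist checked before the stored JSON ever reaches filter(**...). The allowed lookups here are hypothetical:

```python
import json

# Hypothetical whitelist of lookups users may store; anything else is rejected
# before it reaches MyModel.objects.filter(**...).
ALLOWED_FILTER_KEYS = {"field1", "field1__icontains", "field2"}


def load_safe_filter(filter_json):
    """Parse a stored filter and refuse unknown (e.g. related-model) lookups."""
    data = json.loads(filter_json)
    disallowed = set(data) - ALLOWED_FILTER_KEYS
    if disallowed:
        raise ValueError(f"Disallowed filter keys: {sorted(disallowed)}")
    return data
```

Usage would then be MyModel.objects.filter(**load_safe_filter(f.filter)), which keeps the expressive kwargs syntax while closing off lookups such as owner__password.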
So I wonder, is there any better approach to this problem, or maybe a 3rd-party library, that implements same functionality?
UPD:
reduce in code is for filtering with OR condition.
UPD2:
User-defined filters will be used by another part of the system to filter newly added model instances, so I really need to store them server-side somehow (not in cookies or anything like that).
SOLUTION:
In the end, I used django-filter to generate the filter form, then grabbed its query data, converted it to JSON and saved it to the database.
After that, I could deserialize that field and use it in my FilterSet again. One problem that I couldn't solve in a clean way is testing a single model instance against my FilterSet (when the instance is already fetched and I need to check whether it matches the filter), so I ended up doing it manually, by checking each filter condition against the instance.
Are you sure this is actually what you want to do? Are your end users going to know what a filter is, or how to format the filter?
I suggest that you look into the Django-filter library (https://django-filter.readthedocs.io/).
It will enable you to create filters for your Django models, and then assist you with rendering the filters as forms in the UI.
I have scientific research publication data of 2 million records. I used Django REST framework to write APIs for searching the data in the title and abstract. This takes 12 seconds with Postgres as the DB, but goes down to 6 seconds if I use MongoDB.
But even 6 seconds is a long wait for a user. I indexed the title and abstract, but indexing the abstract failed because some of the abstract texts are too long.
Here is the Django model using MongoDB (MongoEngine as the ODM):
class Journal(Document):
    title = StringField()
    journal_title = StringField()
    abstract = StringField()
    full_text = StringField()
    pub_year = IntField()
    pub_date = DateTimeField()
    pmid = IntField()
    link = StringField()
How do I improve the query performance? What stack makes search and retrieval faster?
Some pointers about optimisation for the Django ORM with Postgres:
Use db_index=True on fields that will be searched often and that have some degree of repetition between entries, like "title".
Use values() and values_list() to select only the columns you want from a QuerySet.
If you're doing full text search in any of those columns (like a contains query), bear in mind that Django has support for full text search directly on a Postgres database.
Use print(queryset.query) to check what kind of SQL query is going to your database and whether it can be improved upon.
Many Postgres optimisation techniques rely on custom SQL queries, which can be written in Django using RawSQL expressions.
Remember that there are many, many ways to search for data in a database, be it relational or non-relational in nature. In your case, MongoDB is not "faster" than Postgres; it's just doing a better job at querying what you really want.
I am creating a site where the database model looks similar to this.
class Category(models.Model):
    name = models.CharField(max_length=30)

class Photos(models.Model):
    name = models.CharField(max_length=30)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
Now I select an element from Photos and store it in the cache with cache.set('object', object, timeout). When I access photos.name from this cache, no queries are performed. But the moment I access photos.category, it performs a query. Is there any way to prevent this? I only want the ID of the category; after getting the ID, I can query the category cache to get the element. What is the solution? Caching this way has improved my benchmarks significantly, and I am trying to get more performance out of it.
If you just want the ID, you can do photos.category_id.
You might also want to explore using select_related() to get the related category at the time when you query the original photo.
I figured out the answer myself. My problem was that when I retrieve the object from the cache and access the foreign key, it makes another query to the actual database. The solution is simple: before you save the object to the cache, just access the foreign key.
Like:
photo = Photos.objects.get(pk=photo_id)
q = photo.category
cache.set('object', photo, timeout)
Remember that querysets are lazy.
If you save it to the cache like this, the next time you read it from the cache it will contain the foreign key data as well. Hope this helps.
I'm trying to optimise my app by keeping the number of queries to a minimum... I've noticed I'm getting a lot of extra queries when doing something like this:
class Category(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=127, blank=False)

class Project(models.Model):
    categories = models.ManyToManyField(Category)

Then later, if I want to retrieve a project and all related categories, I have to do something like this:
{% for category in project.categories.all %}
Whilst this does what I want, it does so in two queries. I was wondering if there was a way of joining the M2M field so I could get the results I need with just one query. I tried this:
def category_list(self):
    return self.join(list(self.category))
But it's not working.
Thanks!
"Whilst this does what I want, it does so in two queries."
What do you mean by this? Do you want to pick up a Project and its categories using one query?
If you did mean this, then unfortunately there is no mechanism at present to do this without resorting to a custom SQL query. The select_related() mechanism used for foreign keys won't work here either. There is (was?) a Django ticket open for this but it has been closed as "wontfix" by the Django developers.
What you want does not seem to be possible, because:
At the DBMS level, a many-to-many relation cannot be stored directly, so an intermediate table is needed to join tables that are in a many-to-many relation.
At the Django level, for your model definition, Django creates an extra table for the ManyToMany connection. The table is named after your two models; in this example it will be something like [app_name]_project_categories, and it contains foreign keys to your two database tables.
So you cannot reach a field on the related table without that join: even a categories__name lookup in your Model's filter or get functions still has to go through the intermediate table.