I am creating a site where the database model looks similar to this.
class Category(models.Model):
    name = models.CharField(max_length=30)

class Photos(models.Model):
    name = models.CharField(max_length=30)
    category = models.ForeignKey(Category)
I select an element from Photos and store it in the cache with cache.set('object', object, timeout). When I access photos.name on the cached object, no queries are performed, but as soon as I access photos.category, a query runs. Is there any way to prevent this? I only want the id of the category; once I have the id, I can look the category up in the category cache. How can I implement this? Caching this way has improved my benchmarks significantly, and I am trying to get more performance out of it.
If you just want the ID, you can do photos.category_id.
You might also want to explore using select_related() to get the related category at the time when you query the original photo.
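To illustrate why category_id costs nothing extra, here is a minimal sketch at the SQL level. It uses sqlite3 from the standard library instead of Django, and the table layout is an assumption modelled on the Category/Photos models above. The foreign-key column arrives with the photo row itself, and a join (roughly what select_related() asks the database for) returns the related row in the same statement.

```python
import sqlite3

# Hypothetical tables mirroring the Category/Photos models from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES category(id)
    );
    INSERT INTO category VALUES (7, 'landscapes');
    INSERT INTO photos VALUES (1, 'sunset', 7);
""")

queries = []  # every SELECT issued after setup, like a query log


def run(sql, params=()):
    """Execute one SELECT, record it, and return the first row."""
    queries.append(sql)
    return conn.execute(sql, params).fetchone()


# Loading the photo row already brings the foreign-key column with it,
# so the category id is known without touching the category table.
photo_id, photo_name, category_id = run(
    "SELECT id, name, category_id FROM photos WHERE id = ?", (1,))

# A join fetches the related row in the same statement
# (the SQL-level counterpart of select_related).
joined = run(
    "SELECT p.name, c.id, c.name FROM photos p "
    "JOIN category c ON c.id = p.category_id WHERE p.id = ?", (1,))
```

Each approach issues exactly one statement; the first is enough when only the id is needed.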
I figured out the answer myself. My problem was that when I retrieve the object from the cache and access the related category, it makes another query to the database. The solution is simple: access the foreign key before saving the object to the cache, like this:
photo = Photos.objects.get(pk=1)  # get the photo object from the database
q = photo.category  # loads the related category and stores it on the object
cache.set('object', photo, timeout)
Remember that the related object is loaded lazily, on first access.
If you save the object to the cache this way, it will also contain the foreign key data the next time you read it from the cache. Hope this helps.
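The effect described in this answer can be sketched in plain Python. This is an analogy, not Django's implementation: the Photo class, load_category, cache_set and cache_get are hypothetical stand-ins, and copy.deepcopy imitates the serialization a cache backend performs when it stores a value.

```python
import copy

db_hits = []  # records each simulated database lookup


def load_category(category_id):
    """Hypothetical stand-in for the query triggered by photo.category."""
    db_hits.append(category_id)
    return {"id": category_id, "name": "landscapes"}


class Photo:
    def __init__(self, name, category_id):
        self.name = name
        self.category_id = category_id
        self._category = None  # filled on first access, then reused

    @property
    def category(self):
        # Lazy loading: only hit the "database" if nothing is stored yet.
        if self._category is None:
            self._category = load_category(self.category_id)
        return self._category


cache = {}  # stand-in for the cache backend


def cache_set(key, value):
    # Cache backends serialize values; deepcopy imitates that snapshot.
    cache[key] = copy.deepcopy(value)


def cache_get(key):
    return copy.deepcopy(cache[key])


# Warm case: touch the relation before caching.
warm = Photo("sunset", 7)
warm.category                       # one lookup, result stored on the object
cache_set("warm", warm)
warm_category = cache_get("warm").category   # served from the snapshot
hits_after_warm = len(db_hits)

# Cold case: cache the object without touching the relation first.
cache_set("cold", Photo("beach", 7))
cache_get("cold").category          # triggers a fresh lookup
hits_after_cold = len(db_hits)
```

In the warm case the only lookup happens before the object is cached; in the cold case every read from the cache that touches the relation pays for a new lookup.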
I know I can fetch all authors of a paper like:
paper.authors.all()
This works fine, but just returns me a QuerySet of Authors.
But I want the rows of the ManyToMany through table (because I want to sort by their IDs), like:
(id (BigAutoField), paper, author)
Is there a faster way to do it than:
Paper.authors.through.objects.all().filter(paper=paper)
My database is really large (~200 million entries), so the command above is not feasible.
My Model looks like:
class Paper(models.Model, ILiterature):
    authors = models.ManyToManyField(Author, blank=True)
    (...)
You can try to select in bulk:
through_rows = Paper.authors.through.objects.in_bulk(ids)
Here ids are the primary keys of the through-table rows; in_bulk() is a manager method that fetches them by primary key and returns a dict mapping each ID to its object. See https://levelup.gitconnected.com/optimizing-django-queries-28e96ad204de for details.
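As a rough illustration of what the question is after, the sketch below reads the through-table rows for one paper, sorted by ID, directly at the SQL level. It uses sqlite3 rather than Django; the table name paper_authors, the index name and the sample rows are assumptions. With a table this large, the useful property is that the filter on the paper column can use an index instead of scanning every row. In the ORM, the closest counterpart would presumably be filtering the through model by paper and ordering by id.

```python
import sqlite3

# Hypothetical layout of the auto-created through table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper_authors (
        id INTEGER PRIMARY KEY,
        paper_id INTEGER NOT NULL,
        author_id INTEGER NOT NULL
    );
    CREATE INDEX paper_authors_paper_idx ON paper_authors (paper_id);
    INSERT INTO paper_authors (id, paper_id, author_id) VALUES
        (1, 10, 100), (2, 11, 101), (3, 10, 102), (4, 12, 100), (5, 10, 101);
""")


def through_rows_for(paper_id):
    """Return (id, paper_id, author_id) rows for one paper, sorted by id."""
    return conn.execute(
        "SELECT id, paper_id, author_id FROM paper_authors "
        "WHERE paper_id = ? ORDER BY id", (paper_id,)).fetchall()


rows = through_rows_for(10)
```

Only the rows belonging to the requested paper are returned, already in ID order.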
Let's use these 4 simple models for example. A city can have multiple shops, and a shop can have multiple products, and a product can have multiple images.
models.py
class City(models.Model):
    name = models.CharField(max_length=300)

class Shop(models.Model):
    name = models.CharField(max_length=300)
    city = models.ForeignKey(City, related_name='related_city', on_delete=models.CASCADE)

class Product(models.Model):
    name = models.CharField(max_length=300)
    description = models.CharField(max_length=5000)
    shop = models.ForeignKey(Shop, related_name='related_shop', on_delete=models.CASCADE)

class Image(models.Model):
    image = models.ImageField(null=True)
    product = models.ForeignKey(Product, related_name='related_product', on_delete=models.CASCADE)
For an eCommerce website, users type keywords and I filter on the product names to get the matching results. I also want to fetch the related shops, cities and images for the products I will be showing.
To achieve that, I am using .select_related() to retrieve the other objects through the foreign keys as well.
Now, my question is, what is the best way to send that to the client?
One way is to make a single serializer that groups the data from all 4 tables into a single JSON. That JSON will look like a flat 1NF table with many repetitions: there will be a new row for every image and every shop where the product can be found, and the description of up to 5,000 characters will be repeated in each row, so this is not such a good idea. More specifically, the fields are: (product_id, product_name, product_description, product_image_filepath, product_in_shop_id, shop_in_city_id)
The second approach would use Django queryset caching, which I have no experience with, so maybe you can give me advice on how to make it efficient.
It would be to get Product.objects.filter(by input keywords).select_related().all(), cache the list of IDs of these products, and return this queryset from the viewset.
Then, from the client's side, I make another GET request just for the images, but I don't know how to reuse the list of product IDs that I queried earlier in the previous viewset / queryset.
How do I fetch only the images that I need, for products that matched the user input keywords, such that I don't need to query the id's of products again?
What exactly would the code look like for caching this in one viewset and reusing it in another viewset?
Then I also need one more GET request to get the list of shops and cities, where the product is available, so it will be fantastic if I can reuse the list of id's I got from the first queryset to fetch the products.
Is the second approach a good idea? If yes, how to implement it exactly?
Or should I stick with the first approach, which I am sceptical is the right way to do this.
I am using PostgreSQL database, and I will probably end up using ElasticSearch for my search engine, to quickly find the matching keywords.
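One possible way to avoid the repetition described in the first approach is to nest the data per product before serializing, so that each description appears only once. The sketch below is plain Python under stated assumptions: the flat rows, their column order and the nest_products helper are all hypothetical, and each product has a single shop, as in the models above.

```python
# Hypothetical flat rows, as a join across the four tables might return them:
# (product_id, product_name, description, image_path, shop_name, city_name)
flat_rows = [
    (1, "Lamp", "A long description ...", "lamp-front.jpg", "HomeStore", "Oslo"),
    (1, "Lamp", "A long description ...", "lamp-side.jpg", "HomeStore", "Oslo"),
    (2, "Desk", "Another description ...", "desk.jpg", "OfficeHub", "Bergen"),
]


def nest_products(rows):
    """Group flat rows by product so shared fields are stored once."""
    products = {}
    for product_id, name, description, image_path, shop, city in rows:
        entry = products.setdefault(product_id, {
            "id": product_id,
            "name": name,
            "description": description,
            "shop": {"name": shop, "city": city},
            "images": [],
        })
        entry["images"].append(image_path)
    return list(products.values())


payload = nest_products(flat_rows)
```

The resulting list can be passed to a JSON encoder as is; each product carries its images as a list instead of repeating the whole row per image.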
I have two models: a Task model and a Reward model.
class Task(models.Model):
    assigned_by = models.CharField(max_length=100)

class Reward(models.Model):
    task = models.ForeignKey(Task)
Now I want to return a queryset of Task with a reward field in it. I tried this query:
search_res = Task.objects.annotate(reward='reward')
I got this error: The annotation 'reward' conflicts with a field on the model.
Please tell me how to solve this. I want a reward field on each task object.
To reach your goal with the actual models I would simply use the relations along with the task.
Let's say you have a task (or a queryset of tasks):
t = Task.objects.get(pk=1)
or
for t in Task.objects.all():
you can get the reward like this:
t.reward_set.first()
Note that first() returns None when no reward is linked to the task, so handle that case.
This runs an extra query per task, which adds up for large datasets, so you could reduce the number of database requests with select_related (forward foreign keys) or prefetch_related (which also covers reverse relations such as reward_set), depending on your needs. See the Django docs for details.
I have two simple Django models:
class PhotoStream(models.Model):
    cover = models.ForeignKey('links.Photo')
    creation_time = models.DateTimeField(auto_now_add=True)

class Photo(models.Model):
    owner = models.ForeignKey(User)
    which_stream = models.ManyToManyField(PhotoStream)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
Currently the only data I have is 6 photos, that all belong to 1 photostream. I'm trying the following to prefetch all related photos when forming a photostream queryset:
queryset = PhotoStream.objects.order_by('-creation_time').prefetch_related('photo_set')
for obj in queryset:
    print(obj.photo_set.all())
    # print(connection.queries)
Checking via the debug toolbar, I've found that the above does exactly the same number of queries it would have done if I remove the prefetch_related part of the statement. It's clearly not working. I've tried prefetch_related('cover') as well - that doesn't work either.
Can anyone point out what I'm doing wrong, and how to fix it? My goal is to get all related photos for every photostream in the queryset. How can I possibly do this?
Printing connection.queries after running the for loop includes, among other things:
SELECT ("links_photo_which_stream"."photostream_id") AS "_prefetch_related_val", "links_photo"."id", "links_photo"."owner_id", "links_photo"."image_file" FROM "links_photo" INNER JOIN "links_photo_which_stream" ON ("links_photo"."id" = "links_photo_which_stream"."photo_id") WHERE "links_photo_which_stream"."photostream_id" IN (1)
Note: I've simplified my models posted in the question, hence the query above doesn't include some fields that actually appear in the output, but are unrelated to this question.
Here are some extracts from the documentation for prefetch_related:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python.
And, some more:
>>> Pizza.objects.all().prefetch_related('toppings')
This implies a self.toppings.all() for each Pizza; now each time self.toppings.all() is called, instead of having to go to the database for the items, it will find them in a prefetched QuerySet cache that was populated in a single query.
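So with a single photostream the number of queries you see is the same either way: one for the photostreams and one for its photos. The difference shows up with more photostreams: without prefetch_related, each obj.photo_set.all() hits the database once per photostream, whereas with it all the photos are fetched in one extra query and each photo_set is served from the prefetched QuerySet cache.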
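The "joining in Python" behaviour quoted above can be illustrated without Django. The following is a minimal sketch using sqlite3; the table names and sample data are assumptions loosely based on the models in the question. It compares one query per photostream against a single IN query whose rows are grouped in Python, which is the same shape as the query shown in connection.queries.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables: streams, photos, and the many-to-many link table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photostream (id INTEGER PRIMARY KEY);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, image_file TEXT);
    CREATE TABLE photo_which_stream (photo_id INTEGER, photostream_id INTEGER);
    INSERT INTO photostream VALUES (1), (2), (3);
    INSERT INTO photo VALUES (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg');
    INSERT INTO photo_which_stream VALUES (1, 1), (2, 1), (3, 2);
""")

queries = []  # log of SELECT statements


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def photos_naive():
    """One query for the streams, then one more query per stream."""
    result = {}
    for (stream_id,) in run("SELECT id FROM photostream ORDER BY id"):
        rows = run(
            "SELECT p.image_file FROM photo p "
            "JOIN photo_which_stream w ON w.photo_id = p.id "
            "WHERE w.photostream_id = ? ORDER BY p.id", (stream_id,))
        result[stream_id] = [row[0] for row in rows]
    return result


def photos_prefetched():
    """One query for the streams, one IN query for all photos, join in Python."""
    stream_ids = [row[0] for row in run("SELECT id FROM photostream ORDER BY id")]
    placeholders = ", ".join("?" for _ in stream_ids)
    rows = run(
        "SELECT w.photostream_id, p.image_file FROM photo p "
        "JOIN photo_which_stream w ON w.photo_id = p.id "
        "WHERE w.photostream_id IN (%s) ORDER BY p.id" % placeholders,
        stream_ids)
    grouped = defaultdict(list)
    for stream_id, image_file in rows:
        grouped[stream_id].append(image_file)
    return {stream_id: grouped[stream_id] for stream_id in stream_ids}


naive = photos_naive()
naive_queries = len(queries)        # 1 + one per stream
queries.clear()
prefetched = photos_prefetched()
prefetched_queries = len(queries)   # always 2
```

Both functions return the same mapping; the query count of the first grows with the number of photostreams, while the second stays at two.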
I'm trying to optimise my app by keeping the number of queries to a minimum... I've noticed I'm getting a lot of extra queries when doing something like this:
class Category(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=127, blank=False)

class Project(models.Model):
    categories = models.ManyToManyField(Category)
Then later, if I want to retrieve a project and all related categories, I have to do something like this :
{% for category in project.categories.all %}
Whilst this does what I want it does so in two queries. I was wondering if there was a way of joining the M2M field so I could get the results I need with just one query? I tried this:
def category_list(self):
    return self.join(list(self.category))
But it's not working.
Thanks!
Which, whilst does what I want, adds an extra query.
What do you mean by this? Do you want to pick up a Project and its categories using one query?
If you did mean this, then unfortunately there is no mechanism at present to do this without resorting to a custom SQL query. The select_related() mechanism used for foreign keys won't work here either. There is (was?) a Django ticket open for this but it has been closed as "wontfix" by the Django developers.
What you want does not seem to be possible because:
At the DBMS level, a ManyToMany relation cannot be stored directly, so an intermediate table is needed to join the two tables.
At the Django level, for your model definition, Django creates an extra table for the ManyToMany connection. The table is named after your model and field, in this example something like [app_name]_project_categories, and contains foreign keys to your two database tables.
So a project's categories are never stored on the project table itself: any access to them, including a categories__name lookup in your model's filter or get functions, has to go through that intermediate table.
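For reference, the custom-SQL route mentioned above could look roughly like the sketch below. It uses sqlite3 rather than Django, and the table names (modelled on the [app_name]_project_categories pattern without the app prefix) and the project_with_categories helper are assumptions. A single statement joins through the intermediate table, and the rows are folded into one project in Python. Django's prefetch_related, discussed in an earlier answer here, is the built-in alternative, at the cost of a second query.

```python
import sqlite3

# Hypothetical schema: two model tables plus the intermediate M2M table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE project_categories (project_id INTEGER, category_id INTEGER);
    INSERT INTO category VALUES (1, 'web'), (2, 'mobile');
    INSERT INTO project VALUES (1), (2);
    INSERT INTO project_categories VALUES (1, 1), (1, 2);
""")


def project_with_categories(project_id):
    """Fetch one project and its category names in a single statement."""
    rows = conn.execute(
        "SELECT p.id, c.name FROM project p "
        "LEFT JOIN project_categories pc ON pc.project_id = p.id "
        "LEFT JOIN category c ON c.id = pc.category_id "
        "WHERE p.id = ? ORDER BY c.id", (project_id,)).fetchall()
    if not rows:
        return None  # no such project
    # LEFT JOIN yields one row with a NULL name for a project without categories.
    return {
        "id": rows[0][0],
        "categories": [name for _, name in rows if name is not None],
    }


with_categories = project_with_categories(1)
without_categories = project_with_categories(2)
```

The LEFT JOINs keep projects that have no categories, which come back with an empty list instead of disappearing from the result.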