I use Django 1.10.1, PostgreSQL 9.5, and Redis.
I have a table that stores user votes and looks like this:
==========================
object | user | created_on
==========================
where object and user are foreign keys to the id columns of their respective tables.
The problem is that in many situations I have to list many objects on one page. If the user is logged in, I have to check for every object whether the user has voted for it (and act on the result, e.g. show a vote or unvote button). So in my template I have to call a function like this for every object on the page:
```python
def is_obj_voted(obj_id, usr_id):
    # One query per call, i.e. one query per object rendered on the page
    return ObjVotes.objects.filter(object_id=obj_id, user_id=usr_id).exists()
```
Since I may have tens of objects on one page, I found with django-debug-toolbar that database access alone can take more than one second, because each query fetches just one row and the queries run one after another for every object on the page. To make it worse, I run similar queries against that table on other pages (e.g. filtering by user only or by object only).
What I'm trying to achieve, and what I think is the right thing to do, is to access the database just once to fetch all objects voted on by a given user (maybe when the user logs in, or on the first page hit that needs this data), and then filter that result further depending on each page's needs. Since I use Redis and the django-cacheops app, can they help me do that?
In your case I'd rather get a list of the page's object IDs and query all votes by the user's ID and that list, something like:
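```python
# IDs of the objects shown on this page; page_conditions is a placeholder
# for YOUR CONDITIONS (whatever filter selects the page's objects)
object_ids = [o.id for o in Object.objects.filter(**page_conditions)]
# One query: every vote this user cast on any of those objects
votes = set([v.object_id for v in ObjVotes.objects.filter(object_id__in=object_ids, user_id=usr_id)])

def is_obj_voted(obj_id, votes):
    # Plain set membership, no database access
    return obj_id in votes
```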
This makes only one additional database query per page to fetch the user's votes.
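To illustrate why this is a single query, here is a self-contained sketch that uses Python's built-in sqlite3 in place of Django and PostgreSQL; the objvotes table is a hypothetical mirror of the question's table. In Django itself, set(ObjVotes.objects.filter(object_id__in=object_ids, user_id=usr_id).values_list('object_id', flat=True)) should build the same set without instantiating model objects.

```python
import sqlite3

# Hypothetical mirror of the votes table: object | user | created_on
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objvotes (
        object_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        created_on TEXT DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO objvotes (object_id, user_id) VALUES (1, 7), (3, 7), (3, 8), (5, 9);
""")

def voted_object_ids(conn, user_id, object_ids):
    """Return the subset of object_ids the user voted for, using ONE query."""
    if not object_ids:
        return set()
    placeholders = ",".join("?" for _ in object_ids)
    sql = ("SELECT object_id FROM objvotes "
           f"WHERE user_id = ? AND object_id IN ({placeholders})")
    rows = conn.execute(sql, [user_id, *object_ids])
    return {object_id for (object_id,) in rows}

def is_obj_voted(obj_id, votes):
    # Set membership check; no database access per object
    return obj_id in votes

page_object_ids = [1, 2, 3, 4]
votes = voted_object_ids(conn, user_id=7, object_ids=page_object_ids)
print(sorted(votes))  # [1, 3]
print([is_obj_voted(o, votes) for o in page_object_ids])  # [True, False, True, False]
```

The template then only calls is_obj_voted, so the number of queries no longer grows with the number of objects on the page.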
I'm building a website for a friend's small business, and I want each user to be able to access their orders by a number that starts from 1 for each user, while the backend uses global numbering. So for each user, their first order will be at /orders/1/ and so on. Is there a consensus on how this should be achieved in general? As I see it, there are two ways to do this:
Store the number in another column in the orders table. I'd prefer not to do this because I'm not entirely sure how to handle deletions without going through and updating all the records of the user. If someone knows the edge cases I need to handle, I might go with this.
OR
Handle the numbering in every queryset I build for a user's orders page. The benefit is that it always gives the correct numbering, especially if I just do it in the template. Right now this seems easier, but I have a feeling it would cause problems in the future. The main problem I see is that I'm not sure how to link to the correct URL without the primary key being in that URL.
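For the second option, here is a minimal pure-Python sketch (hypothetical Order data, no Django) of numbering at read time and resolving /orders/<n>/ without putting the primary key in the URL. In Django, indexing an ordered queryset, e.g. Order.objects.filter(user=some_user).order_by('pk')[n - 1], should do the same lookup in one query and raises IndexError when n is out of range.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Order:
    pk: int       # global primary key, never shown in the URL
    user_id: int

# Hypothetical data: orders of two users, interleaved by global pk
ORDERS = [Order(10, 1), Order(11, 2), Order(12, 1), Order(13, 1), Order(14, 2)]

def user_orders(user_id: int) -> List[Order]:
    # Same idea as Order.objects.filter(user=...).order_by('pk')
    return sorted((o for o in ORDERS if o.user_id == user_id), key=lambda o: o.pk)

def numbered(user_id: int):
    # Per-user display numbers computed at read time, starting at 1
    return list(enumerate(user_orders(user_id), start=1))

def resolve(user_id: int, number: int) -> Optional[Order]:
    # /orders/<number>/ -> this user's number-th order (1-based), or None
    orders = user_orders(user_id)
    if 1 <= number <= len(orders):
        return orders[number - 1]
    return None

print([(n, o.pk) for n, o in numbered(1)])  # [(1, 10), (2, 12), (3, 13)]
print(resolve(1, 2).pk)  # 12

# Deleting order 12 shifts every later order down by one number
ORDERS[:] = [o for o in ORDERS if o.pk != 12]
print(resolve(1, 2).pk)  # 13
```

The last lines show the catch with this option: after a deletion, /orders/2/ silently points at a different order, so bookmarked URLs are not stable.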
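I recommend storing MyUser in a separate app, say accounts:
```python
# accounts/models.py
from django.contrib.auth.models import AbstractUser

class MyUser(AbstractUser):
    # extra fields
    pass
```
and storing Order in another app, say order:
```python
# order/models.py
from django.db import models

from accounts.models import MyUser

class Order(models.Model):
    user = models.ForeignKey(MyUser, on_delete=models.CASCADE)
    order_num = models.IntegerField()  # per-user number used in URLs
    # other fields
```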
Keep order_num up to date using the count of orders the user has made. To get the count:
```python
count = Order.objects.filter(user=request.user).count()
```
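A minimal sketch of the stored-column approach, using Python's built-in sqlite3 and a hypothetical orders table. One deliberate swap: it assigns MAX(order_num) + 1 rather than COUNT(*) + 1, because after a user deletes an older order the count drops and COUNT(*) + 1 would hand out a number that is still taken. The UNIQUE (user_id, order_num) constraint makes such a collision, or a race between two concurrent inserts, fail loudly instead of silently duplicating a number. In Django the rough equivalents would be unique_together = ('user', 'order_num') and a Max aggregate, not shown here. MAX + 1 can still reuse the number of a deleted *latest* order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,       -- global numbering
        user_id INTEGER NOT NULL,
        order_num INTEGER NOT NULL,   -- per-user numbering shown in URLs
        UNIQUE (user_id, order_num)
    )
""")

def create_order(conn, user_id):
    """Insert an order and assign this user's next order number."""
    with conn:  # commit on success, roll back on error
        (next_num,) = conn.execute(
            "SELECT COALESCE(MAX(order_num), 0) + 1 FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO orders (user_id, order_num) VALUES (?, ?)",
            (user_id, next_num),
        )
        return cur.lastrowid, next_num

def get_order_id(conn, user_id, order_num):
    """Map /orders/<order_num>/ for this user to the global id, or None."""
    row = conn.execute(
        "SELECT id FROM orders WHERE user_id = ? AND order_num = ?",
        (user_id, order_num),
    ).fetchone()
    return row[0] if row else None

print(create_order(conn, 1))  # (1, 1)
print(create_order(conn, 2))  # (2, 1)
print(create_order(conn, 1))  # (3, 2)
with conn:
    conn.execute("DELETE FROM orders WHERE id = 1")  # user 1 deletes order #1
# COUNT(*) + 1 would give 2 here and violate the UNIQUE constraint
print(create_order(conn, 1))  # (4, 3)
print(get_order_id(conn, 1, 2))  # 3
```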
Say I have a general website that lets someone download their feed quickly. A user can subscribe to many different pages, and the server must return the user's feed containing only the N most recent posts across all of the pages they subscribe to. Originally, when a user requested a feed from the server, the algorithm was as follows:
1. look at all of the pages the user subscribes to
2. get the N most recent posts from each page
3. sort all of those posts
4. return the N most recent posts to the user as their feed
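The original fan-out-on-read algorithm can be sketched in plain Python, with hypothetical in-memory dicts standing in for the database (on the real site each page lookup in step 2 would be a separate query, which is what made it slow). Taking only N posts from each page is safe because no single page can contribute more than N posts to the final top N.

```python
import heapq
from datetime import datetime, timedelta

# Hypothetical data: page -> list of (published_at, title), newest first
BASE = datetime(2017, 1, 1)
POSTS_BY_PAGE = {
    "news":   [(BASE + timedelta(hours=h), f"news-{h}") for h in (9, 5, 1)],
    "sports": [(BASE + timedelta(hours=h), f"sports-{h}") for h in (8, 7, 2)],
    "music":  [(BASE + timedelta(hours=h), f"music-{h}") for h in (6, 3)],
}
SUBSCRIPTIONS = {"alice": ["news", "sports", "music"]}

def build_feed(user, n):
    # 1. look at all of the pages the user subscribes to
    pages = SUBSCRIPTIONS.get(user, [])
    # 2. get the N most recent posts from each page
    candidates = [post for page in pages for post in POSTS_BY_PAGE[page][:n]]
    # 3-4. sort and keep the N most recent posts overall
    return [title for _, title in heapq.nlargest(n, candidates)]

print(build_feed("alice", 4))  # ['news-9', 'sports-8', 'sports-7', 'music-6']
```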
As it turns out, doing this EVERY TIME a user tried to refresh a feed was really slow. Thus, I changed the database to have a table of feedposts, which simply has a foreign key to a user and a foreign key to the post. Every time a page makes a new post, it creates a feed post for each of its subscribing followers. That way, when a user wants their feed, it is already created and does not have to be created upon retrieval.
The way I am doing this creates far too many rows and does not seem scalable. For instance, if a single page makes 1 post and has 1,000,000 followers, we just created 1,000,000 new rows in our feedpost table.
Please help!
How do companies such as facebook handle this problem? Do they generate the feed upon request? Are my database relationships terrible?
The original schema is not inherently wrong, at least not based on the high-level description you have provided. The slowness stems from the fact that you're not accessing the database the way a relational database is meant to be accessed.
In general, when querying a relational database, you should use JOINs and in-database ordering where possible, instead of fetching a bunch of data, and then trying to connect related objects and sort them in your code. If you let the database do all this for you, it will be much faster, because it can take advantage of indices, and only access those objects that are actually needed.
As a rule of thumb, if you need to sort the results of a QuerySet in your Python code, or loop through multiple querysets and combine them somehow, you're most likely doing something wrong and you should figure out how to let the database do it for you. Of course, it's not true every single time, but certainly often enough.
Let me try to illustrate with a simple piece of code. Assume you have the following models:
```python
from django.db import models

class Page(models.Model):
    name = models.CharField(max_length=47)
    followers = models.ManyToManyField('auth.User', related_name='followed_pages')

class Post(models.Model):
    title = models.CharField(max_length=147)
    page = models.ForeignKey(Page, related_name='posts', on_delete=models.CASCADE)
    content = models.TextField()
    time_published = models.DateTimeField(auto_now_add=True)
```
You could, for example, get the list of the last 20 posts posted to pages followed by the currently logged in user with the following single line of code:
```python
latest_posts = Post.objects.filter(page__followers=request.user).order_by('-time_published')[:20]
```
This runs a single SQL query against your database, which returns only the (up to) 20 matching results and nothing else. And since every join is on a primary key or an indexed foreign key, the database can use indices for all joins, making it really fast. In fact, this is exactly the kind of operation relational databases were designed to perform efficiently.
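To make the "single query" concrete, here is a sketch of roughly the SQL such a queryset turns into, run with Python's built-in sqlite3. The table names are hypothetical simplifications (Django would prefix them with the app name and name the join table differently); the point is that the JOIN, the filter, the ORDER BY and the LIMIT all execute inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
    -- join table behind Page.followers (ManyToManyField)
    CREATE TABLE page_followers (page_id INTEGER, user_id INTEGER);
    CREATE INDEX page_followers_user ON page_followers (user_id);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY, title TEXT,
        page_id INTEGER REFERENCES page (id), time_published TEXT
    );
    CREATE INDEX post_page ON post (page_id);

    INSERT INTO page (id, name) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO page_followers VALUES (1, 1), (3, 1), (2, 2);
    INSERT INTO post (title, page_id, time_published) VALUES
        ('a', 1, '2017-01-01 10:00'), ('b', 2, '2017-01-01 11:00'),
        ('c', 3, '2017-01-01 12:00'), ('d', 1, '2017-01-01 13:00'),
        ('e', 2, '2017-01-01 14:00');
""")

def latest_posts(conn, user_id, limit=20):
    # Join, filter, sort and limit are all done by the database
    rows = conn.execute("""
        SELECT post.title
        FROM post
        JOIN page_followers pf ON pf.page_id = post.page_id
        WHERE pf.user_id = ?
        ORDER BY post.time_published DESC
        LIMIT ?
    """, (user_id, limit))
    return [title for (title,) in rows]

print(latest_posts(conn, 1, 2))  # ['d', 'c']
```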
Caching is the solution here.
You need to reduce database reads, which are much slower than cache reads.
You can use something like Redis to cache the posts.
Here is an excellent answer that explains this in more depth:
Is Redis just a cache
Each page can be assigned a key, and you can pull all of the posts for that page under that key.
You need not cache everything; just cache the most recent M posts per page, where M >> N, which is enough to cut down on database calls. If a user requests posts beyond the latest M, those can be fetched directly from the DB.
Now, to generate the feed, you can make a DB call to get all of the subscribed pages (or cache those as well) and then get the required number of posts from the cache.
The problem here would be keeping the cache up-to date.
For that you can use something like Django signals: whenever a new post is added, add it to the cache as well from a signal handler (e.g. on post_save).
So for each DB write you will have to write to cache as well.
But then you will not have to read from the DB, and because Redis is an in-memory data store, it is much faster than a standard relational database.
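A minimal sketch of this scheme, with plain Python stand-ins instead of Redis and Django: a dict of bounded deques plays the role of one Redis list per page (in Redis itself a common pattern is LPUSH for each new post followed by LTRIM to keep only M entries), and on_post_saved is a hypothetical function standing in for a post_save signal handler. Feeds are built from the cache alone, which is correct as long as N <= M; older posts fall back to the database stand-in.

```python
import heapq
from collections import defaultdict, deque

M = 3  # recent posts cached per page (M >> N in practice)

db_posts = defaultdict(list)                        # stand-in for the Post table
page_cache = defaultdict(lambda: deque(maxlen=M))   # stand-in for one Redis list per page

def on_post_saved(page, timestamp, title):
    """What a post_save handler would do: the DB write plus a cache write."""
    db_posts[page].append((timestamp, title))
    page_cache[page].appendleft((timestamp, title))  # newest first; oldest falls off

def feed(pages, n):
    """The n newest posts across the given pages, read from the cache only."""
    cached = [post for page in pages for post in page_cache[page]]
    return [title for _, title in heapq.nlargest(n, cached)]

def older_posts(page, before, n):
    """Fallback for posts older than the cached window: read the database."""
    older = [p for p in db_posts[page] if p[0] < before]
    return heapq.nlargest(n, older)

# Posts arrive in time order, alternating between two pages
for t in range(1, 8):
    page = "news" if t % 2 else "sports"
    on_post_saved(page, t, f"{page}-{t}")

print(feed(["news", "sports"], 4))  # ['news-7', 'sports-6', 'news-5', 'sports-4']
print(older_posts("news", 3, 2))    # [(1, 'news-1')]
```

Every write touches both stores, which is the price paid for never reading the database on the common path.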
Edit:
Here are a few more articles that may help:
Does Stack Exchange use caching and if so, how
How Twitter Uses Redis to Scale - 105TB RAM, 39MM QPS, 10,000+ Instances
In OpenCart we have the following types of products:
Printed books (hard copies), which are shipped to the customer.
Digital downloads of the same books.
We want a store-level option that lets the user choose whether to see downloads or printed copies.
When the user chooses the downloads option, it should display only downloadable products in all categories.
When the user chooses printed copies, it should display only printed products in each category.
Any suggestions to achieve this functionality are welcome.
Thanks
"Any suggestions to achieve this functionality"
From my point of view (which may not be optimal) we need:
Permanent storage for the user preference [5 % done]
Add a column of type INT to the <DB_PREFIX>customer table, with a value of 0 if the user is interested in all products, 1 for digital downloads, and so on. If you might add more preferences later, it's better to store a serialized version of all the user preferences in a column of type TEXT.
A way to retrieve user preference [25 % done]
You could simply retrieve it from the database every time you need it; a better way is to keep it in the session, the same way the user data (like address and telephone) is kept in the User class.
A way to change the user preference [40 % done]
A checkbox on the user settings page. It's also preferable (UX-wise) to show the user preference in the header next to the user's name, where it can be edited directly.
And finally, displaying products based on that preference [100 % done :D]
You will need to change some code in the controllers of the category page, the best seller module, the latest products module, and so on (any module that displays products).
Simple, naive, and ugly solution: you will notice a code segment that copies product data into the view data. It looks like $data['products'][] = array( in OC 2.x and $this->data['products'][] = array( in versions before OC 2. A simple if condition there is enough: check the user preference and decide accordingly whether to copy the product into the view data.
Better solution: filter products by the user preference from the very beginning, in the model functions. Add an extra optional parameter indicating the user preference to every model function that retrieves products (don't forget the functions that retrieve product counts, or pagination totals will not match), and inside each function, if the parameter is set, apply it in the query.
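OpenCart's own model code is PHP; the following Python/sqlite3 sketch only illustrates the "better solution" pattern. The product table, its is_download column, the preference codes, and the function names are all hypothetical. Both the list function and the count function share one WHERE builder, so the optional preference filters them identically.

```python
import sqlite3

PREF_ALL, PREF_DOWNLOADS, PREF_PRINTED = 0, 1, 2  # hypothetical preference codes

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY, name TEXT,
        category_id INTEGER, is_download INTEGER NOT NULL
    );
    INSERT INTO product (name, category_id, is_download) VALUES
        ('Dune (ebook)', 1, 1), ('Dune (hardcover)', 1, 0),
        ('Emma (ebook)', 1, 1), ('Ulysses (paperback)', 2, 0);
""")

def _where(category_id, preference):
    # Shared filter, so listing and counting always agree
    clauses, params = ["category_id = ?"], [category_id]
    if preference == PREF_DOWNLOADS:
        clauses.append("is_download = 1")
    elif preference == PREF_PRINTED:
        clauses.append("is_download = 0")
    return " AND ".join(clauses), params

def get_products(conn, category_id, preference=PREF_ALL):
    where, params = _where(category_id, preference)
    rows = conn.execute(f"SELECT name FROM product WHERE {where} ORDER BY name", params)
    return [name for (name,) in rows]

def get_total_products(conn, category_id, preference=PREF_ALL):
    where, params = _where(category_id, preference)
    (total,) = conn.execute(f"SELECT COUNT(*) FROM product WHERE {where}", params).fetchone()
    return total

print(get_products(conn, 1, PREF_DOWNLOADS))        # ['Dune (ebook)', 'Emma (ebook)']
print(get_total_products(conn, 1, PREF_DOWNLOADS))  # 2
```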
I have a model (Realtor) with a ForeignKey field (BillingTier), which has a ManyToManyField (BillingPlan). For each logged in realtor, I want to check if they have a billing plan that offers automatic feedback on their listings. Here's what the models look like, briefly:
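```python
from django.contrib.auth.models import User
from django.db import models

class Realtor(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    # String reference, because BillingTier is defined further down
    billing_tier = models.ForeignKey('BillingTier', blank=True, null=True, default=None,
                                     on_delete=models.CASCADE)

class BillingTier(models.Model):
    plans = models.ManyToManyField('BillingPlan')

class BillingPlan(models.Model):
    automatic_feedback = models.BooleanField(default=False)
```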
I have a permissions helper that checks the user permissions on each page load, and denies access to certain pages. I want to deny the feedback page if they don't have the automatic feedback feature in their billing plan. However, I'm not really sure the best way to get this information. Here's what I've researched and found so far, but it seems inefficient to be querying on each page load:
```python
def isPermitted(user, url):
    premium = [t[0] for t in user.realtor.billing_tier.plans.values_list('automatic_feedback') if t[0]]
```
I also saw solutions that use filter (ManyToMany field values from queryset), but I'm equally unsure about running that query on each page load. I would have to get the billing tier id from the realtor, bt_id = user.realtor.billing_tier.id, and then query the model like so:
```python
BillingTier.objects.filter(id=bt_id).filter(plans__automatic_feedback=True).distinct()
```
I think the second option reads nicer, but I think the first would perform better because I wouldn't have to import and query the BillingTier model.
Is there a better option, or are these two the best I can hope for? Also, which would be more efficient for every page load?
As per the OP's invitation, here's an answer.
The core question is how to define an efficient permission check based on a highly relational data model.
The first variant builds a Python list by evaluating a Django queryset. The obvious suspicion is that it imposes unnecessary work on the Python interpreter. That might be tolerable if it allowed a simpler database query (a tradeoff that is hard to assess), but the underlying DB query is not exactly simple either.
The second approach involves fetching additional 1:1 data through relational lookups and then checking if there is any record fulfilling access criteria in a different, 1:n relation.
Let's have a look at them.
bt_id = user.realtor.billing_tier.id: this is required to get the hook for the following 1:n query. It is inefficient in itself, because each attribute access along the chain can trigger its own query (one for the realtor, one for the billing tier). It can be optimized in two ways.
As per Django: Access Foreign Keys Directly, it can be written as bt_id = user.realtor.billing_tier_id, because the foreign key value is already stored on the realtor row and need not be fetched via a relational lookup.
Assuming the page itself only loads a user object, Django can be told to fetch and cache related data along with it via select_related (e.g. select_related('realtor')). So if the page fetches not just the user object but also the realtor row holding billing_tier_id, we save one additional DB hit.
BillingTier.objects.filter(id=bt_id).filter(plans__automatic_feedback=True).distinct() can be optimized by using QuerySet.exists() instead of distinct(), because that reduces effort both in the database and in data traffic between the database and Python.
Django's prefetch_related could perhaps be used as well, but it runs a separate query per relation rather than combining the 1:1 and 1:n lookups into a single query, so it is much harder to judge whether it pays off. Could be worth a try.
In any case, it's worth installing Django Debug Toolbar, which lets you analyze how much time your implementation spends on database queries.
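For reference, the whole check can collapse into one EXISTS query. A sketch with Python's built-in sqlite3 and hypothetical table names loosely modelled on the models above; note that billing_tier_id is read straight off the realtor row, so the billingtier table never needs to be joined. In Django, something like BillingPlan.objects.filter(billingtier__realtor__user=user, automatic_feedback=True).exists() should express the same thing, assuming the default reverse query names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billingplan (id INTEGER PRIMARY KEY, automatic_feedback INTEGER NOT NULL);
    CREATE TABLE billingtier (id INTEGER PRIMARY KEY);
    CREATE TABLE billingtier_plans (billingtier_id INTEGER, billingplan_id INTEGER);
    CREATE TABLE realtor (id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE,
                          billing_tier_id INTEGER NULL);

    INSERT INTO billingplan VALUES (1, 0), (2, 1);   -- only plan 2 has feedback
    INSERT INTO billingtier VALUES (1), (2);
    INSERT INTO billingtier_plans VALUES (1, 1), (2, 1), (2, 2);
    INSERT INTO realtor (user_id, billing_tier_id) VALUES (10, 2), (11, 1), (12, NULL);
""")

def has_automatic_feedback(conn, user_id):
    # One round trip; EXISTS can stop at the first matching plan
    (found,) = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM realtor r
            JOIN billingtier_plans tp ON tp.billingtier_id = r.billing_tier_id
            JOIN billingplan p ON p.id = tp.billingplan_id
            WHERE r.user_id = ? AND p.automatic_feedback = 1
        )
    """, (user_id,)).fetchone()
    return bool(found)

print(has_automatic_feedback(conn, 10))  # True
print(has_automatic_feedback(conn, 11))  # False
```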
I am writing a trip planner, and I have users. For the purposes of this question, let's assume my models are as simple as a "Trip" model and a "UserProfile" model.
One feature of the site searches for routes (via external APIs) and then dynamically assembles them into "trips", which we then display. A new search deletes all the old "trips" and figures out new ones.
My problem is this: I want to save some of these trips to the user profile. If the user selects a trip, I want it to be permanently associated with that profile. Currently I have a ManyToMany field for Trips in my UserProfile, but when the trips are "cleaned/flushed", all trips are deleted, and that association is useless. I need a user to be able to go back a month later and see that trip.
I'm looking for an easy way to duplicate that trip data, or make it static once I add it to a profile, and I don't quite know where to start. Currently there is a trips_profile table with a foreign key to the "trips" table, which would be fine if we weren't deleting/flushing the trips table all the time.
Help appreciated.
It's hard to say exactly without your models, but given the following layout:
```python
class UserProfile(models.Model):
    trips = models.ManyToManyField(Trip)
```
You can clear out useless Trips by doing:
```python
Trip.objects.filter(userprofile__isnull=True).delete()
```
This deletes only the Trips not assigned to any UserProfile.
However, given the following layout:
```python
class Trip(models.Model):
    users = models.ManyToManyField(User)
```
You could kill the useless trips with:
```python
Trip.objects.filter(users__isnull=True).delete()
```
The second method has the side benefit of not requiring any changes to UserProfile, or even a UserProfile at all, since you can then get a user's trips with:
```python
some_user.trip_set.all()
```
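Either way, the cleanup amounts to deleting every trip that has no row in the many-to-many join table. A sketch with Python's built-in sqlite3 and hypothetical table names (Django's own SQL for __isnull=True uses a LEFT OUTER JOIN ... IS NULL rather than NOT EXISTS, but the effect is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip (id INTEGER PRIMARY KEY, name TEXT);
    -- join table behind UserProfile.trips (ManyToManyField)
    CREATE TABLE userprofile_trips (userprofile_id INTEGER, trip_id INTEGER);
    INSERT INTO trip (id, name) VALUES (1, 'trip-1'), (2, 'trip-2'), (3, 'trip-3');
    INSERT INTO userprofile_trips VALUES (1, 2);  -- profile 1 saved trip 2
""")

def flush_unsaved_trips(conn):
    """Delete only trips no profile references; return how many were deleted."""
    with conn:
        cur = conn.execute("""
            DELETE FROM trip
            WHERE NOT EXISTS (
                SELECT 1 FROM userprofile_trips ut WHERE ut.trip_id = trip.id
            )
        """)
    return cur.rowcount

deleted = flush_unsaved_trips(conn)
remaining = [name for (name,) in conn.execute("SELECT name FROM trip ORDER BY id")]
print(deleted, remaining)  # 2 ['trip-2']
```

Saved trips survive every new search, so a user can come back a month later and still find them.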