First of all, sorry if this isn't an appropriate question for StackOverflow. I've tried to make it as generalisable as possible.
I want to create a database (MySQL, site running Django) that has users, who can be allocated a certain number of points for various types of action - it's a collaborative game. My requirements are to obtain:
the number of points a user has
the user's ranking compared to all other users
and the overall leaderboard (i.e. all users ranked in order of points)
This is what I have so far, in my Django models.py file:
class SiteUser(models.Model):
name = models.CharField(max_length=250 )
email = models.EmailField(max_length=250 )
date_added = models.DateTimeField(auto_now_add=True)
def points_total(self):
points_added = PointsAdded.objects.filter(user=self)
points_total = 0
for point in points_added:
points_total += point.points
return points_total
class PointsAdded(models.Model):
user = models.ForeignKey('SiteUser')
action = models.ForeignKey('Action')
date_added = models.DateTimeField(auto_now_add=True)
def points(self):
points = Action.objects.filter(action=self.action)
return points
class Action(models.Model):
points = models.IntegerField()
action = models.CharField(max_length=36)
However it's rapidly becoming clear to me that it's actually quite complex (in Django query terms at least) to figure out the user's ranking and return the leaderboard of users. At least, I'm finding it tough. Is there a more elegant way to do something like this?
This question seems to suggest that I shouldn't even have a separate points table - what do people think? It feels more robust to have separate tables, but I don't have much experience of database design.
this is old, but I'm not sure exactly why you have 2 separate tables (Points Added & Action). It's late, so maybe my mind isn't ticking, but it seems like you just separated one table into 2 for some reason. It doesn't seem like you get any benefit out of it. It's not like there's a 1 to many relationship in it right?
So first of all, I would combine those two tables. Secondly, you are probably better off storing points_total into a value in your site_user table. This is what I think Demitry is trying to allude to, but didn't say explicitly. This way instead of doing this whole additional query (pulling everything a user has done in his history of the site is expensive) + looping action (going through it is even more expensive), you can just pull it as one field. It's denormalizing the data for a greater good.
Just be sure to update the value everytime you add in something that has points. You can use django's post_save signal to do that
It's a bit more difficult to have points saved in the same table, but it's totally worth it. You can do very simple ordering/filtering operations if you have computed points total on user model. And you can count totals only when something changes (not every time you want to show them). Just put some validation logic into post_save signals and make sure to cover this logic with tests and you're good.
p.s. denormalization on wiki.
Related
I am working on a project where I need to recalculate values based on if fields changed or not. Here is an example:
Model1:
field_a = DatetimeField()
calculated_field_1 = ForeignKey(Model2)
Model2:
field_j = DatetimeField()
If field_a changes on model1 I have to recalculate the value for field calculated_field_1 to see if it needs to change as well. The calculations that are done require me querying the database to check values of other models and then determining if the value of the calculated field needs to change.
Example) field_a changes then I would have to do this calculation
result = Model2.objects.filter(field_j__gte=Model1.field_a)
If result.exists():
Model1.field_a = result.first()
Model1.save(update_fields=(‘field_a’,))
This is the most basic example I could think of and the queries can be much more complicated than this.
The project started out with one calculation when a field changed so I decided the best approach was to use django signals. Months later the requirements have changed for the project and now there are several other calculations that I had to implement that are very similar to the example above. I have noticed that my post_save function is getting out of hand and I am just wondering what alternatives there are to using signals. Although the post_save calculations I do now take far less than half a second, for the sake of my question pretend they took a second or more.
A valid answer cannot include doing these calculations on the fly when I pull them from the db. We use a validation framework that requires me to set these values on the model and querying on the fly has been an approach we attempted but for performance reasons it was not viable. Also, on field change one of the requirements is that the user needs to see the results of the calculated field so this has to happen synchronously.
What are some alternative approaches to using this pattern?
I'm making a web site for a friend for a small business, and for each user, I want them to be able to access their orders by number which starts from 1 for each user, but in the backend this should be a global numbering. So for each user, their first order will be at /orders/1/ and so on. Is there a consensus on how this should be achieved in general? Way I see it, I can do this 2 ways:
Store the number in another column in the orders table. I'd prefer not to do this because I'm not entirely sure how to handle deletions without going through and updating all the records of the user. If someone knows the edge cases I need to handle, I might go with this.
OR
For every queryset I make when getting the orders page for each user I handle the numbering, benefit of this is that it will always give the correct numbering, especially if I just do it in the template. Right now this seems easier, but I have a feeling this would give rise to problems in the future. Main problem I see is I'm not sure how to make it link to the correct url without the primary key being in that url.
I recommend you to store MyUser in a separate app, say accounts
class MyUser(BaseUser):
# extra fields
And store Order in a separate app, say order
from accounts.models import MyUser
class Order(models.Model):
user = models.ForeignKey(MyUser)
order_num = models.IntegerField()
# other fields
keep update this order_num by the count of orders the user has made.
to get the count,
count = Order.objects.filter(user==request.user).count()
Some of my models are only unique in a combination of keys. I don't want to use an auto-numbering id as the identifier as subsets of the data will be exported to other systems (such as spreadsheets), modified and then used to update the master database.
Here's an example:
class Statement(models.Model):
supplier = models.ForeignKey(Supplier)
total = models.DecimalField("statement total", max_digits=10, decimal_places=2)
statement_date = models.DateField("statement date")
....
class Invoice(models.Model):
supplier = models.ForeignKey(Supplier)
amount = models.DecimalField("invoice total", max_digits=10, decimal_places=2)
invoice_date = models.DateField("date of invoice")
statement = models.ForeignKey(Statement, blank=True, null=True)
....
Invoice records are only unique for a combination of supplier, amount and invoice_date
I'm wondering if I should create a slug for Invoice based on supplier, amount and invoice_date so that it is easy to identify the correct record.
An example of the problem of having multiple related fields to identify the right record is django-csvimport which assumes there is only one related field and will not discriminate on two when building the foreign key links.
Yet the slug seems a clumsy option and needs some kind of management to rebuild the slugs after adding records in bulk.
I'm thinking this must be a common problem and maybe there's a best practice design pattern out there somewhere.
I am using PostgreSQL in case anyone has a database solution. Although I'd prefer to avoid that if possible, I can see that it might be the way to build my slug if that's the way to go, perhaps with trigger functions. That just feels a bit like hidden functionality though, and may cause a headache for setting up on a different server.
UPDATE - after reading initial replies
My application requires that data may be exported, modified remotely, and merged back into the master database after review and approval. Hidden autonumber keys don't easily survive that consistently. The relation invoices[2417] is part of statements[265] is not persistent if the statement table was emptied and reloaded from a CSV.
If I use the numeric autonumber pk then any process that is updating the database would need to refresh the related key numbers or by using the multiple WITH clause.
If I create a slug that is based on my 3 keys but easy to reproduce then I can use it as the key - albeit clumsily. I'm thinking of a slug along the lines:
u'%s %s %s' % (self.supplier,
self.statement_date.strftime("%Y-%m-%d"),
self.total)
This seems quite clumsy and not very DRY as I expect I may have to recreate the slug elsewhere duplicating the algorithm (maybe in an Excel formula, or an Access query)
I thought there must be a better way I'm missing but it looks like yuvi's reply means there should be, and there will be, but not yet :-(
What you're talking about it a multi-column primary key, otherwise known as "composite" or "compound" keys. Support in django for composite keys today is still in the works, you can read about it here:
Currently Django models only support a single column in this set,
denying many designs where the natural primary key of a table is
multiple columns [...] Current state is that the issue is
accepted/assigned and being worked on [...]
The link also mentions a partial implementation which is django-compositekeys. It's only partial and will cause you trouble with navigating between relationships:
support for composite keys is missing in ForeignKey and
RelatedManager. As a consequence, it isn't possible to navigate
relationships from models that have a composite primary key.
So currently it isn't entirely supported, but will be in the future. Regarding your own project, you can make of that what you will, though my own suggestion is to stick with the fully supported default of a hidden auto-incremented field that you don't even need to think about (and use unique_together to enforce the uniqness of the described fields instead of making them your primary keys).
I hope this helps!
No.
Model needs to have one field that is primary_key = True. By default this is the (hidden) autofield which stores object Id. But you can set primary_key to True at any other field.
I've done this in cases, Where i'm creating django project upon tables which were previously created manually or through some other frameworks/systems.
In reality - you can use whatever means you can think of, for joining objects together in queries. As long as query returns bunch of data that can be associated with models you have - it does not really matter which field you are using for joins. Just keep in mind, that the solution you use should be as effective as possible.
Alan
I want to have facebook kind of news feed, in which i need to fetch data from 2 different models ordered by time.
Models are something like :
class User_image(models.Model):
user = models.ForeignKey(User_info)
profile_pic = models.ImageField(upload_to='user_images')
created = models.DateTimeField(auto_now_add=True)
class User_status(models.Model):
user = models.ForeignKey(User_info)
status = models.CharField(max_length=1)
created = models.DateTimeField(auto_now_add=True)
As per my requirement, i can not make a single model out of these two models.
Now i need to know the simple code in views and template so as to display profile pic and status in the news feed according to time.
Thanks.
The most simple way of archiving this is to have a base model, call it Base_event,
class Base_event(models.Model):
user = models.ForeignKey(User_info)
created = models.DateTimeField(auto_now_add=True)
and derive both your models from this Base. This way you write less code, and you archive your objective. Notice that you have to make an implementation choice: how will they will inherit from base. I advice to read Django documentation to help you choose wisely according to what you want to do.
EDIT:
I would notice that the accepted answer has a caveat. It sorts the data on the python and not on the mysql, which means it will have an impact on the performance: the whole idea of mysql having SORT is to avoid having to hit the database and them perform the sorting. For instance, if you want to retrieve just the first 10 elements sorted, with the accepted solution you have to extract all the entries, and only then decide which ones are the first 10.
Something like Base_event.objects.filter().sort_by(...)[10] would only extract 10 elements of the database, instead of the whole filtered table.
The easy solution now becomes the problem later.
Try something like creating list chain.
feed = list(chain(User_image,User_status))
feed = sorted(feed, key=operator.attrgetter('date_added'))
for those who refer it as not correct.
https://stackoverflow.com/a/434755/2301434
In SO question 7531153, I asked the proper way to split a Django model into two—either using Django's Multi-table Inheritance or explicitly defining a OneToOneField.
Based Luke Sneeringer's comment, I'm curious if there's a performance gain from splitting the model in two.
The reason I was thinking about splitting the model in two is because I have some fields that will always be completed, while there are other fields that will typically be empty (until the project is closed).
Are there performance gains from putting typically empty fields, such as actual_completion_date and actual_project_costs, into a separate model/table in Django?
Split into Two Models
class Project(models.Model):
project_number = models.SlugField(max_length=5, blank=False,
primary_key=True)
budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
submitted_on = models.DateField(auto_now_add=True)
class ProjectExtendedInformation(models.Model):
project = models.OneToOneField(CapExProject, primary_key=True)
actual_completion_date = models.DateField(blank=True, null=True)
actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
blank=True, null=True)
Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN will be required, which is inherently slower for a database to perform than a simple SELECT query. The fact that the fields are empty is meaningless in terms of performance one way or another.
Depending on the size of the table and the number of columns, it may be faster to only select a subset of fields that you need to interact with, but that's easy enough in Django with the only method:
Project.objects.only('project_number', 'budgeted_costs', 'submitted_on')
Which produces something akin to:
SELECT ('project_number', 'budgeted_costs', 'submitted_on') FROM yourapp_project;
Using separate models (and tables) only makes sense for the purposes of modularization -- such that you subclass Project to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project.
For your case, if there's some info that's available only when it's closed, I'd indeed advise making a separate model.
Joins aren't bad. Especially in your case the join will be faster if you have all rows in one table and much fewer rows in the other one. I've worked with databases a lot, and in most cases it's a pure guess to tell if a join will be better or worse. Even a full table scan is better than using an index in many cases. You need to look at the EXPLAINs, if performance is a concern, and profile the Db work if possible (I know Oracle supports this.) But before performance becomes an issue, I prefer quicker development.
We have a table in Django with 5M rows. And we needed a column that would have been not null only for 1K rows. Just altering the table would have taken half a day. Rebuilding from scratch also takes a few hours. We've chosen to make a separate model.
I've been to a lecture on Domain Driven Design in which the author explained that it is important, especially in development of a new app, to separate models, to not stuff everything in one class.
Let's say you have a CargoAircraft class and PassengerAircraft. It's so tempting to put them in one class and work "seamlessly", isn't it? But interactions with them (scheduling, booking, weight or capacity calculations) are completely different.
So, by putting everything in one class you force yourself to bunch of IF clauses in every method, to extra methods in Manager, to harder debugging, to bigger tables in the DB. Basically you make yourself spend more time developing for the sake of what? For only two things: 1) fewer joins 2) fewer class names.
If you separate the classes, things go much easier:
clean code, no ugly ifs, no .getattr and defaults
easy debugging
more mainainable database
hence, faster development.