Django custom creation manager logic for temporal database - django

I am trying to develop a Django application that has built-in logic around temporal states for objects. The desire is to be able to have a singular object representing a resource, while having attributes of that resource be able to change over time. For example, a desired use case is to query the owner of a resource at any given time (last year, yesterday, tomorrow, next year, ...).
Here is what I am working with...
    from django.db import models


    class Resource(models.Model):
        id = models.AutoField(primary_key=True)


    class ResourceState(models.Model):
        id = models.AutoField(primary_key=True)
        # Link the resource this state is applied to
        resource = models.ForeignKey(Resource, related_name='states', on_delete=models.CASCADE)
        # Track when this state is ACTIVE on a resource (None == open-ended)
        start_dt = models.DateTimeField(null=True, blank=True)
        end_dt = models.DateTimeField(null=True, blank=True)
        # Temporal fields, can change between ResourceStates
        owner = models.CharField(max_length=100)
        description = models.TextField(max_length=500)
I feel like I am going to have to create a custom interface to interact with this state. Some example use cases (interface is completely up in the air)...
    # Get all of the states that were ever active on resource 1 (this is already possible)
    Resource.objects.get(id=1).states.all()

    # Get the owner of resource 1 from the state that was active yesterday, this is non-standard behavior
    Resource.objects.get(id=1).states.at(YESTERDAY).owner

    # Create a new state for resource 1, active between tomorrow and infinity (None == infinity)
    # This is obviously non-standard if I want to enforce one-state-per-timepoint
    Resource.objects.get(id=1).states.create(
        start_dt=TOMORROW,
        end_dt=None,
        owner="New Owner",
        description="New Description"
    )
I feel the largest amount of custom logic will be required to do creates. I want to enforce that only one ResourceState can be active on a Resource for any given timepoint. This means that to create some ResourceState objects, I will need to adjust/remove others.
    >>> resource = Resource.objects.get(id=1)
    >>> resource.states.all()
    [ResourceState(start_dt=None, end_dt=None, owner='owner1')]
    >>> resource.states.create(start_dt=YESTERDAY, end_dt=TOMORROW, owner='owner2')
    >>> resource.states.all()
    [
        ResourceState(start_dt=None, end_dt=YESTERDAY, owner='owner1'),
        ResourceState(start_dt=YESTERDAY, end_dt=TOMORROW, owner='owner2'),
        ResourceState(start_dt=TOMORROW, end_dt=None, owner='owner1')
    ]
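The adjustment described above (trim or split whatever the new state overlaps) can be sketched in plain Python, independent of Django. This is only an illustration under assumptions: the State dataclass and insert_state function are hypothetical names, ranges are treated as half-open [start, end), and None stands for an open-ended bound as in the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class State:
    start: Optional[datetime]  # None means "since forever"
    end: Optional[datetime]    # None means "until forever"
    owner: str


def _lo(dt: Optional[datetime]) -> datetime:
    # Comparison helper only: an open start sorts before everything.
    return datetime.min if dt is None else dt


def _hi(dt: Optional[datetime]) -> datetime:
    # Comparison helper only: an open end sorts after everything.
    return datetime.max if dt is None else dt


def insert_state(states: List[State], new: State) -> List[State]:
    """Return a sorted list in which `new` replaces whatever it overlaps."""
    result = []
    for s in states:
        # No overlap: keep the existing state untouched.
        if _hi(s.end) <= _lo(new.start) or _hi(new.end) <= _lo(s.start):
            result.append(s)
            continue
        # Keep the part of `s` that lies before the new state, if any.
        if _lo(s.start) < _lo(new.start):
            result.append(replace(s, end=new.start))
        # Keep the part of `s` that lies after the new state, if any.
        if _hi(new.end) < _hi(s.end):
            result.append(replace(s, start=new.end))
    result.append(new)
    return sorted(result, key=lambda s: _lo(s.start))
```

In a Django project the same steps would presumably run inside a single database transaction, for instance from a custom manager method, so that readers never see a half-adjusted timeline.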
I know I will have to do most of the legwork around defining the logic, but is there any intuitive place where I should put it? Does Django provide an easy place for me to create these methods? If so, where is the best place to apply them? Against the Resource object? Using a custom Manager to deal with interacting with related 'ResourceState' objects?
Re-reading the above, it is a bit confusing, but this isn't a simple topic either! Please let me know if anyone has any ideas for how to do something like the above!
Thanks a ton!

This is too long for a comment and is only some thoughts, not a full answer, but having dealt with many date-effective records in financial systems (not in Django), a few things come to mind.
My gut would be to start by putting it on the save method of the resource model. You are probably right in needing a custom manager as well.
I'd probably also flirt with the idea of an is_current boolean field in the state model, but care would be needed with future date-effective state records. If there is only one active state at a time, I'd also examine the need for an end date. Having both start and end definitely makes raw SQL queries (if ever needed) easier: date() between state.start and state.end gives the current record, and you can sub in any date to get that date's effective record.
Also, give some consideration to the open-ended end date, where you don't know the end date. Your queries will have to handle the nulls properly. You may also need to consider the open-ended start date (say, for a load of historical data where the original start date is unknown). I'd suggest staying away from using some super early date as a fill-in (same for a date far in the future for unknown end dates). If you end up with lots of transactions, your query optimizer may thank you; however, I may be old and this may not matter anymore.
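To make the null handling concrete, here is a small plain-Python sketch of an "effective at" lookup. The State tuple and state_at function are made-up names for illustration, and the ranges are treated as half-open [start, end) rather than the inclusive BETWEEN mentioned above, so that two adjacent states sharing a boundary never both match.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class State(NamedTuple):
    start: Optional[datetime]  # None = unknown / open-ended start
    end: Optional[datetime]    # None = unknown / open-ended end
    owner: str


def state_at(states: Sequence[State], when: datetime) -> Optional[State]:
    """Return the state effective at `when`, or None if there is none."""
    for s in states:
        # A missing bound never excludes a record.
        started = s.start is None or s.start <= when
        not_ended = s.end is None or when < s.end
        if started and not_ended:
            return s
    return None
```

The same predicate written in SQL would need an explicit IS NULL branch on each side, which is the extra care the open-ended dates call for.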
If you like to read about this stuff, I'd recommend a look at 1.8 in https://www.amazon.ca/Art-SQL-Stephane-Faroult/dp/0596008945/ and chapter 6:
"But before settling for one solution, we must acknowledge that
valuation tables come in all shapes and sizes. For instance, those of
telecom companies, which handle tremendous amounts of data, have a
relatively short price list that doesn't change very often. By
contrast, an investment bank stores new prices for all the securities,
derivatives, and any type of financial product it may be dealing with
almost continuously. A good solution in one case will not necessarily
be a good solution in another.
Handling data that both accumulates and changes requires very careful
design and tactics that vary according to the rate of change."

Related

I need help properly implementing a booking scheduler and availability in Django

This question has been asked many times before on StackOverflow and in the Django forums, but none of the answers I've found are appropriate or complete enough for my situation.
First, the brief:
I'm creating a web application for a car rental business. In addition to helping them organize and centralize their fleet, it will also help them collect orders directly from customers. As with most rentals, the logistics of it all can be somewhat confusing.
Someone may place an order for a car today (December 12th) but actually take the car over the Christmas to New Year's period.
A renter may borrow a car for just two days, and then extend the booking at the last minute. When this happens (often very frequently), the business usually has to scramble to find another one for a different customer who was scheduled to get that car the next day.
Adding to that, an individual car can only be rented to one client at a time, so it can't have multiple bookings for the same period.
Most answers advocate for a simple approach that looks like this:
models.py
    class Booking(models.Model):
        car = models.ForeignKey(Car, ...)
        start_date = models.DateField(...)
        end_date = models.DateField(...)
        is_available = models.BooleanField(default=True)
forms.py
    import datetime

    from django import forms
    from django.core.exceptions import ValidationError
    from django.utils.translation import gettext_lazy as _

    from . import models


    class PlaceOrderForm(forms.Form):
        """Initial order form for customers."""

        start_date = forms.DateField(help_text='When do you want the car?')
        end_date = forms.DateField(help_text='When will you return the car?')

        def _clean_date(self, field_name):
            # Shared check, applied to both dates as in the original snippet
            data = self.cleaned_data[field_name]
            today = datetime.date.today()
            # Check that the date is not in the past
            if data < today:
                raise ValidationError(_('Invalid date: Start in past.'))
            # Ensure that the date is not today (to avoid last-minute bookings)
            if data == today:
                raise ValidationError(_('Invalid date: Please reserve your car at least 24 hours in advance.'))
            return data

        # Django calls clean_<fieldname>() for each field during validation
        def clean_start_date(self):
            return self._clean_date('start_date')

        def clean_end_date(self):
            return self._clean_date('end_date')
('_' is the alias for gettext_lazy imported above.)
The booking has a start_date and an end_date. When a current date is within the start_date and end_date, the car is marked as unavailable. If the boolean field is_available (not represented in forms.py above) is set to "False", the car is unavailable completely.
But this is a problem: Going back to the rental model, someone may be booking a car in the future. A car that's unavailable now should still be able to be reserved for a future date.
Adding to that, an individual car can only be rented to one person at a time, so it can't have multiple bookings for the same period. Again, because of the unique nature of car rentals, this may be a problem. Some people book a car for six months, and others book it for two days. If someone wants a long-term rental but there's another short interlude during their expected duration, this validation would prevent them from placing the order completely!
So if a conflict arises, rather than blocking the booking entirely (which, again, would be a bad UX decision), it should notify the business so they can assign another car and plan ahead.
Other clients should not be able to book it for the time in which it is borrowed, but they should be able to book it for other times when it is free.
So if someone places an order now for, let's say the 24-31st of December. Those days should be blocked off. However, another person should be able to book it from today to the 23rd, and from the 31st onwards. And if the person renting it should extend, it should notify the rental business so they can assign another car to the user well in advance.
Possible idea to move forward
The core assumption in all those answers is that booking unavailability has to be handled in Django itself, in the backend. However, I'm building this project with Django REST framework and will use a JavaScript-based front end (I'm currently learning JavaScript for this purpose).
I think that this would be better handled in a more holistic way with the in-built form validation and save functions.
The workflow would go something like this:
The User selects a car and selects the start and end dates from a drop-down calendar on the website.
The form will then check to see if the absolute basic checks (can't book a car in the past) are fine. If those work, then the order is placed and saved in the database.
If there's a scheduling conflict, the order is not blocked, but passed to the business, which can assign them a different car for the period. (Generally, people don't care much about receiving a particular car; they care mostly about the price, the space, and the fuel economy. Everything else is interchangeable.)
Once that happens, the deposit can be collected, and the order can be set in the system.
Anyhow, that's my preliminary idea that would bring together the best of all worlds
and create a great experience for both the business and customer.
So my question is: How could this actually be set up? What would need to be on the front-end and what would go in the back-end? I'm learning programming as I go, so this may be simple, but I've been struggling with this for a week, I would appreciate any help on this!
Thanks!
Sounds like you have two processes - the customer order and the car assignment. You need to plan out your data structure and then your process flow. This will help you get things straight before you start.
Models
Using (brackets) for foreign keys:
A Customer_order model collects things like:
    customer (User)
    desired_start_date
    desired_end_date
    car_type - this could be many fields
A Car model:
    car_type
    rentals (many to many with Customer_order, through table Rental)
A Rental model:
    car (Car)
    customer_order (Customer_order)
    start_date
    end_date
We have kept the rental model with its own start and end dates as the user may change their desired period but it shouldn't change the rental dates without checking if others exist in that time period for that car.
So the flow should go:
user passes js validation and form submitted to backend
backend checks for car availability based on type and dates
if available, creates rental
if not available, alert user and passes to staff
if a user changes a rental period (via an edit screen for existing customer_orders)
backend checks for same car availability based on existing rentals
if not available, alert user and pass to staff
You'd also create a staff only view, that lists customer_orders that can't be matched (without rental models) along with cars of the requested type that don't have rentals for that period.
Seeing as you have that view, it strikes me your backend process could use something similar to also look for and assign a different car of the same type automatically if you wanted to extend the availability check process, notifying staff it has occurred, while only referring back to staff if that type of car is unavailable.
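As a starting point for that exercise, the availability check and the automatic fallback to another car of the same type might look roughly like this in plain Python. Car, Rental, overlaps and find_available_car are hypothetical names, not Django models, and the date ranges are assumed to be half-open, so a car returned on the 31st can go out again on the 31st, as in the question's example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Car:
    car_id: int
    car_type: str


@dataclass(frozen=True)
class Rental:
    car_id: int
    start_date: date
    end_date: date  # return day; the car is free again from this day


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """Half-open ranges overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def find_available_car(
    cars: Iterable[Car],
    rentals: Iterable[Rental],
    car_type: str,
    start: date,
    end: date,
) -> Optional[Car]:
    """Return the first car of `car_type` with no clashing rental, else None."""
    rentals = list(rentals)
    for car in cars:
        if car.car_type != car_type:
            continue
        busy = any(
            r.car_id == car.car_id
            and overlaps(r.start_date, r.end_date, start, end)
            for r in rentals
        )
        if not busy:
            return car
    # Nothing of this type is free: this is the case to hand over to staff.
    return None
```

A None result corresponds to the "alert user and pass to staff" branch of the flow; any other result is a car that a rental can be created for.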
Actually programming all this is left as an exercise for the reader.

Should I use select_for_update in my django application with postgresql database?

I am currently building a booking tool (django and postgresql) where people book spaces in a location. The location has limited capacity and therefore might run out of space while a user is trying to book a space. I imagine that this could happen when another user books a place slightly before the current user is booking, e.g. if only 1 space is left, the other user might book too and the database cannot handle it.
So my question is would you strongly advise using select_for_update or is there something else that might help me overcome this issue?
Yes, that's a correct use of select_for_update. You would be locking a specific location row (apply a filter before calling select_for_update). That means that two different locations can be booked concurrently, but if two bookings for the same location happen at exactly the same moment, they would be processed one after the other.
This creates a critical section, and you can be sure that it won't overlap with a critical section of another request. Within the critical section, you will have to validate that the selected time slot is free; without that validation, select_for_update would have no effect.
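The lock-then-validate pattern can be illustrated without a database. The sketch below is only an in-process analogy, not select_for_update itself: a threading.Lock per location stands in for the row lock, and Location and book are made-up names.

```python
import threading


class Location:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.booked = 0
        # Stands in for the row lock that the database would hold.
        self.lock = threading.Lock()


def book(location: Location) -> bool:
    """Try to take one space; return False if the location is full."""
    with location.lock:  # critical section starts here
        # The validation must happen while the lock is held,
        # otherwise two callers could both see the last free space.
        if location.booked >= location.capacity:
            return False
        location.booked += 1
        return True
```

Each location has its own lock, so bookings for different locations do not wait on each other, which mirrors locking only the filtered row.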
I could imagine another approach based on unique constraints, it's not universal but might be easier to implement. Let's imagine that you are booking a resource for a specific day. You could have a "unique together" combination for the resource_id and date. A subsequent save would raise an IntegrityError and you could catch it and inform the user that the resource was just booked for the selected date.
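Here is a minimal sketch of the unique-constraint idea, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The booking table, its columns and try_book are assumptions made for the example; in Django the constraint would be declared on the model instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    " resource_id INTEGER NOT NULL,"
    " day TEXT NOT NULL,"
    " customer TEXT NOT NULL,"
    " UNIQUE (resource_id, day))"  # at most one booking per resource per day
)


def try_book(conn: sqlite3.Connection, resource_id: int, day: str, customer: str) -> bool:
    """Insert a booking; return False if that resource/day is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking (resource_id, day, customer) VALUES (?, ?, ?)",
                (resource_id, day, customer),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate; tell the user it was just booked.
        return False
```

The database enforces the rule itself, so no explicit lock is needed, but the approach only fits cases where a booking maps onto a fixed slot such as one day.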

Google App Engine (NDB): One-To-One and One-To-Many relationship

I am building a web application (Django and google's NDB) and have to structure my models now. I have read many examples about how to implement One-To-Many but - to be honest - I'm even more confused right now after having read these.
Let me describe my problem with a similar example:
I have users and each user can read multiple books. A user can stop reading at any time and the progress is saved. Even if a book has been finished, the progress will be saved and never deleted.
I need to check the progress of all books a user has started to read all the time, so this has to be efficient and should require as few db reads as possible. The amount of books is not too much (< 1000) and also the books are thin (say, only one chapter, title, author that's it). It's the mass of progress and the permanent lookup of the progress that I'm fearing since every user has his own progress to probably every book.
How can I structure my models to best meet these requirements? If I use the StructuredProperty in my User model, will the size of the books that are referred to in Progress count towards the limit of X MB (hope not)? If not, I guess something like this is the best way to go (I can read progress fast, without an additional lookup, and if necessary get the book from the db).
    class Book(ndb.Model):
        name = ndb.StringProperty(required=True)
        ...

    class Progress(ndb.Model):
        book = ndb.KeyProperty(kind="Book", required=True)
        last_page_read = ndb.IntegerProperty(required=True, default=0)
        ...

    class User(ndb.Model):
        name = ndb.StringProperty(required=True)
        books_and_progress = ndb.StructuredProperty(Progress, repeated=True)
        ...
Your approach is correct. As you're using structured properties, Progress instances are not separate datastore entities, they're stored inside the User entity, so no additional lookup is necessary to get progress information for a given user. Once you have the user you also have all the information about which books he's reading and in which page he left. To put it another way, your User entities will contain the user's name and a list of "book key, last_page_read" pairs.
"will the size of the books that are referred to in Progress count towards the limit of X MB (hope not)?"
I don't know which limit you are referring to, but keep in mind that what you actually have in the User entity is the key for the Book model, not the actual data for the book. So the size of your Book instances has no effect when you're retrieving User instances from the datastore.
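To picture the shape of the stored data, here is a plain-Python sketch using dataclasses rather than NDB. It is only an analogy: book_key is a string standing in for the Book key, and progress_for is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Progress:
    book_key: str            # stands in for the key of a Book, not the book itself
    last_page_read: int = 0


@dataclass
class User:
    name: str
    # Embedded in the user record, like a repeated structured property.
    books_and_progress: List[Progress] = field(default_factory=list)

    def progress_for(self, book_key: str) -> Optional[Progress]:
        """Look up progress for one book without touching any Book data."""
        for p in self.books_and_progress:
            if p.book_key == book_key:
                return p
        return None
```

Everything needed to answer "where did this user stop in this book" is already inside the user object; the book itself only has to be fetched when its title or author is required.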

Concurrency in django admin change view

My model:
    class Order(models.Model):
        property_a = models.CharField()
        property_b = models.CharField()
        property_c = models.CharField()
Many users will access a given record in a short time frame via admin change page, so I am having concurrency issues:
User 1 and user 2 open the change page at the same time. Assume all values are blank when they load the page. User 1 sets property_a to "a" and property_b to "b", then saves. A second later, user 2 changes property_b and property_c and saves, which quietly overwrites all the values from user 1. In this case, property_a goes back to being blank, and property_b and property_c are whatever user 2 put in.
I need recommendations on how to handle this. If I have to have a version field in the model, how do I pass it to the admin, and where do I do the check so I can elegantly notify the user that their changes can't be saved because another user has modified the record? Is there a more seamless way than just returning an error to the user?
The standard solution is to prevent your users from sharing a single record. It's not at all clear why so many users are messing with the exact same Order instance.
Consider that Order is probably a composite object and you've put too much into a single model. That's the first -- and best -- solution.
If (for inexplicable reasons) you won't decompose this, then you have to create a two-part update transaction.
Requery the data. Compare with the original query as done for this user's session.
If the data doesn't match the original query, then someone else changed it. The user's changes are invalidated, rolled back, wiped out, and the user sees a new query.
If the data does match, you can try to commit the change.
The above algorithm has a race condition, which is usually resolved via low-level SQL. Note that it invalidates a user's work, making it maximally irritating.
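One common way to close that race at the SQL level is to fold the comparison into the UPDATE itself, using a version counter like the one the question mentions. The sketch below uses sqlite3 so it is self-contained; the orders table, its version column and save_if_unchanged are assumptions made for illustration, not part of the admin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " property_a TEXT,"
    " property_b TEXT,"
    " version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO orders (id, property_a, property_b) VALUES (1, '', '')")
conn.commit()


def save_if_unchanged(conn, order_id, seen_version, property_a, property_b):
    """Apply the edit only if the row still has the version the user loaded."""
    with conn:
        cur = conn.execute(
            "UPDATE orders SET property_a = ?, property_b = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (property_a, property_b, order_id, seen_version),
        )
    # Zero rows updated means someone else saved first.
    return cur.rowcount == 1
```

Because the check and the write are a single statement, there is no window between "requery" and "commit" for another save to slip into. The losing user still has to be shown the fresh data, so the irritation noted above remains.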
That's why your first choice is to decompose your models to eliminate the concurrency.
my model has a miscellaneous notes field
This is a bad design. (a) Concurrency is ruined by collisions on this field. (b) There's no log or history of comments.
Item (b) means that a badly-behaved user can maliciously corrupt this data. If you keep notes and comments as a log, you can -- in principle -- limit users to changing only their own comments.
[In most databases with "miscellaneous notes" the field has become a costly, hard-to-maintain liability full of important but impossible-to-parse data. Miscellaneous notes is where users invent their own processes outside the application software. ]
"miscellaneous notes" must be treated like a log, with an unlimited number of notes -- date-stamped -- identified by user -- appended to the Order.
If you simply partition the design to put notes in a separate table, you solve your concurrency issues.

Designing a database for a user/points system? (in Django)

First of all, sorry if this isn't an appropriate question for StackOverflow. I've tried to make it as generalisable as possible.
I want to create a database (MySQL, site running Django) that has users, who can be allocated a certain number of points for various types of action - it's a collaborative game. My requirements are to obtain:
the number of points a user has
the user's ranking compared to all other users
and the overall leaderboard (i.e. all users ranked in order of points)
This is what I have so far, in my Django models.py file:
    from django.db import models


    class SiteUser(models.Model):
        name = models.CharField(max_length=250)
        email = models.EmailField(max_length=250)
        date_added = models.DateTimeField(auto_now_add=True)

        def points_total(self):
            # Sum the points of every action this user has been credited with
            points_added = PointsAdded.objects.filter(user=self)
            points_total = 0
            for point in points_added:
                points_total += point.points
            return points_total


    class PointsAdded(models.Model):
        user = models.ForeignKey('SiteUser', on_delete=models.CASCADE)
        action = models.ForeignKey('Action', on_delete=models.CASCADE)
        date_added = models.DateTimeField(auto_now_add=True)

        @property
        def points(self):
            # The point value comes from the related Action row
            return self.action.points


    class Action(models.Model):
        points = models.IntegerField()
        action = models.CharField(max_length=36)
However it's rapidly becoming clear to me that it's actually quite complex (in Django query terms at least) to figure out the user's ranking and return the leaderboard of users. At least, I'm finding it tough. Is there a more elegant way to do something like this?
This question seems to suggest that I shouldn't even have a separate points table - what do people think? It feels more robust to have separate tables, but I don't have much experience of database design.
This is old, but I'm not sure exactly why you have two separate tables (PointsAdded and Action). It's late, so maybe my mind isn't ticking, but it seems like you just separated one table into two for some reason. It doesn't seem like you get any benefit out of it. It's not like there's a one-to-many relationship in it, right?
So first of all, I would combine those two tables. Secondly, you are probably better off storing points_total as a value in your site_user table. This is what I think Demitry is trying to allude to, but didn't say explicitly. This way, instead of doing a whole additional query (pulling everything a user has done in their history on the site is expensive) plus a loop (going through it is even more expensive), you can just pull it as one field. It's denormalizing the data for a greater good.
Just be sure to update the value every time you add something that has points. You can use Django's post_save signal to do that.
It's a bit more difficult to have points saved in the same table, but it's totally worth it. You can do very simple ordering/filtering operations if you have computed points total on user model. And you can count totals only when something changes (not every time you want to show them). Just put some validation logic into post_save signals and make sure to cover this logic with tests and you're good.
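To show why a stored total makes ranking simple, here is a plain-Python sketch. Scoreboard and its methods are made-up names; award plays the role that a post_save handler would play in Django, and the tie-breaking rule is an arbitrary choice for the example.

```python
from collections import defaultdict
from typing import List, Tuple


class Scoreboard:
    def __init__(self) -> None:
        # Denormalized running total per user, updated at write time.
        self.totals = defaultdict(int)

    def award(self, user: str, points: int) -> None:
        """Record new points by bumping the stored total."""
        self.totals[user] += points

    def leaderboard(self) -> List[Tuple[str, int]]:
        """All users, highest total first; ties ordered by name."""
        return sorted(self.totals.items(), key=lambda kv: (-kv[1], kv[0]))

    def rank(self, user: str) -> int:
        """1-based rank: one more than the number of users with more points."""
        mine = self.totals.get(user, 0)
        return 1 + sum(1 for total in self.totals.values() if total > mine)
```

With the total stored on the user row, the leaderboard becomes a plain ordering on one column and a rank becomes a count of users with a higher total, with no need to replay each user's history.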
p.s. denormalization on wiki.