I have a simple Task model:
class Task(models.Model):
    name = models.CharField(max_length=255)
    order = models.IntegerField(db_index=True)
And a simple task_create view:
from django.db.models import F
from django.http import HttpResponse

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    Task.objects.filter(order__gte=order).update(order=F('order') + 1)
    new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
The view shifts every existing task that comes after the newly created one by +1, then creates the new task.
There are lots of users of this method, and I suppose the ordering will go wrong one day, because the update and the create really should be performed together.
So I just want to be sure: will the following be enough to avoid any data corruption?
from django.db import transaction
from django.db.models import F

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
1) Probably something more should be done in the task creation line, like a select_for_update before the filter on existing Task objects?
2) Does it matter where return HttpResponse() is located, inside the transaction block or outside?
Thanks!
1) Probably something more should be done in the task creation line, like a select_for_update before the filter on existing Task objects?
No - what you have currently looks fine and should work the way you want it to. The UPDATE statement itself takes row locks, and transaction.atomic() guarantees that the update and the create either both commit or both roll back.
2) Does it matter where return HttpResponse() is located, inside the transaction block or outside?
Yes, it does matter. You need to return a response to the client regardless of whether your transaction was successful or not - so it definitely needs to be outside of the transaction block. If you did it inside the transaction, the client would get a 500 Server Error if the transaction failed.
However, if the transaction fails, then you will not have a new task ID and cannot return it in your response. So you probably need to return different responses depending on whether the transaction succeeds, e.g.:
from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(
            order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
except IntegrityError:
    # Transaction failed - return a response notifying the client
    return HttpResponse('Failed to create task, please try again!')

# If it succeeded, then return a normal response
return HttpResponse(new_task.id)
You could also change your model so you don't need to update so many other rows when inserting a new one.
For example, you could try something resembling a doubly-linked list.
(I used long, explicit names for fields and variables here.)
# models.py
class Task(models.Model):
    name = models.CharField(max_length=255)
    task_before_this_one = models.ForeignKey(
        'self',  # a ForeignKey to the model itself must use 'self'
        null=True,
        blank=True,
        on_delete=models.SET_NULL,  # required on newer Django versions
        related_name='task_before_this_one_set')
    task_after_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        related_name='tasks_after_this_one_set')
Your task at the top of the queue would be the one that has the field task_before_this_one set to null. So to get the first task of the queue:
# these will throw exceptions if there are many instances
first_task = Task.objects.get(task_before_this_one=None)
last_task = Task.objects.get(task_after_this_one=None)
When inserting a new instance, you just need to know after which task it should be placed (or, alternatively, before which task). This code should do that:
from django.shortcuts import get_object_or_404

def task_create(request):
    new_task = Task.objects.create(
        name=request.POST.get('name'))
    task_before = get_object_or_404(
        Task,
        pk=request.POST.get('task_before_the_new_one'))
    task_after = task_before.task_after_this_one
    # modify the 2 other tasks
    task_before.task_after_this_one = new_task
    task_before.save()
    if task_after is not None:
        # 'task_after' will be None if 'task_before' is the last one in the queue
        task_after.task_before_this_one = new_task
        task_after.save()
    # update the newly created task
    new_task.task_before_this_one = task_before
    new_task.task_after_this_one = task_after  # this could be None
    new_task.save()
    return HttpResponse(new_task.pk)
This method only updates 2 other rows when inserting a new row. You might still want to wrap the whole method in a transaction if there is really high concurrency in your app, but this transaction will only lock up to 3 rows, not all the others as well.
This approach might be of use to you if you have a very long list of tasks.
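For symmetry: deleting a task from the middle of the queue is just relinking its two neighbours before removing the row. A minimal sketch (the task_delete view and its task_pk parameter are names I made up), assuming transaction, get_object_or_404 and HttpResponse are imported as in the snippets above:

def task_delete(request, task_pk):
    with transaction.atomic():
        task = get_object_or_404(Task, pk=task_pk)
        task_before = task.task_before_this_one
        task_after = task.task_after_this_one
        # relink the neighbours around the task being removed
        if task_before is not None:
            task_before.task_after_this_one = task_after
            task_before.save()
        if task_after is not None:
            task_after.task_before_this_one = task_before
            task_after.save()
        task.delete()
    return HttpResponse(task_pk)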
EDIT: how to get an ordered list of tasks
This cannot be done at the database level in a single query (as far as I know), but you could try this function:
def get_ordered_task_list():
    # get the first task
    aux_task = Task.objects.get(task_before_this_one=None)
    task_list = []
    while aux_task is not None:
        task_list.append(aux_task)
        aux_task = aux_task.task_after_this_one
    return task_list
As long as you only have a few hundred tasks, this operation should not take enough time to impact the response time. But you will have to try that out for yourself, in your environment, with your database and your hardware.
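Note that each iteration of the loop above follows a ForeignKey, so it costs one query per task. If that ever becomes a problem, a single query plus an in-memory walk does the same job (a sketch against the model above):

def get_ordered_task_list():
    # One query: map the pk of each task's predecessor to the task itself.
    tasks_by_predecessor = {
        task.task_before_this_one_id: task
        for task in Task.objects.all()
    }
    # Walk the chain, starting from the task with no predecessor.
    task_list = []
    current = tasks_by_predecessor.get(None)
    while current is not None:
        task_list.append(current)
        current = tasks_by_predecessor.get(current.pk)
    return task_list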
Related
models:
class CouponUsage(models.Model):
    coupon = models.ForeignKey('Coupon', on_delete=models.CASCADE, related_name="usage")
    date = models.DateTimeField(auto_now_add=True)

class Coupon(models.Model):
    name = models.CharField(max_length=255)
    capacity = models.IntegerField()

    @property
    def remaining(self):
        usage = self.usage.all().count()
        return self.capacity - usage
views:
def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    if coupon.remaining > 0:
        # do something
I don't know how to handle concurrency issues in the code above. I believe one possible bug is that while the if clause in the view is executing, another CouponUsage object can be created.
How do I go about handling that? How do I prevent CouponUsage objects from being created while the if clause in the view is executing?
One way of doing this would be to rely on the database integrity checks and transactions. Assuming your capacity must always be in the range [0, +infinity) you could change your Coupon model to use a PositiveIntegerField instead of an IntegerField:
class Coupon(models.Model):
    name = models.CharField(max_length=255)
    capacity = models.PositiveIntegerField()
Then you need to update your Coupon capacity every time a CouponUsage is created. You can override the save() method to reflect this change:
from django.db import models, transaction

class CouponUsage(models.Model):
    coupon = models.ForeignKey('Coupon', on_delete=models.CASCADE, related_name="usage")
    date = models.DateTimeField(auto_now_add=True)

    @transaction.atomic
    def save(self, *args, **kwargs):
        if not self.pk:  # This is an insert; you may want to raise an error otherwise
            # The magic is here: the decrement is executed at the database
            # level, so there is no problem with stale in-memory values
            self.coupon.capacity = models.F('capacity') - 1
            self.coupon.save()
        super().save(*args, **kwargs)
Now whenever a CouponUsage is created, you update the capacity of the associated Coupon instance. The key here is that instead of reading the value from the database into Python memory, updating it and then saving, which could lead to inconsistent results, the update to capacity is made at the database level using an F expression. This guarantees that no two transactions use the same value.
Also notice that by using a PositiveIntegerField instead of an IntegerField, the database will guarantee that capacity cannot fall below 0. Therefore, if you now try to create a CouponUsage instance such that the Coupon capacity would become negative, an exception is raised, preventing the creation of that CouponUsage.
You now need to take advantage of this in your code by doing something like the following:
from django.db import IntegrityError

def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    try:
        usage = CouponUsage.objects.create(coupon=coupon)
        # Do whatever you want here; you already 'consumed' a coupon
    except IntegrityError:  # Check for the specific exception
        # Sorry, no capacity left
        pass
If in the event of getting the coupon you need to do things that may fail, and in such a case you need to 'revert' the usage, you can enclose your whole use_coupon function inside a transaction.
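That could look something like the following sketch, where do_something_with is a hypothetical placeholder for your post-consumption work; if anything inside the block raises, the whole transaction, including the CouponUsage row and the capacity decrement, is rolled back:

from django.db import IntegrityError, transaction

def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    try:
        with transaction.atomic():
            usage = CouponUsage.objects.create(coupon=coupon)
            # hypothetical; any exception here 'returns' the coupon
            do_something_with(usage)
    except IntegrityError:
        pass  # no capacity left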
I have this model:
class Task(models.Model):
    class Meta:
        unique_together = ("campaign_id", "task_start", "task_end", "task_day")

    campaign_id = models.ForeignKey(Campaign, on_delete=models.DO_NOTHING)
    playlist_id = models.ForeignKey(PlayList, on_delete=models.DO_NOTHING)
    task_id = models.AutoField(primary_key=True, auto_created=True)
    task_start = models.TimeField()
    task_end = models.TimeField()
    task_day = models.TextField()
I need to write a validation test that checks if a newly created task time range overlaps with an existing one in the database.
For example:
A task with ID 1 already has a starting time at 5:00PM and ends at 5:15PM on a Saturday. A new task cannot be created between the first task's start and end time. Where should I write this test, and what is the most efficient way to do this? I also use Django REST Framework serializers.
When you receive the form data from the user, you can:
1) Check that the fields are consistent: user task_start < user task_end, and warn the user if not.
2) Query (SELECT) the database to retrieve all existing tasks which intersect the user's time range: order the records by task_start (ORDER BY) and select only the records which match your criterion, i.e. task_start <= user task_start <= task_end, or task_start <= user task_end <= task_end. Warn the user if at least one record is found.
3) If everything is OK: construct a Task instance, store it in the database, and return success.
Implementation details:
task_start and task_end could be indexed in your database to improve selection time.
I saw that you also have a task_day field (which is a TEXT). You should really consider using UTC DATETIME fields instead of TEXT, because you need to compare dates AND times (not only times): consider a task which starts at 23:30 and finishes at 00:45 the day after…
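With DATETIME fields, the selection criterion also collapses to a single condition, since two ranges overlap exactly when each one starts before the other ends; this also catches the case where the new task completely contains an existing one, which the two criteria above miss. A sketch (new_start and new_end are the values from the user; the names are mine):

overlapping = Task.objects.filter(
    task_start__lt=new_end,  # existing task starts before the new one ends
    task_end__gt=new_start,  # and ends after the new one starts
)
if overlapping.exists():
    # at least one overlapping task: warn the user
    pass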
This is how I solved it. It's far from optimal, but I'm limited to Python 2.7 and Django 1.11, and I'm also a beginner.
def validate(self, data):
    errors = {}
    task_start = data.get('task_start')
    task_end = data.get('task_end')
    time_filter = (Q(task_start__range=[task_start, task_end]) |
                   Q(task_end__range=[task_start, task_end]))
    filter_check = Task.objects.filter(time_filter).exists()
    if task_start > task_end:
        errors['error'] = u'End time cannot be earlier than start time!'
        raise serializers.ValidationError(errors)
    elif filter_check:
        errors['errors'] = u'Overlapping tasks'
        raise serializers.ValidationError(errors)
    return data
I'm working with the Gerrit REST API and have been tasked with regularly parsing all commits ever made and putting them into an AWS database for analytics. To get to the commits, I first need to get a list of all projects, then in those projects get a list of all changes and the revisions for each change, and only then can I fetch individual commits.
To solve this problem I created a hierarchy of classes where essentially a Project contains one to many Changes, each of which contains one to many Commits. Each Commit has a write_data method so I can easily write them out to a flat file to upload to S3. I'm looking for strategies to parse through this most efficiently and create a "cascade" effect, where I kick off an individual thread for each project, which in turn populates each change, which in turn populates a list of commits, and each thread then returns a list containing the commit objects for easy writing. Here are the three classes:
class Project(object):
    def __init__(self, name):
        self.name = name
        self.changes = []

    def add_changes(self, changes_list):
        for change in changes_list:
            self.changes.append(Change(
                project=self.name,
                change_id=change['change_id'],
                _id=change['id'],
                revision_list=change['revisions']
            ))

    def return_results(self, ger_obj, start=0):
        while True:
            endpoint = (r'/changes/?q=project:{project}&o=ALL_REVISIONS&'
                        r'S={num}'.format(
                            project=self.name,
                            num=start
                        ))
            print 'Endpoint: {}'.format(endpoint)
            try:
                changes = ger_obj.get(endpoint)
                self.add_changes(changes_list=changes)
            except HTTPError:
                break
            start += 500
            try:
                if not changes[-1].get('_more_changes'):
                    break
            except IndexError:
                break


class Change(object):
    def __init__(self, project, change_id, _id, revision_list):
        self.project = project
        self.change_id = change_id
        # logging takes %-style placeholders, not print-style commas
        logging.info('Change id is %s', self.change_id)
        self.id = _id
        self.commits = [Commit(rid, change_id)
                        for rid in revision_list.keys()]

    def _return_commits(self):
        return self.commits


class Commit(object):
    def __init__(self, rev_id, change_id):
        self.rev_id = rev_id
        self.change_id = change_id

    def return_results(self, ger_obj):
        endpoint = (r'/changes/{c_id}/revisions/{r_id}/commit'.format(
            c_id=self.change_id,
            r_id=self.rev_id
        ))
        logging.info('Endpoint: %s', endpoint)
        self.data = ger_obj.get(endpoint)

    def write_data(self, writer):
        # encode/replace return new strings; the result must be assigned back
        self.data['message'] = (self.data['message']
                                .encode('string_escape')
                                .replace('|', '[pipe]'))
        writer.writerow(self.data)
I was thinking that perhaps the best time to aggregate all the commits is to have each Change object return its commits after populating them; then I can just keep appending them as threads finish. I was also toying with the idea of not storing the Changes in each Project at all, and instead returning a generator object from the add_changes method. What would be the most efficient and scalable design pattern?
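One way to sketch the "thread per project, aggregate the returned commit lists" idea is multiprocessing.pool.ThreadPool, which is in the Python 2.7 standard library (fetch_project_commits and fetch_all_commits are hypothetical helpers built on the classes above, and this assumes ger_obj is safe to share between threads):

from functools import partial
from multiprocessing.pool import ThreadPool

def fetch_project_commits(ger_obj, project):
    # Populate the project's changes, then flatten out its commits.
    project.return_results(ger_obj)
    commits = []
    for change in project.changes:
        for commit in change.commits:
            commit.return_results(ger_obj)
            commits.append(commit)
    return commits

def fetch_all_commits(projects, ger_obj):
    pool = ThreadPool(processes=8)  # tune to what the API tolerates
    try:
        # One task per project; map blocks until every list is back.
        per_project = pool.map(partial(fetch_project_commits, ger_obj),
                               projects)
    finally:
        pool.close()
        pool.join()
    # Flatten the per-project lists into one list of Commit objects.
    return [commit for commits in per_project for commit in commits]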
I have a model named Entry that has the following fields:
from django.contrib.auth.models import User

class Entry(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField()
    creator = models.ForeignKey(User)
    canceled = models.BooleanField(default=False)
When I create a new entry, I don't want it to be created if the creator already has an event between the same start and end dates. So my idea was, when the user posts data from the creation form:
if request.method == 'POST':
    entryform = EntryAddForm(request.POST)
    if entryform.is_valid():
        entry = entryform.save(commit=False)
        entry.creator = request.user
        # check if an entry exists between start and end
        if Entry.objects.get(creator=entry.creator, start__gte=entry.start, end__lte=entry.end, canceled=False):
            # Add to message framework that an entry already exists there
            return redirect_to('create_form_page')
        else:
            # go on with creating the entry
I was thinking that maybe a unique field and properly checking DB integrity would be better, but it is the canceled field that's troubling me in choosing the unique fields. Do you think there will be something wrong with my method? I mean, does it make sure that no entry will be set between the start and end dates for a user if he has already saved one? Do you think this code should go in pre-save instead? The DB will start empty, so after entering one entry everything should take its course (assuming that... not really sure...).
You need to use Q for complex queries.
from django.db.models import Q

_exists = Entry.objects.filter(
    Q(start__gte=entry.start, start__lte=entry.end) |
    Q(end__gte=entry.start, end__lte=entry.end) |
    Q(start__lte=entry.start, end__gte=entry.end)
)
if _exists:
    "There is an existing event"
else:
    "You can create the event"
Since I did not test this, I used Q objects wherever I thought they would be necessary.
Using this query, you will not need any uniqueness check.
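For what it's worth, the three clauses reduce to a single pair, because two intervals overlap exactly when each one starts before the other ends. A sketch (add creator=entry.creator and canceled=False if the rule is per user, as in the question):

_exists = Entry.objects.filter(
    start__lte=entry.end,  # existing entry starts before the new one ends
    end__gte=entry.start,  # and ends after the new one starts
).exists()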
I have a table in my postgresql db holding a state of an hour record. For each month, project and user I need exactly one state.
I'm using the get_or_create method to either create a "state" or to retrieve it if it already exists.
HourRecordState.objects.get_or_create(user=request.user, project=project, month=month, year=year, defaults={'state': 0, 'modified_by': request.user})
After running this for about two years without problems, I stumbled over the issue that I have one HourRecordState twice in my database. Now each time the get_or_create method is called, it throws the following error:
MultipleObjectsReturned: get() returned more than one HourRecordState -- it returned 2
I'm wondering how it could happen that I have two identical records in my DB. Interestingly, they were created at the same time (to the second; I haven't checked the milliseconds).
I checked my code, and in the whole project there is only one get_or_create call that creates this object, and no other create calls in the code.
Would be great to get a hint.
Update:
The objects have been created at almost the same time:
First object: 2011-10-04 11:04:35.491114+02
Second object: 2011-10-04 11:04:35.540002+02
And the code:
try:
project_id_param = int(project_id_param)
project = get_object_or_404(Project.objects, pk=project_id_param)
#check activity status of project
try:
is_active_param = project.projectclassification.is_active
except:
is_active_param = 0
if is_active_param == True:
is_active_param = 1
else:
is_active_param = 0
#show the jqgrid table and the hour record state form
sub_show_hr_flag = True
if project is not None:
hour_record_state, created = HourRecordState.objects.get_or_create(user=request.user, project=project, month=month, year=year, defaults={'state': 0, 'modified_by': request.user})
state = hour_record_state.state
manage_hour_record_state_form = ManageHourRecordsStateForm(instance=hour_record_state)
if not project_id_param is False:
work_place_query= ProjectWorkPlace.objects.filter(project=project_id_param, is_active=True)
else:
work_place_query = ProjectWorkPlace.objects.none()
work_place_dropdown = JQGridDropdownSerializer().get_dropdown_value_list_workplace(work_place_query)
except Exception, e:
project_id_param = False
project = None
request.user.message_set.create(message='Chosen project could not be found.')
return HttpResponseRedirect(reverse('my_projects'))
Well, this is not an exact answer to your question, but I think you should change your database schema and switch to using UNIQUE constraints, which help you maintain data integrity because uniqueness is enforced at the database level.
If you state that for every month, user and project you need exactly one state, your model should look something like this (using the unique_together constraint):
class HourRecordState(models.Model):
    user = models.ForeignKey(User)
    project = models.ForeignKey(Project)
    month = models.IntegerField()
    year = models.IntegerField()
    # other fields...

    class Meta:
        unique_together = (("user", "project", "month", "year"),)
Because get_or_create is implemented as a get followed by a create, multiple processes can, under certain conditions, race and create the same object twice. With unique_together in place, the database will instead raise an exception if that attempt is made.
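With the constraint in place, the usual pattern (a sketch, not tested against your code) is to catch the IntegrityError and fall back to a plain get, since the row is guaranteed to exist by then:

from django.db import IntegrityError

try:
    hour_record_state, created = HourRecordState.objects.get_or_create(
        user=request.user, project=project, month=month, year=year,
        defaults={'state': 0, 'modified_by': request.user})
except IntegrityError:
    # Another process won the race between our get and create;
    # the row exists now, so just fetch it.
    hour_record_state = HourRecordState.objects.get(
        user=request.user, project=project, month=month, year=year)
    created = False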