Effecient Bulk Update of Model Records in Django - django

I'm building a Django app that will periodically take information from an external source, and use it to update model objects.
What I want to to be able to do is create a QuerySet which has all the objects which might match the final list. Then check which model objects need to be created, updated, and deleted. And then (ideally) perform the update in the fewest number of transactions. And without performing any unnecessary DB operations.
Using create_or_update gets me most of the way to what I want to do.
jobs = get_current_jobs(host, user)
for host, user, name, defaults in jobs:
obj, _ = Job.upate_or_create(host=host, user=user, name=name, defaults=defaults)
The problem with this approach is that it doesn't delete anything that no longer exists.
I could just delete everything up front, or do something dumb like
to_delete = set(Job.objects.filter(host=host, user=user)) - set(current)
(Which is an option) but I feel like there must already be an elegant solution that doesn't require either deleting everything, or reading everything into memory.

You should use Redis for storage and use this python package in your code. For example:
import redis
import requests
pool = redis.StrictRedis('localhost')
time_in_seconds = 3600 # the time period you want to keep your data
response = requests.get("url_to_ext_source")
pool.set("api_response", response.json(), ex=time_in_seconds)

Related

Sequentially execute Multiple R/W queries in same django views function

I have read and write queries in my single Django view function. As in below code:
def multi_query_function(request):
model_data = MyModel.objects.all() #first read command
...(do something)...
new_data = MyModel(
id=1234,
first_property='random value',
second_property='another value'
)
new_data.save() #second write command
return render(request, index.html)
I need these queries in the function to be executed consecutively. For example, if multiple users use this function at the same time, it should execute this function for both users one by one. The 'read' of one user should only be allowed if the previous user has completed both of his/her 'read and write'. The queries of both users should never be intermingled.
Should I use table locking feature of my PostgreSQL DB or is there any other well managed way?
Yep, using your database's locks are a good way to do this.
https://github.com/Xof/django-pglocks looks like a good library to give you a lock context manager.

Multi-DB Transactions

Django Version 1.10.5 with Postgres 9.6.1
For the last year I've been working in a multi-schema default database environment. However things are beginning to grow to the point I've decided to split the single database into 3 databases.
I've got things working with a master/slave router for all 3 databases.
I am not using the 'default' database key. Instead I have 'db1', 'db2', and 'db3'
The part I am confused about is with transactions in this multi-database environment.
In this example it fails as expected. Caused of course by not using #transaction.atomic(using='db1') which is clear to me.
#transaction.atomic()
def edit(self, context):
"""Edit
:param dict context: Context
:return: None
"""
# Check if employee exists
try:
result = Passport.objects.get(pk=self.user.employee_id)
except Passport.DoesNotExist:
return False
result.name = context.get('name')
result.save()
However I have this strange example, simply because I'm trying to understand... I would have expected this to fail but it does not:
#transaction.atomic(using='db1')
def edit(self, context):
"""Edit
:param dict context: Context
:return: None
"""
# Check if employee exists
try:
result = Passport.objects.get(pk=self.user.employee_id)
except Passport.DoesNotExist:
return False
result.name = context.get('name')
with transaction.atomic(using='db2'):
result.save()
The model Passport does not exist in DB2 models at all.
My router is setup so that all writes go to each respected DB.
So what is the purpose of setting the using='db1' in the atomic transaction? I've looked at the source and I see it defaults to default when not "using".
In the above example I even made another transaction inside of the initial transaction but this time using='db2' where the model doesn't even exist. I figured that would have failed, but it didn't and the data was written to the proper database.
I bring this up because there will be situations where I need to interact with all 3 databases and if a single problem occurs when writing to all 3 databases, all 3 need to be rolled back or if on success of everything, then committed of course.
Perhaps someone can help break this down for me so I can understand?
You're interpreting transaction.atomic(using='X') to mean: run the following database commands on X, inside a transaction.
In fact, it just means: open a transaction on database X, and then either commit it or roll it back at the end of the block.
Or, as the documentation puts it:
Under the hood, Django’s transaction management code:
opens a transaction when entering the outermost atomic block;
commits or rolls back the transaction when exiting the outermost block.
The question of which database to use for a given command is determined by your router, not the using clause. So your transaction.atomic(using='db2') block is pointless (it will simply open a transaction on db2 and then close it), but not an error.

FK validation within Django

Good afternoon,
I have my django server running with a REST api on top to serve my mobile devices. Now, at some point, the mobile device will communicate with Django.
Let's say the device is asking Django to add an object in the database, and within that object, I need to set a FK like this:
objectA = ObjectA.objects.create(title=title,
category_id = c_id, order = order, equipment_id = e_id,
info_maintenance = info_m, info_security = info_s,
info_general = info_g, alphabetical_notation = alphabetical_notation,
allow_comments = allow_comments,
added_by_id = user_id,
last_modified_by_id = user_id)
If the e_id and c_id is received from my mobile devices, should I check before calling this creation if they actually still exists in the DB? That is two extra queries... but if they can avoid any problems, I don't mind!
Thanks a lot!
It think that Django creates constraint on Foreign Key by default ( might depend on database though ). This means that if your foreign keys point to something that does not exist, then saving will fail ( resulting in Exception on Python side ).
You can reduce it to a single query (it should be a single query at least, warning I haven't tested the code):
if MyObject.objects.filter(id__in=[e_id, c_id]).distinct().count() == 2:
# create the object
ObjectA.objects.create(...)
else:
# objects corresponding e_id and c_id do not exist, do NOT create ObjectA
You should always validate any information that's coming from a user or that can be altered by a determined user. It wouldn't be difficult for someone to sniff the traffic and start constructing their own REST requests to your server. Always clean and validate external data that's being added to the system.

Bulk create model objects in django

I have a lot of objects to save in database, and so I want to create Model instances with that.
With django, I can create all the models instances, with MyModel(data), and then I want to save them all.
Currently, I have something like that:
for item in items:
object = MyModel(name=item.name)
object.save()
I'm wondering if I can save a list of objects directly, eg:
objects = []
for item in items:
objects.append(MyModel(name=item.name))
objects.save_all()
How to save all the objects in one transaction?
as of the django development, there exists bulk_create as an object manager method which takes as input an array of objects created using the class constructor. check out django docs
Use bulk_create() method. It's standard in Django now.
Example:
Entry.objects.bulk_create([
Entry(headline="Django 1.0 Released"),
Entry(headline="Django 1.1 Announced"),
Entry(headline="Breaking: Django is awesome")
])
worked for me to use manual transaction handling for the loop(postgres 9.1):
from django.db import transaction
with transaction.atomic():
for item in items:
MyModel.objects.create(name=item.name)
in fact it's not the same, as 'native' database bulk insert, but it allows you to avoid/descrease transport/orms operations/sql query analyse costs
name = request.data.get('name')
period = request.data.get('period')
email = request.data.get('email')
prefix = request.data.get('prefix')
bulk_number = int(request.data.get('bulk_number'))
bulk_list = list()
for _ in range(bulk_number):
code = code_prefix + uuid.uuid4().hex.upper()
bulk_list.append(
DjangoModel(name=name, code=code, period=period, user=email))
bulk_msj = DjangoModel.objects.bulk_create(bulk_list)
Here is how to bulk-create entities from column-separated file, leaving aside all unquoting and un-escaping routines:
SomeModel(Model):
#classmethod
def from_file(model, file_obj, headers, delimiter):
model.objects.bulk_create([
model(**dict(zip(headers, line.split(delimiter))))
for line in file_obj],
batch_size=None)
Using create will cause one query per new item. If you want to reduce the number of INSERT queries, you'll need to use something else.
I've had some success using the Bulk Insert snippet, even though the snippet is quite old.
Perhaps there are some changes required to get it working again.
http://djangosnippets.org/snippets/446/
Check out this blog post on the bulkops module.
On my django 1.3 app, I have experienced significant speedup.
bulk_create() method is one of the ways to insert multiple records in the database table. How the bulk_create()
**
Event.objects.bulk_create([
Event(event_name="Event WF -001",event_type = "sensor_value"),
Entry(event_name="Event WT -002", event_type = "geozone"),
Entry(event_name="Event WD -001", event_type = "outage") ])
**
for a single line implementation, you can use a lambda expression in a map
map(lambda x:MyModel.objects.get_or_create(name=x), items)
Here, lambda matches each item in items list to x and create a Database record if necessary.
Lambda Documentation
The easiest way is to use the create Manager method, which creates and saves the object in a single step.
for item in items:
MyModel.objects.create(name=item.name)

How do I deal with this race condition in django?

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.
In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".
# Model:
class MyObj(models.Model):
thing = models.ForeignKey(Thing)
owner = models.ForeignKey(User)
state = models.BooleanField()
class Meta:
unique_together = (('thing', 'owner'),)
# Update or create myobj
#transaction.commit_on_success
def create_or_update_myobj(owner, thing, state)
try:
myobj, created = MyObj.objects.get_or_create(owner=user,thing=thing)
except IntegrityError:
myobj = MyObj.objects.get(owner=user,thing=thing)
# Will sometimes throw "DoesNotExist: MyObj matching query does not exist"
myobj.state = state
myobj.save()
I use an innodb mysql database on ubuntu.
How do I safely deal with this problem?
This could be an off-shoot of the same problem as here:
Why doesn't this loop display an updated object count every five seconds?
Basically get_or_create can fail - if you take a look at its source, there you'll see that it's: get, if-problem: save+some_trickery, if-still-problem: get again, if-still-problem: surrender and raise.
This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:
first thread tries to get it - but it doesn't yet exist,
so, the thread tries to create it, but before the object is created...
...second thread tries to get it - and this obviously fails
now, because of the default AUTOCOMMIT=OFF for MySQLdb database connection, and REPEATABLE READ serializable level, both threads have frozen their views of MyObj table.
subsequently, first thread creates its object and returns it gracefully, but...
...second thread cannot create anything as it would violate unique constraint
what's funny, subsequent get on the second thread doesn't see the object created in the first thread, due to the frozen view of MyObj table
So, if you want to safely get_or_create anything, try something like this:
#transaction.commit_on_success
def my_get_or_create(...):
try:
obj = MyObj.objects.create(...)
except IntegrityError:
transaction.commit()
obj = MyObj.objects.get(...)
return obj
Edited on 27/05/2010
There is also a second solution to the problem - using READ COMMITED isolation level, instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it - but at least it allows tying views to transactions, without committing in the middle.
Edited on 22/01/2012
Here are some good blog posts (not mine) about MySQL and Django, related to this question:
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html
Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.
One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.
Since 2012 in Django we have select_for_update which lock rows until the end of the transaction.
To avoid race conditions in Django + MySQL
under default circumstances:
REPEATABLE_READ in the Mysql
READ_COMMITTED in the Django
you can use this:
with transaction.atomic():
instance = YourModel.objects.select_for_update().get(id=42)
instance.evolve()
instance.save()
The second thread will wait for the first thread (lock), and only if first is done, the second will read data saved by first, so it will work on updated data.
Then together with get_or_create:
def select_for_update_or_create(...):
instance = YourModel.objects.filter(
...
).select_for_update().first()
if order is None:
instnace = YouModel.objects.create(...)
return instance
The function must be inside transaction block, otherwise, you will get from Django:
TransactionManagementError: select_for_update cannot be used outside of a transaction
Also sometimes it's good to use refresh_from_db()
In case like:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something
you'd like to see:
instance = MyModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db() # 3
instance.attr = response.something
and that # 3 will reduce a lot a time window of possible race conditions, thus chance for that.