Django-ORM: Check whether multiple items are in DB while minimizing calls - django

Assume that from an external API call, we get the following response:
resp = ['123', '67283', '99829', '786232']
These are external_id fields for our objects, defined in our Article model. Some of them may already exist in the database, while others don't.
Before returning a response, we need to check whether each external_id corresponds to a record in our database, and if not, we need to create it and fetch additional info from another, third, source.
What is the most efficient way to do this? Right now I can't think of something better than:
for external_id in resp:
    if not Article.objects.filter(external_id=external_id).exists():
        # item doesn't exist, go fetch more data and create object
        pass
    else:
        # already exists, do something else
        pass
But there must be a better way..?

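You can use sets for this task. The following code issues only one database query. It assumes external_id is an integer field; if it is a CharField, drop the int() conversion so that both sets hold strings, otherwise the set difference will never match anything:
expected_ids = set(int(pk) for pk in resp)
exist_ids = set(Article.objects.filter(external_id__in=expected_ids)
                .values_list('external_id', flat=True))
not_exist_ids = list(expected_ids - exist_ids)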
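The same idea can be tried outside Django. The following is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for the ORM; the table name article and the helper split_by_existence are assumptions made for illustration. It normalizes every id to a string before comparing, so the two sets always hold the same type:

```python
import sqlite3


def split_by_existence(conn, incoming_ids):
    """Return (existing, missing) for incoming_ids using a single SELECT."""
    wanted = {str(i) for i in incoming_ids}  # one type on both sides of the comparison
    if not wanted:
        return set(), set()
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT external_id FROM article WHERE external_id IN ({placeholders})",
        tuple(wanted),
    )
    existing = {row[0] for row in rows}
    return existing, wanted - existing


# Hypothetical table with two of the four ids already stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (external_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO article VALUES (?)", [("123",), ("99829",)])

resp = ["123", "67283", "99829", "786232"]
existing, missing = split_by_existence(conn, resp)
```

Only the ids in missing then need the extra fetch from the third source.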

Related

Django Rest Framework: Disable save in update

It is my first question here, after reading the similar questions I did not find what I need, thanks for your help.
I am creating a fairly simple API but I want to use best practices at the security level.
Requirement: There is a table in SQL Server with more than 5 million records for which I should allow ONLY READ (all fields) and UPDATE (one field). This is so that a data scientist can consume data from this table and, through a predictive model (I think), assign a value to each record.
For this I mainly need 2 things:
That only one field is updated, even when the JSON sends all the fields of the table (I think I have achieved this with my serializer).
That no new record is created when an update targets one that does not exist (this is where I have problems).
I am using an UpdateAPIView to try to allow a bulk update using JSON like this (subrrogate_key is in my table and I use it as the lookup_field):
[
    {
        "subrrogate_key": "A1",
        "class": "A"
    },
    {
        "subrrogate_key": "A2",
        "class": "B"
    },
    {
        "subrrogate_key": "A3",
        "class": "C"
    }
]
partial_update calls update, which calls perform_update, which finally calls save; the default operation is to insert a new record if the primary key (or the field specified in lookup_field) is not found.
If I overwrite them, how can I make a new record not be inserted, and only update the field if it exists?
I tried:
Model.objects.filter(subrrogate_key=['subrrogate_key']).update(class=['class'])
Model.objects.update_or_create(...)
They work fine if all the keys in the JSON exist, but if a new key arrives they insert a record (I don't want this).
P.S. I use a translator, sorry.
perform_update will create a new record if you passed a serializer that doesn't have an instance. Depending on how you wrote your view, you can simply check if there is an instance in the serializer before calling save in perform_update to prevent creating a new record:
def perform_update(self, serializer):
    if not serializer.instance:
        return
    serializer.save()
Django implements that feature through the use of either force_update or update_fields during save().
https://docs.djangoproject.com/en/3.2/ref/models/instances/#forcing-an-insert-or-update
https://docs.djangoproject.com/en/3.2/ref/models/instances/#specifying-which-fields-to-save
https://docs.djangoproject.com/en/3.2/ref/models/instances/#saving-objects
In some rare circumstances, it’s necessary to be able to force the
save() method to perform an SQL INSERT and not fall back to doing an
UPDATE. Or vice-versa: update, if possible, but not insert a new row.
In these cases you can pass the force_insert=True or force_update=True
parameters to the save() method.
model_obj.save(force_update=True)
or
model_obj.save(update_fields=['field1', 'field2'])
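For the bulk payload in the question, another option is to issue plain UPDATE statements, which never insert. In Django, QuerySet.update() returns the number of rows it matched, so a result of 0 identifies a key that does not exist. The sketch below mimics that behaviour with the standard-library sqlite3 module rather than the ORM; the table name record, the column label (standing in for class) and the helper update_existing_only are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (subrrogate_key TEXT PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO record VALUES (?, ?)", [("A1", None), ("A2", None)])

payload = [
    {"subrrogate_key": "A1", "class": "A"},
    {"subrrogate_key": "A2", "class": "B"},
    {"subrrogate_key": "A3", "class": "C"},  # not present in the table
]


def update_existing_only(conn, items):
    """Run one UPDATE per item; return the keys that matched no row."""
    skipped = []
    for item in items:
        cur = conn.execute(
            "UPDATE record SET label = ? WHERE subrrogate_key = ?",
            (item["class"], item["subrrogate_key"]),
        )
        if cur.rowcount == 0:  # nothing matched, and nothing was inserted
            skipped.append(item["subrrogate_key"])
    return skipped


skipped = update_existing_only(conn, payload)
row_count = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
```

The skipped list could be returned to the client so that unknown keys are reported instead of silently created.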

Efficient Bulk Update of Model Records in Django

I'm building a Django app that will periodically take information from an external source, and use it to update model objects.
What I want to be able to do is create a QuerySet which has all the objects which might match the final list, then check which model objects need to be created, updated, and deleted, and then (ideally) perform the update in the fewest number of transactions, without performing any unnecessary DB operations.
Using update_or_create gets me most of the way to what I want to do.
jobs = get_current_jobs(host, user)
for host, user, name, defaults in jobs:
    obj, _ = Job.objects.update_or_create(host=host, user=user, name=name, defaults=defaults)
The problem with this approach is that it doesn't delete anything that no longer exists.
I could just delete everything up front, or do something dumb like
to_delete = set(Job.objects.filter(host=host, user=user)) - set(current)
(Which is an option) but I feel like there must already be an elegant solution that doesn't require either deleting everything, or reading everything into memory.
You should use Redis for storage and use the redis Python package in your code. For example:
import json
import redis
import requests
pool = redis.StrictRedis('localhost')
time_in_seconds = 3600 # the time period you want to keep your data
response = requests.get("url_to_ext_source")
# Redis stores strings/bytes, so serialize the parsed JSON before saving it
pool.set("api_response", json.dumps(response.json()), ex=time_in_seconds)
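For the create/update/delete split the question asks about, here is a minimal pure-Python sketch. It assumes each job can be identified by a natural key such as (host, user, name) and that both sides can be reduced to a mapping from that key to the field values; the function name plan_sync and the sample data are hypothetical. It still reads the keys and compared values into memory, but not full model instances:

```python
def plan_sync(existing, incoming):
    """Split a sync into create / update / delete key sets.

    existing and incoming both map a natural key, e.g. (host, user, name),
    to a dict of field values.
    """
    existing_keys = set(existing)
    incoming_keys = set(incoming)
    to_create = incoming_keys - existing_keys
    to_delete = existing_keys - incoming_keys
    # Only rows whose values actually changed need an UPDATE.
    to_update = {
        key for key in existing_keys & incoming_keys
        if existing[key] != incoming[key]
    }
    return to_create, to_update, to_delete


existing = {
    ("web1", "ann", "backup"): {"schedule": "daily"},
    ("web1", "ann", "rotate"): {"schedule": "weekly"},
    ("web1", "ann", "old"): {"schedule": "daily"},
}
incoming = {
    ("web1", "ann", "backup"): {"schedule": "hourly"},
    ("web1", "ann", "rotate"): {"schedule": "weekly"},
    ("web1", "ann", "report"): {"schedule": "monthly"},
}
to_create, to_update, to_delete = plan_sync(existing, incoming)
```

Unchanged rows (here the rotate job) end up in none of the three sets, so they cost no write at all.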

FK validation within Django

Good afternoon,
I have my django server running with a REST api on top to serve my mobile devices. Now, at some point, the mobile device will communicate with Django.
Let's say the device is asking Django to add an object in the database, and within that object, I need to set a FK like this:
objectA = ObjectA.objects.create(title=title,
    category_id = c_id, order = order, equipment_id = e_id,
    info_maintenance = info_m, info_security = info_s,
    info_general = info_g, alphabetical_notation = alphabetical_notation,
    allow_comments = allow_comments,
    added_by_id = user_id,
    last_modified_by_id = user_id)
If e_id and c_id are received from my mobile devices, should I check before calling this create that they actually still exist in the DB? That is two extra queries... but if they can avoid any problems, I don't mind!
Thanks a lot!
I think that Django creates a constraint on foreign keys by default (this might depend on the database, though). This means that if your foreign keys point to something that does not exist, then saving will fail (resulting in an exception on the Python side).
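To see a foreign-key constraint reject a dangling reference, here is a minimal sketch using the standard-library sqlite3 module instead of Django. The table and column names are made up for illustration, and SQLite only enforces foreign keys when the pragma is switched on. With Django, the analogous failure surfaces as django.db.IntegrityError, and exactly when it is raised can depend on the database backend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE object_a ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " category_id INTEGER NOT NULL REFERENCES category(id))"
)
conn.execute("INSERT INTO category (id) VALUES (1)")


def try_insert(title, category_id):
    """Return True if the row was inserted, False if the FK check rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO object_a (title, category_id) VALUES (?, ?)",
                (title, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("valid", 1)          # category 1 exists
dangling = try_insert("broken", 999)  # category 999 does not exist
stored = conn.execute("SELECT COUNT(*) FROM object_a").fetchone()[0]
```

Catching the integrity error costs no extra query on the happy path, at the price of a less specific error message than an explicit existence check would give.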
You can reduce it to a single query, provided both ids are distinct and refer to rows of the same model (it should be a single query at least; warning: I haven't tested the code):
if MyObject.objects.filter(id__in=[e_id, c_id]).distinct().count() == 2:
    # create the object
    ObjectA.objects.create(...)
else:
    # objects corresponding to e_id and c_id do not exist, do NOT create ObjectA
    pass
You should always validate any information that's coming from a user or that can be altered by a determined user. It wouldn't be difficult for someone to sniff the traffic and start constructing their own REST requests to your server. Always clean and validate external data that's being added to the system.

Assigning values to a query result already set up with a foreign key

I have a database of exhibition listings related by foreign key to a database of venues where they take place. Django templates access the venue information in the query results through listing.venue.name, listing.venue.url, and so on.
However, some exhibitions take place in temporary venues, and that information is stored in the same database, in what would be listing.temp_venue_url and such. Because it seems wasteful and sad to put conditionals all over the templates, I want to move the info for temporary venues to where the templates are expecting info for regular venues. This didn't work:
def transfer_temp_values(listings):
    for listing in listings:
        if listing.temp_venue:
            listing.venue = Venue
            listing.venue.name = listing.temp_venue
            listing.venue.url = listing.temp_venue_url
            listing.venue.state = listing.temp_venue_state
            listing.venue.location = listing.temp_venue_location
The error surprised me:
ValueError at /[...]/
Cannot assign "<class 'myproject.gsa.models.Venue'>": "Exhibition.venue" must be a "Venue" instance.
I rather thought it was. How do I go about accomplishing this?
The error message is because you have assigned the class Venue to the listing, rather than an instance of it. You need to call the class to get an instance:
listing.venue = Venue()
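The difference between the class and an instance can be shown without Django. In this minimal sketch, Venue and Listing are plain-Python stand-ins (hypothetical, not the real models): each listing gets its own Venue() instance, so the values copied for one listing do not leak into another or onto the class itself:

```python
class Venue:
    """Plain-Python stand-in for the model class (no Django involved)."""
    name = None
    url = None


class Listing:
    def __init__(self, temp_venue, temp_venue_url):
        self.temp_venue = temp_venue
        self.temp_venue_url = temp_venue_url
        self.venue = None


listings = [
    Listing("Pop-up A", "http://a.example"),
    Listing("Pop-up B", "http://b.example"),
]

for listing in listings:
    if listing.temp_venue:
        listing.venue = Venue()  # a fresh instance per listing, not the class
        listing.venue.name = listing.temp_venue
        listing.venue.url = listing.temp_venue_url

names = [listing.venue.name for listing in listings]
```

With the real models, an instance created this way exists only in memory; I would expect it to stay out of the database as long as nothing calls save() on it or on the listing, but that is worth verifying in your own views.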

Bulk create model objects in django

I have a lot of objects to save in the database, so I want to create model instances from them.
With Django, I can create all the model instances with MyModel(data), and then I want to save them all.
Currently, I have something like this:
for item in items:
    object = MyModel(name=item.name)
    object.save()
I'm wondering if I can save a list of objects directly, eg:
objects = []
for item in items:
    objects.append(MyModel(name=item.name))
objects.save_all()
How to save all the objects in one transaction?
As of the Django development version, there is a bulk_create manager method, which takes as input a list of objects created using the class constructor. Check out the Django docs.
Use the bulk_create() method. It's standard in Django now.
Example:
Entry.objects.bulk_create([
    Entry(headline="Django 1.0 Released"),
    Entry(headline="Django 1.1 Announced"),
    Entry(headline="Breaking: Django is awesome")
])
It worked for me to use manual transaction handling for the loop (Postgres 9.1):
from django.db import transaction
with transaction.atomic():
    for item in items:
        MyModel.objects.create(name=item.name)
In fact, it's not the same as a 'native' database bulk insert, but it allows you to avoid or decrease transport, ORM operation, and SQL query analysis costs.
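The difference between the two approaches can be sketched with the standard-library sqlite3 module (a stand-in, not Django; the table name my_model is an assumption). The first block runs one INSERT per row inside a single transaction, which is roughly what create() in a loop under transaction.atomic() does. The second sends one multi-row INSERT, which is closer to what bulk_create() generally aims for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT)")
names = ["alpha", "beta", "gamma"]

# One transaction, but still one INSERT statement per row.
with conn:
    for name in names:
        conn.execute("INSERT INTO my_model (name) VALUES (?)", (name,))

# One transaction and a single multi-row INSERT statement.
placeholders = ", ".join("(?)" for _ in names)
with conn:
    conn.execute(f"INSERT INTO my_model (name) VALUES {placeholders}", names)

total = conn.execute("SELECT COUNT(*) FROM my_model").fetchone()[0]
```

Both blocks commit once; the second also cuts the number of statements sent to the database from three to one.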
import uuid
name = request.data.get('name')
period = request.data.get('period')
email = request.data.get('email')
prefix = request.data.get('prefix')
bulk_number = int(request.data.get('bulk_number'))
bulk_list = list()
for _ in range(bulk_number):
    # each code is the requested prefix plus a random uppercase hex UUID
    code = prefix + uuid.uuid4().hex.upper()
    bulk_list.append(
        DjangoModel(name=name, code=code, period=period, user=email))
bulk_msj = DjangoModel.objects.bulk_create(bulk_list)
Here is how to bulk-create entities from a delimiter-separated file, leaving aside all unquoting and un-escaping routines:
class SomeModel(Model):
    @classmethod
    def from_file(model, file_obj, headers, delimiter):
        model.objects.bulk_create([
            # strip the trailing newline so it does not end up in the last field
            model(**dict(zip(headers, line.rstrip('\n').split(delimiter))))
            for line in file_obj],
            batch_size=None)
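The parsing step on its own can be checked without a database. This is a minimal sketch in which the helper name rows_from_file and the sample data are made up for illustration; it pairs each header with the matching field and skips blank lines:

```python
import io


def rows_from_file(file_obj, headers, delimiter):
    """Yield one dict per non-empty line, pairing headers with split fields."""
    for line in file_obj:
        line = line.rstrip("\r\n")  # drop the line ending before splitting
        if not line:
            continue
        yield dict(zip(headers, line.split(delimiter)))


sample = io.StringIO("Ann|ann@example.com\n\nBob|bob@example.com\n")
rows = list(rows_from_file(sample, ["name", "email"], "|"))
```

Each dict could then be unpacked into a model constructor, as in the classmethod above. For files with quoting or escaping, the standard csv module is likely a safer choice than str.split.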
Using create will cause one query per new item. If you want to reduce the number of INSERT queries, you'll need to use something else.
I've had some success using the Bulk Insert snippet, even though the snippet is quite old.
Perhaps there are some changes required to get it working again.
http://djangosnippets.org/snippets/446/
Check out this blog post on the bulkops module.
On my django 1.3 app, I have experienced significant speedup.
The bulk_create() method is one of the ways to insert multiple records into a database table:
Event.objects.bulk_create([
    Event(event_name="Event WF -001", event_type="sensor_value"),
    Event(event_name="Event WT -002", event_type="geozone"),
    Event(event_name="Event WD -001", event_type="outage")])
For a single-line implementation, you can use a lambda expression in a map (in Python 3, map is lazy, so wrap it in list() to make the calls actually run):
list(map(lambda x: MyModel.objects.get_or_create(name=x), items))
Here, the lambda binds each item in the items list to x and creates a database record if necessary.
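One caveat with map in Python 3 is that it returns a lazy iterator, so no call happens until something consumes it. A minimal sketch, where get_or_create is a hypothetical stand-in that only records its calls instead of touching a database:

```python
created = []


def get_or_create(name):
    """Hypothetical stand-in for MyModel.objects.get_or_create(name=name)."""
    created.append(name)
    return name


items = ["a", "b", "c"]

lazy = map(get_or_create, items)  # builds an iterator; nothing has run yet
count_before = len(created)

for _ in lazy:  # consuming the iterator triggers the calls
    pass
count_after = len(created)
```

A plain for loop over items says the same thing more directly and avoids the surprise.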
The easiest way is to use the create Manager method, which creates and saves the object in a single step.
for item in items:
    MyModel.objects.create(name=item.name)