Sequentially execute multiple read/write queries in the same Django view function - django

I have read and write queries in my single Django view function. As in below code:
def multi_query_function(request):
    model_data = MyModel.objects.all()  # first read command
    ...(do something)...
    new_data = MyModel(
        id=1234,
        first_property='random value',
        second_property='another value'
    )
    new_data.save()  # second write command
    return render(request, 'index.html')
I need these queries in the function to be executed consecutively. For example, if multiple users use this function at the same time, it should execute this function for both users one by one. The 'read' of one user should only be allowed if the previous user has completed both of his/her 'read and write'. The queries of both users should never be intermingled.
Should I use the table-locking feature of my PostgreSQL DB, or is there another well-managed way?

Yep, using your database's locks is a good way to do this.
https://github.com/Xof/django-pglocks looks like a good library to give you a lock context manager.
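For example, a minimal sketch of the view above wrapped in django-pglocks' advisory_lock context manager (the lock name is arbitrary; any request holding it blocks the others until its read and write are done):

from django.shortcuts import render
from django_pglocks import advisory_lock

def multi_query_function(request):
    # whoever enters this block first holds the lock; other requests wait,
    # so one user's read and write are never interleaved with another's
    with advisory_lock('mymodel-read-then-write'):
        model_data = MyModel.objects.all()  # first read command
        # ...(do something)...
        new_data = MyModel(
            id=1234,
            first_property='random value',
            second_property='another value',
        )
        new_data.save()  # second write command
    return render(request, 'index.html')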

Related

Django function execution

In views, I have a function defined which is executed when the user submits the form online. After the form submission there are some database transactions that I perform, and then, based on the existing data in the database, APIs are triggered:
def triggerapi():
    execute API to send Email to the user and the administrator about
    the submitted form

def databasetransactions():
    check the data in the submitted form with the data in DB
    if the last data submitted by the user is before 10 mins or more:
        triggerapi()

def formsubmitted(request):
    save the user input in variables
    databasetransactions()
    save the data from the submitted form in the DB
In the above case, the user clicks the submit button twice in less than 5 milliseconds. So two submissions start to process in parallel and both trigger the email, which is not the desired behavior.
Is there a way to avoid this, so that for a user session the application only accepts new data once all the older data processing is completed?
Since we are talking in pseudo-code, one way could be to use a singleton pattern for triggerapi() and return Not Allowed in case it is already instantiated.
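A minimal sketch of that guard idea, using a process-wide lock (send_email_to_user_and_admin is a hypothetical placeholder for the actual email call, and this only protects a single process, not multiple workers):

import threading

_trigger_lock = threading.Lock()

def triggerapi():
    # non-blocking acquire: if another request is already triggering the
    # API, refuse instead of sending a duplicate email
    if not _trigger_lock.acquire(blocking=False):
        return 'Not Allowed'
    try:
        send_email_to_user_and_admin()  # hypothetical helper
        return 'OK'
    finally:
        _trigger_lock.release()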
There are multiple ways to solve this issue.
One of them would be to create a new session variable
request.session['activetransaction'] = True
This would, however, require you to pass request, unless it is already available in the code portion shown. You can also add an instance/class flag in the same way and check against it.
Another way, which might work if you need later submissions handled after the previous one finishes, is to add a while request.session['activetransaction']: loop and do the handling afterwards.
def formsubmitted(request):
    if 'activetransaction' not in request.session or not request.session['activetransaction']:
        request.session['activetransaction'] = True
        # save the user input in variables
        databasetransactions()
        # save the data from the submitted form in the DB
        request.session['activetransaction'] = False
    ...

Efficient Bulk Update of Model Records in Django

I'm building a Django app that will periodically take information from an external source, and use it to update model objects.
What I want to be able to do is create a QuerySet which has all the objects which might match the final list. Then check which model objects need to be created, updated, and deleted. And then (ideally) perform the update in the fewest number of transactions, and without performing any unnecessary DB operations.
Using update_or_create gets me most of the way to what I want to do.
jobs = get_current_jobs(host, user)
for host, user, name, defaults in jobs:
    obj, _ = Job.objects.update_or_create(host=host, user=user, name=name, defaults=defaults)
The problem with this approach is that it doesn't delete anything that no longer exists.
I could just delete everything up front, or do something dumb like
to_delete = set(Job.objects.filter(host=host, user=user)) - set(current)
(Which is an option) but I feel like there must already be an elegant solution that doesn't require either deleting everything, or reading everything into memory.
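For what it's worth, a minimal sketch of that set-difference cleanup (assuming (host, user, name) uniquely identifies a job) could look like:

jobs = get_current_jobs(host, user)
current_names = set()
for job_host, job_user, name, defaults in jobs:
    Job.objects.update_or_create(host=job_host, user=job_user, name=name, defaults=defaults)
    current_names.add(name)

# delete records that no longer exist in the external source,
# without loading every object into memory
Job.objects.filter(host=host, user=user).exclude(name__in=current_names).delete()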
You should use Redis for storage and use the redis Python package (redis-py) in your code. For example:
import json
import redis
import requests

pool = redis.StrictRedis('localhost')
time_in_seconds = 3600  # the time period you want to keep your data

response = requests.get("url_to_ext_source")
# store the JSON payload as a string, with an expiry
pool.set("api_response", json.dumps(response.json()), ex=time_in_seconds)
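To read the cached payload back later, something along these lines should work (the key name matches the one used above):

cached = pool.get("api_response")
if cached is not None:
    data = json.loads(cached)  # bytes in, dict out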

Allowing users to only view data related to them in Apache Superset

I have some information related to different vendors in my database and I want to allow each registered vendor (representative person) to view slices/dashboards which contains only data related to them.
One possible solution could be to create separate views for each vendor as well as separate roles for each vendor. But it feels like a bad idea if you have 100+ vendors (as is my case); and it's not a flexible or scalable solution.
Is there some way to automatically filter a given view for each user? For example, we have a "general profit by product" bar chart, and user X should see only products of vendor X.
What you're looking for is multi-tenancy support, and this is not currently supported out-of-the-box in Superset.
There is however an open PR for one possible solution: https://github.com/apache/incubator-superset/pull/3729
One option could be to re-use and/or adapt that code for your use-case.
Another option might be to look into JINJA_CONTEXT_ADDONS [https://github.com/apache/incubator-superset/blob/master/docs/installation.rst#sql-lab] and see whether you might be able to pass additional context to your query (e.g. your vendor_id) and restrict the scope of your query using that parameter.
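As a rough sketch of that second option, JINJA_CONTEXT_ADDONS in superset_config.py lets you expose your own callables to SQL templates; look_up_vendor_id below is a hypothetical helper you would implement against your own user-to-vendor mapping:

# superset_config.py
from flask import g

def current_vendor_id():
    # hypothetical lookup: map the logged-in Superset user to a vendor id
    user = getattr(g, 'user', None)
    return look_up_vendor_id(user.username) if user else None

JINJA_CONTEXT_ADDONS = {
    'current_vendor_id': current_vendor_id,
}

# then in a virtual dataset / SQL Lab query:
# SELECT * FROM sales WHERE vendor_id = {{ current_vendor_id() }}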
Superset config has the below two configurations (DB_CONNECTION_MUTATOR, SQL_QUERY_MUTATOR), which can allow for multi-tenancy to an extent.
A callable that allows altering the database connection URL and params
on the fly, at runtime. This allows for things like impersonation or
arbitrary logic. For instance you can wire different users to
use different connection parameters, or pass their email address as the
username. The function receives the connection uri object, connection
params, the username, and returns the mutated uri and params objects.
Example:
def DB_CONNECTION_MUTATOR(uri, params, username, security_manager, source):
    user = security_manager.find_user(username=username)
    if user and user.email:
        uri.username = user.email
    return uri, params
Note that the returned uri and params are passed directly to sqlalchemy's create_engine(url, **params).
DB_CONNECTION_MUTATOR = None
A function that intercepts the SQL to be executed and can alter it.
The use case can be around adding some sort of comment header
with information such as the username and worker node information.
def SQL_QUERY_MUTATOR(sql, username, security_manager):
    dttm = datetime.now().isoformat()
    return f"-- [SQL LAB] {username} {dttm}\n{sql}"
SQL_QUERY_MUTATOR = None
One easy way of solving this problem is by using pre-defined JINJA parameters.
Two parameters that can be used are '{{ current_username() }}' and '{{ current_user_id() }}'.
First you need to ensure that you can use JINJA templates -
In superset_config.py add the following
FEATURE_FLAGS = {
    "ENABLE_TEMPLATE_PROCESSING": True,
}
Restart Superset.
Now if you go to the SQL LAB and type the following -
SELECT '{{ current_username() }}',{{ current_user_id() }};
You should get an output like:

?column?    ?column?__1
PayalC      5
Now all you have to do is append one of the two following SQL snippets to all your queries.
select ........ from ...... where ...... vendorid={{ current_user_id() }}
select ........ from ...... where ...... vendorname='{{ current_username() }}'
vendorid={{ current_user_id() }} and/or
vendorname='{{ current_username() }}' will restrict the user to view only her data.
You could also make it more flexible by creating a table that maps users to vendor ids. That table can then be joined into all your queries, and you could map multiple vendors to a single user, or even all vendors to a single user for a super admin.

python code for directory api to batch retrieve all users from domain

Currently I have a method that retrieves all ~119,000 Gmail accounts and writes them to a CSV file using the Python code below, with the Admin SDK enabled and OAuth 2.0:
def get_accounts(self):
    students = []
    page_token = None
    params = {'customer': 'my_customer'}
    while True:
        try:
            if page_token:
                params['pageToken'] = page_token
            current_page = self.dir_api.users().list(**params).execute()
            students.extend(current_page['users'])
            # write each page of data to a file
            csv_file = CSVWriter(students, self.output_file)
            csv_file.write_file()
            # clear the list for the next page of data
            del students[:]
            page_token = current_page.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as error:
            break
I would like to retrieve all 119,000 in one go, that is, without having to loop, or as a batch call. Is this possible, and if so, can you provide example Python code? I have run into communication issues and have to rerun the process multiple times to obtain all ~119,000 accounts successfully (it takes about 10 minutes to download). I would like to minimize communication errors. Please advise if a better method exists or if a non-looping method is also possible.
There's no way to do this as a batch because you need to know each pageToken and those are only given as the page is retrieved. However, you can increase your performance somewhat by getting larger pages:
params = {'customer': 'my_customer', 'maxResults': 500}
Since the default page size when maxResults is not set is 100, adding maxResults: 500 will reduce the number of API calls by a factor of 5. While each call may take slightly longer, you should notice performance increases because you're making far fewer API calls and HTTP round trips.
You should also look at using the fields parameter to only specify user attributes you need to read in the list. That way you're not wasting time and bandwidth retrieving details about your users that your app never uses. Try something like:
my_fields = 'nextPageToken,users(primaryEmail,name,suspended)'
params = {
    'customer': 'my_customer',
    'maxResults': 500,
    'fields': my_fields,
}
Last of all, if your app retrieves the list of users fairly frequently, turning on caching may help.
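Putting those suggestions together, the retrieval loop from the question might look roughly like this (still paginated, since that cannot be avoided, just with larger pages and trimmed fields):

params = {
    'customer': 'my_customer',
    'maxResults': 500,
    'fields': 'nextPageToken,users(primaryEmail,name,suspended)',
}
page_token = None
while True:
    if page_token:
        params['pageToken'] = page_token
    current_page = self.dir_api.users().list(**params).execute()
    # write each page of users out as it arrives
    CSVWriter(current_page.get('users', []), self.output_file).write_file()
    page_token = current_page.get('nextPageToken')
    if not page_token:
        break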

Bulk create model objects in django

I have a lot of objects to save in the database, and I want to create model instances for them.
With Django, I can create all the model instances with MyModel(data), and then I want to save them all.
Currently, I have something like that:
for item in items:
    object = MyModel(name=item.name)
    object.save()
I'm wondering if I can save a list of objects directly, eg:
objects = []
for item in items:
    objects.append(MyModel(name=item.name))
objects.save_all()
How to save all the objects in one transaction?
As of Django 1.4, bulk_create exists as a manager method which takes as input a list of objects created using the class constructor. Check out the Django docs.
Use bulk_create() method. It's standard in Django now.
Example:
Entry.objects.bulk_create([
    Entry(headline="Django 1.0 Released"),
    Entry(headline="Django 1.1 Announced"),
    Entry(headline="Breaking: Django is awesome"),
])
Manual transaction handling for the loop worked for me (Postgres 9.1):
from django.db import transaction

with transaction.atomic():
    for item in items:
        MyModel.objects.create(name=item.name)
In fact it's not the same as a 'native' database bulk insert, but it lets you avoid or decrease per-query transport, ORM operation, and SQL query analysis costs.
import uuid

name = request.data.get('name')
period = request.data.get('period')
email = request.data.get('email')
prefix = request.data.get('prefix')
bulk_number = int(request.data.get('bulk_number'))

bulk_list = list()
for _ in range(bulk_number):
    code = prefix + uuid.uuid4().hex.upper()
    bulk_list.append(
        DjangoModel(name=name, code=code, period=period, user=email))
bulk_msj = DjangoModel.objects.bulk_create(bulk_list)
Here is how to bulk-create entities from a delimiter-separated file, leaving aside all unquoting and un-escaping routines:
class SomeModel(Model):
    @classmethod
    def from_file(model, file_obj, headers, delimiter):
        model.objects.bulk_create([
            model(**dict(zip(headers, line.split(delimiter))))
            for line in file_obj],
            batch_size=None)
Using create will cause one query per new item. If you want to reduce the number of INSERT queries, you'll need to use something else.
I've had some success using the Bulk Insert snippet, even though the snippet is quite old.
Perhaps there are some changes required to get it working again.
http://djangosnippets.org/snippets/446/
Check out this blog post on the bulkops module.
On my django 1.3 app, I have experienced significant speedup.
The bulk_create() method is one of the ways to insert multiple records into a database table. Here is how bulk_create() is used:

Event.objects.bulk_create([
    Event(event_name="Event WF -001", event_type="sensor_value"),
    Event(event_name="Event WT -002", event_type="geozone"),
    Event(event_name="Event WD -001", event_type="outage"),
])
For a single-line implementation, you can use a lambda expression in a map (wrapped in list() so the map is actually evaluated on Python 3):
list(map(lambda x: MyModel.objects.get_or_create(name=x), items))
Here, the lambda maps each item in the items list to x and creates a database record if necessary.
Lambda Documentation
The easiest way is to use the create Manager method, which creates and saves the object in a single step.
for item in items:
    MyModel.objects.create(name=item.name)