Concurrency celery django bulk create or update - django

I have a concurrency (race condition) problem with Django. I have a Celery task that updates the number of users in a table. Since there are a lot of rows to update, I use a bulk update, and to avoid creating unnecessary rows, I use a bulk create.
all_statistic_user = StatisticsCountriesUser.objects.filter(user__in=all_branch_user, countries=user.country)
if all_statistic_user:
    all_statistic_user.update(total_users=F('total_users') + 1)
    all_statistic_user = StatisticsCountriesUser.objects.exclude(user__in=all_branch_user, countries=user.country)
    if all_statistic_user:
        all_statistic_user = [StatisticsCountriesUser(user_id=user.id, countries=user.country) for user in all_statistic_user]
        StatisticsCountriesUser.objects.bulk_create(all_statistic_user)
else:
    all_statistic_user = [StatisticsCountriesUser(user_id=user.id, countries=user.country) for user in all_branch_user]
    StatisticsCountriesUser.objects.bulk_create(all_statistic_user)
The problem is that when the tasks run concurrently, the first task reads the user list, then the second task reads the same list before the first has created or updated anything, so by the time the second task creates or updates, its list no longer reflects the right state.
The solution I thought of was to put the tasks in a list and run them synchronously instead of asynchronously. Is this possible? Thank you in advance.
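One way to remove this kind of race is to serialize the read-modify-write inside a single database transaction with row locks. Below is a minimal sketch, not a drop-in fix: the wrapper function is invented, it assumes a backend that supports select_for_update (e.g. PostgreSQL), and it restructures the create step so each task only inserts the rows that are still missing.

from django.db import transaction
from django.db.models import F

@transaction.atomic
def update_country_statistics(user, all_branch_user):  # hypothetical wrapper
    # Evaluating the queryset acquires row locks, so a concurrent task
    # blocks here until this transaction commits.
    locked = list(
        StatisticsCountriesUser.objects
        .select_for_update()
        .filter(user__in=all_branch_user, countries=user.country)
    )
    StatisticsCountriesUser.objects.filter(
        pk__in=[row.pk for row in locked]
    ).update(total_users=F('total_users') + 1)
    # Create rows only for branch users that do not have one yet.
    covered = {row.user_id for row in locked}
    StatisticsCountriesUser.objects.bulk_create(
        StatisticsCountriesUser(user_id=u.id, countries=user.country)
        for u in all_branch_user if u.id not in covered
    )

Note that row locks alone do not stop two tasks from inserting the same missing row at the same moment; a unique constraint on (user, countries) combined with bulk_create(..., ignore_conflicts=True) closes that remaining gap.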

Related

How to handle network errors when saving to Django Models

I have a Django .save() execution that loops n times.
My concern is how to guard against network errors during saving, as some entries could be saved while others won't, and there would be no telling which.
What is the best way to make sure that the execution completes?
Here's a sample of my code
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
    # UPDATE JOURNAL RECORD
    debit_journals[i].approval_no = journal_transaction_id
    debit_journals[i].approval_status = 'approved'
    debit_journals[i].save()
Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as a decorator for your function so that any error on save rolls the database back to its state before the function ran (a sketch of the atomic approach follows the examples below).
Try something like below (I suppose your model name is DebitJournal and debit_journals is a list).
for debit_journal in debit_journals:
    debit_journal.approval_no = journal_transaction_id
    debit_journal.approval_status = 'approved'

DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved').
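A minimal sketch of the transaction.atomic approach mentioned above, keeping the original per-object save() but making the whole loop all-or-nothing (the wrapper function name is just illustrative):

from django.db import transaction

@transaction.atomic
def approve_journals(debit_journals, journal_transaction_id):  # illustrative name
    # If any save() raises, the whole block rolls back and no
    # partially approved set of journals is left behind.
    for debit_journal in debit_journals:
        debit_journal.approval_no = journal_transaction_id
        debit_journal.approval_status = 'approved'
        debit_journal.save()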
It depends on what you call a network error: one between the user and the Django application, or one between the Django application and the database. If it's only between the user and the app, note that once the request has been received correctly, the objects will be created even if the user loses the connection afterwards. So the user might not see the response, but the objects will still be created.
If it's between the database and the Django application, some objects might still be created before the error.
Usually if you want an "all or nothing" behaviour you should use manual transactions as described here: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
Note that if the creation is really long you might hit the request timeout. If the creation takes more than a few seconds you should consider making it a background task, with the request only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.

Tasks are not performed using the admin interface

I use django-celery-beat to create the task. Everything is registered and connected; the usual Django tasks using beat_schedule work. I add a new task, which I don't register in beat_schedule:
@app.task
def say_hi():
    print("hello test")
I go into the admin and add a Periodic task: the new task is visible in Task (registered), I select it, choose the interval (every minute) and save; Task (registered) is then cleared and its value appears in Task (custom).
The task itself never executes (the print is not displayed in the console), but Last Run Datetime is updated in the admin. What could be going wrong?
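For what it's worth, django-celery-beat only schedules: it writes the task message to the broker (which is why Last Run Datetime advances), and a separate worker process has to consume and execute it. A minimal sketch of a setup where the worker can find the task; the project name and broker URL are assumptions:

# tasks.py -- a sketch; "myproject" and the broker URL are placeholders
from celery import Celery

app = Celery("myproject", broker="redis://localhost:6379/0")

@app.task
def say_hi():
    print("hello test")

# Both processes must be running for the task to actually execute, e.g.:
#   celery -A myproject worker -l info
#   celery -A myproject beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
# With only beat running, the schedule advances but nothing prints.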

How do I get user task list with its process variables in Camunda

I have a requirement where a user could be assigned thousands (1000 - 5000) of tasks at a given time, the same user task belonging to 1000 - 5000 different process instances. I have a custom task list screen where I need to load all the tasks with their basic info (id, name, process instance id, etc.) and some process variables for each.
First I used the filter/list REST service i.e. engine-rest/filter/{filter-id}/list to get the tasks with the process variables. (I created a filter in Camunda tasklist). But this REST service takes forever to return when there are more than 1000 process instances in question. It took 7-8 mins for about 2000 process instances. Maybe because this service returns a lot of information which I don't need.
So I decided to write my own REST service using Camunda Java api. This is what I did -
List<Task> tasks = taskService.createTaskQuery()
        .processDefinitionKey(processDefinitionKey)
        .taskAssignee(assignee)
        .list();
if (tasks != null && !tasks.isEmpty()) {
    for (Task task : tasks) {
        .....
        .....
        Map<String, Object> variables = taskService.getVariables(task.getId(), variableNames);
        .....
    }
}
This works and is much faster than the filter service, but for about 1000 instances it still takes around 25 seconds (my server is not production grade right now: Tomcat with Xms 1 GB, Xmx 2 GB).
But my concern is whether, internally, this code hits the DB 1000 times (once per task returned by the task query) to get the variables. Worse still, depending on the number of variables, does it query the DB that many times for each variable? I mean, for 5 variables are we hitting the DB 5000 times?
1) If so, is there any way I can improve this service? Like can I write a NativeTaskQuery where I join the act_ru_task, act_ru_process & act_ru_variable tables to get the data I need? Is that the right way?
2) Isn't there any inbuilt caching in Camunda that can help here?
Thanks in advance for your help.
You can use a custom query for this. Write your native SQL query and add a MyBatis mapping. This example explains the concept: https://github.com/camunda-consulting/code/tree/master/snippets/custom-queries

Django cron-job for user-specified times

I want to send notifications at a user-specified time. For example, in Google Calendar, I can receive a text message when my task time is hit.
Is the solution to run a cron job that executes every minute and scans for users whose time equals the current time?
Since you tagged your question with celery, I assume you have Celery running. You could use the eta kwarg of apply_async() to schedule a task to run at a specific time, see here:
http://docs.celeryproject.org/en/latest/userguide/calling.html#eta-and-countdown
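A minimal sketch of the eta approach; the task, model and field names here are invented for illustration:

from myapp.tasks import send_notification  # hypothetical task

# Schedule delivery for the moment the user asked for; eta takes a
# (preferably timezone-aware) datetime instead of a countdown in seconds.
send_notification.apply_async(
    args=[reminder.id],
    eta=reminder.notification_time,
)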
If you need to use a cron job, I would not check if notification_time == current_time, but rather track unsent notifications with a boolean is_sent field on the model and check for notification_time <= current_time and not is_sent. This seems to be slightly less error prone. You could also add some form of check to prevent mass-sending notifications in case your system goes down for a few hours.
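In code, the cron-driven variant could look roughly like this; the Notification model, its fields and the delivery helper are all assumptions:

from datetime import timedelta
from django.utils import timezone

def send_due_notifications():
    due = Notification.objects.filter(
        notification_time__lte=timezone.now(),
        # Guard against mass-sending stale notifications after downtime.
        notification_time__gte=timezone.now() - timedelta(hours=2),
        is_sent=False,
    )
    for notification in due:
        deliver(notification)  # hypothetical send helper
        notification.is_sent = True
        notification.save(update_fields=["is_sent"])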

How do I update my DB data (MongoDB) after a fixed interval of time using Django?

My User collection contains data such as
{"user1":"zera",
"my_status":"active",
"date_creation" : ISODate("2013-10-01T10:15:52.055Z")
}
{"user2":"dfgf",
"my_status":"noactive",
"date_creation": ISODate("2013-10-01T08:55:41.212Z")
}
I need to find each user with my_status: "active" and update their my_status 24 hours after each user's date_creation.
Can anyone suggest a method to do this using Django?
Well, I'd write an async task that keeps polling the database for users with an active status; whenever such a user's date_creation is more than 24 hours old, update their my_status.
For the asynchronous tasks you can use python-rq; to make things easier there's a Django module for it, django-rq. Celery is another popular and good option, and it also has Django integration.
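As a rough sketch of what the polling task body could do, assuming PyMongo and the collection layout shown above (the database and collection names are guesses):

from datetime import datetime, timedelta
from pymongo import MongoClient

def deactivate_expired_users():
    client = MongoClient()
    users = client["mydb"]["users"]  # hypothetical database/collection
    cutoff = datetime.utcnow() - timedelta(hours=24)
    # Flip users that were created more than 24 hours ago and are still active.
    users.update_many(
        {"my_status": "active", "date_creation": {"$lte": cutoff}},
        {"$set": {"my_status": "noactive"}},
    )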