Catch externally added data to Postgres database in Django - django

I have a Django backend for my project using Postgres as a DB. Apart from it, I have a systemd service running and inserting a new row in a specific table in the DB. I would need to detect the moment in which a new row is inserted in the table to run a callback function, which is defined in Django. The service takes different times to complete. I would also like to make it efficient. I thought of these options:
I used to use Celery in the project but don't anymore. It is kind of set up already, so I thought one option would be a PeriodicTask that checks if something has been added. I dislike periodic tasks, though, and it is not quite precise (there could be a gap between the time the service finishes and the time the task runs). So EASY but UGLY
If it were possible, I would like to use a Postgres TRIGGER to insert my callback task in the Celery queue. On paper, that sounds fast and clean. I have NO CLUE how to add something to the Celery task queue though. So CLEAN and EFFICIENT but DIFFICULT
I thought of implementing a separate service which listens to NOTIFY from Postgres, also using TRIGGER. It would then cURL the backend to start the callback function. This seems clunky. So MEH
If it seems like I do not know much of this, you are absolutely correct. I am learning as I go, but this is a must have feature for me. Any help would be much appreciated.

You could add a Django management command and have the service call it as soon as its current job finishes.
Just to add an example:
[Unit]
...
[Service]
...
# ExecStart= is not run through a shell, so the && chain is wrapped in sh -c
ExecStart=/bin/sh -c 'java -jar file.jar && /.../python /.../manage.py <management_command>'
[Install]
...
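If the service itself cannot be changed, the LISTEN/NOTIFY option from the question does not need a separate cURL-ing service: the listener can live in Django (for example inside a management command) and call the callback directly. Below is a minimal sketch of that idea, not a definitive implementation. It assumes psycopg2 is the database driver; the channel name `new_row`, the table `my_table`, its `id` column, and the helper names are all hypothetical. `EXECUTE FUNCTION` needs PostgreSQL 11 or later (older versions use `EXECUTE PROCEDURE`).

```python
import select

CHANNEL = "new_row"  # hypothetical channel name

# Hypothetical trigger; run once against the database (table/column names are assumptions).
TRIGGER_SQL = """
CREATE FUNCTION notify_new_row() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_row', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER new_row_trigger AFTER INSERT ON my_table
FOR EACH ROW EXECUTE FUNCTION notify_new_row();
"""


def drain_notifications(conn, callback):
    """Hand every pending notification payload to callback; return how many were handled."""
    conn.poll()  # lets the driver read pending data into conn.notifies
    handled = 0
    while conn.notifies:
        notify = conn.notifies.pop(0)
        callback(notify.payload)
        handled += 1
    return handled


def listen_forever(dsn, callback, timeout=5.0):
    """Block on the connection socket and run callback for each NOTIFY received."""
    import psycopg2  # third-party; imported here so the helper above stays importable without it

    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # LISTEN should not sit inside an open transaction
    with conn.cursor() as cur:
        cur.execute(f"LISTEN {CHANNEL};")
    while True:
        # select() sleeps until the socket is readable or the timeout expires
        readable, _, _ = select.select([conn], [], [], timeout)
        if readable:
            drain_notifications(conn, callback)
```

Calling `listen_forever(dsn, my_callback)` from a management command's `handle()` would keep the process waiting on the socket instead of polling the table, and that command could itself be run as a systemd service.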

Related

What is the best way to transfer large files through Django app hosted on HEROKU?

HEROKU gives me an H12 error when my Django application transfers a file to an API (I understand it is a long-running process and that there is probably some memory/worker tradeoff). I am on a single hobby dyno right now.
The function runs smoothly for files up to around 50MB. The file itself comes from a different source (fetched with the requests Python package).
The idea is to build a file transfer utility as a Django app on HEROKU. The file is not stored on my app's side; it just travels from point A to point B.
I went through multiple discussions and the standard HEROKU documentation, but I am still struggling with some concepts:
Will background tasks really solve this problem? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
As mentioned in the standard docs, they recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
Some recommend using an extra worker besides the default web worker. How does this relate to my problem?
Some say to add multiple workers, but I am not sure how this solves it. Say it starts working for large files today using background tasks: what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
If someone here has a strong understanding of the architecture, I am here to listen to your experiences and thoughts. Also, let me know if there is an option other than HEROKU that would make this easier for me.
Have you looked at using celery to run this as a background task?
This is a very standard way of dealing with requests which take a large time to complete.
Will background tasks really solve this problem? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
Yes it can be solved by background tasks. If you are using something like Celery which has direct support for django, you will be running another instance of your Django application but with a different startup command for Celery. It then keeps reading for new tasks to execute and reads the task name from the redis queue (or rabbitmq - whichever you use as the broker) and then executes that task and updates the status back to redis (or the broker you use).
You can also use flower along with celery so that you have a dashboard to see how many tasks are being executed and what are their statuses etc.
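To make the read-execute-report loop described above concrete, here is a toy model that uses only the standard library. It is not Celery: `queue.Queue` stands in for the broker (Redis or RabbitMQ), a plain dict stands in for the result store, and the task name `add` and all function names are made up for illustration.

```python
import queue
import threading


def worker(broker, results, tasks):
    """Keep reading task messages from the broker, run them, and record their status."""
    while True:
        message = broker.get()
        if message is None:  # sentinel used here to stop the worker
            break
        task_id, name, args = message
        results[task_id] = {"status": "STARTED", "result": None}
        try:
            value = tasks[name](*args)  # look the task up by name and execute it
            results[task_id] = {"status": "SUCCESS", "result": value}
        except Exception as exc:
            results[task_id] = {"status": "FAILURE", "result": repr(exc)}


def demo():
    broker = queue.Queue()                # stand-in for Redis/RabbitMQ
    results = {}                          # stand-in for the result backend
    tasks = {"add": lambda a, b: a + b}   # registry of known task names
    thread = threading.Thread(target=worker, args=(broker, results, tasks))
    thread.start()
    broker.put(("task-1", "add", (2, 3)))  # what the web process would do
    broker.put(None)
    thread.join()
    return results
```

In a real deployment the web process and the worker are separate processes, which is why an external broker is needed rather than an in-memory queue.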
As mentioned in the standard docs, they recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
To use background tasks with Celery you will need to set up some sort of message broker, such as Redis or RabbitMQ.
Some recommend using an extra worker besides the default web worker. How does this relate to my problem?
I don't think that would help for your use case.
Some say to add multiple workers, but I am not sure how this solves it. Say it starts working for large files today using background tasks: what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
When you use Celery, you will have to start a few workers for that Celery instance; these workers are the ones that execute your background tasks. The Celery documentation can help you work out how many workers to run based on your instance's CPU and memory.
If someone here has a strong understanding of the architecture, I am here to listen to your experiences and thoughts. Also, let me know if there is an option other than HEROKU that would make this easier for me.
I have worked on a few projects where we used Celery with background tasks to upload large files. It has worked well for our use cases.
Here is my final take on this after full evaluation, trials and earlier recommendations made here, thanks #arun.
HEROKU needs a web worker to serve the website, and it comes with 512MB of memory; operations that stay below this limit should be fine.
Beyond that, in scenarios like the one above, where a large file comes from a source API and goes to a target API through the Django app, you will have to do the following:
First, run the file upload function as a background process, since it will take longer than the 30 seconds within which HEROKU expects a response; otherwise the H12 error is waiting for you. The solution is to implement Django background tasks; Celery worked in my case. Here Celery is your same Django app running as a background handler, which needs its own dyno (the worker). This can be scaled as needed in the future.
To make your Django WSGI app (the frontend) talk to Celery (the background app), you need a message broker in between, which can be HEROKU Redis, RabbitMQ, etc.
Second, the problem is not solved yet even with a new worker dedicated to the Celery app: the memory limit still applies, since the worker is also a dyno with its own memory.
To overcome this, have the Python requests module download the file as a stream instead of loading the complete file into a single memory buffer. Iterate over the stream in chunks and send each chunk to the target endpoint.
Chunk size also plays an important role here. I will not give an exact number, since it depends on various factors:
It should not be too small, or the transfer will take longer.
It should not be too big for either the source or target endpoint server to handle.
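The streaming step can be sketched roughly as follows. This is an illustration under assumptions, not the code from the answer: the 1 MiB chunk size, the function names and both URLs are hypothetical, and the target endpoint is assumed to accept a chunked request body. `iter_chunks` shows the chunking idea on any file-like object, while `relay` uses `requests` with `stream=True` and passes the chunk iterator straight to the upload so the whole file never sits in memory.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MiB; tune for your endpoints


def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the contents of a file-like object piece by piece."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def relay(source_url: str, target_url: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Stream a download straight into an upload; returns the upload's status code."""
    import requests  # third-party; imported here so iter_chunks works without it

    # stream=True defers downloading the body until it is iterated
    with requests.get(source_url, stream=True, timeout=30) as download:
        download.raise_for_status()
        # Passing an iterator as data makes requests send a chunked upload
        upload = requests.post(
            target_url,
            data=download.iter_content(chunk_size=chunk_size),
            timeout=30,
        )
        upload.raise_for_status()
        return upload.status_code
```

Memory use then stays close to one chunk at a time regardless of file size, which is what keeps the worker dyno under its limit.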

Django: can functions within views run continuously even as other requests are made?

I'm trying to create a function that, when called, will extract information from an external source at irregular (and undefined) intervals. This data will then be placed in a database for later retrieval. I want this to be then running in the background even as other page requests are made. Is this possible?
The best way to run a Django function outside the request/response cycle is to implement it as a custom management command, which you can then set to run periodically using cron.
If you're already using it, celery supports periodic tasks using celerybeat, but this requires configuring and running the celerybeat daemon, which can be a headache. Celery also supports long-running tasks (things started in a view, but completing in their own time), as described in your question title.
Since you seem to need the function to be called when a page is loaded, you can put it inside your view as
from django.http import HttpResponse

def my_view(request):
    # Call the long-running function (the view blocks until it returns)
    long_running_function()
    # Do view logic and return
    return HttpResponse(...)
To handle the long_running_function you could use celery and create a tasks.py which implements your external data source logic. Creating tasks, adding to the queue and configuring celery is summarized here
If you just need a simpler solution for trying it out, take a look at the subprocess module.
A very similar answer here Django: start a process in a background thread?
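For trying things out without Celery, the background-thread approach from that linked question can be sketched with the standard library alone. The helper name `start_in_background` is made up for this example. Note that a daemon thread dies with the server process, so work can be lost on a restart; that is one reason a task queue is usually preferred in production.

```python
import threading
from typing import Any, Callable


def start_in_background(func: Callable[..., Any], *args: Any, **kwargs: Any) -> threading.Thread:
    """Run func in a daemon thread and return immediately with the thread handle."""
    thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
    thread.start()
    return thread


# Inside a view, the call would replace the blocking one, for example:
#     start_in_background(long_running_function)
#     return HttpResponse(...)
```

The returned thread's `is_alive()` method gives a simple way to check whether the work is still running.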

Django + execute asynchronous process?

I am implementing a feature in a new project and I was wondering what was the optimal solution to it. The feature itself consists of sub functionality as follows: starting a process, stop a process, and checking if the process is running...all these done in a non-blocking way with django. I am trying to avoid stuff like RabbitMQ, etc. I was thinking maybe of using threading or cron.
EDIT: this functionality needs to be triggered from a view.
Any comments or suggestions are most welcome. Thanks.
You can surely use Celery with the database backend instead of RabbitMQ. Personally, for simple tasks I tend to just write a custom management command launched from cron, which gets its input from some database table (i.e. a Django model) that is populated by Django view(s).
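The table-as-queue pattern described above can be sketched like this. To keep the example self-contained it uses `sqlite3` in place of a Django model; the `job` table, its columns and the function names are all hypothetical. The view would call something like `enqueue`, the cron-launched management command would call `run_pending`, and `status_of` covers the "is it running or done" check.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )


def enqueue(conn, payload):
    """What the view would do: record the request and return at once."""
    cur = conn.execute("INSERT INTO job (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def run_pending(conn, handler):
    """What the cron-launched command would do: process every pending job."""
    rows = conn.execute(
        "SELECT id, payload FROM job WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        try:
            handler(payload)
            status = "done"
        except Exception:
            status = "failed"
        conn.execute("UPDATE job SET status = ? WHERE id = ?", (status, job_id))
        conn.commit()
    return len(rows)


def status_of(conn, job_id):
    """Non-blocking status check; returns None for an unknown job."""
    row = conn.execute("SELECT status FROM job WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None
```

With a Django model the same three operations map onto `Model.objects.create(...)`, a filtered queryset loop and a lookup by primary key.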

Django and services

I'm building a simple website with Django that requires constant monitoring of text-based data from another website; that's the way it has to be.
How could I run this service on my web host using Django? Would I have to start a separate app and run it via SSH, so it updates the database used by Django, or is there an easier/better way?
You could use celery to schedule a job that would read data from that other website and do whatever you want with it.
As an alternative to celery, you could also create a cron job that executes a custom django-admin command. That would give you full access to your django install and ORM. The downside is that cron's smallest time resolution is 1 minute, so if you need it to be real-time, you're not going to be able to do that.
If you do need realtime, then creating a python daemon might be a better option.
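A daemon of that kind is essentially a loop that polls more often than cron allows and reacts only when the text changes. A minimal sketch, assuming the caller supplies a `fetch` function that returns the remote text and an `on_change` callback that stores it (both hypothetical names, as are `watch` and its parameters):

```python
import hashlib
import time


def watch(fetch, on_change, interval=5.0, max_polls=None, sleep=time.sleep):
    """Call on_change(text) whenever fetch() returns text that differs from the last poll."""
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        text = fetch()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest != last_digest:  # the very first poll always counts as a change
            on_change(text)
            last_digest = digest
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)  # sub-minute intervals are possible, unlike cron
```

Leaving `max_polls` as `None` makes the loop run until the process is stopped, which is the daemon case; the limit exists mainly to make the function easy to test.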

Rather than using crontab, can Django execute something automatically at a predefined time

How can I make Django execute something automatically at a particular time?
For example, my Django application has to upload files over FTP to remote servers at predefined times. The FTP server addresses, usernames, passwords, time, day and frequency have been defined in a Django model.
I want to run a file upload automatically based on the values stored in the model.
One way to do is to write a python script and add it to the crontab. This script runs every minute and keeps an eye on the time values defined in the model.
Other thing that I can roughly think of is maybe django signals. I'm not sure if they can handle this issue. Is there a way to generate signals at predefined times (Haven't read indepth about them yet).
Just for the record, there is also Celery, which lets you schedule messages for future dispatch. It is, however, a different beast than cron, as it needs a message broker (RabbitMQ, for example) and is meant for message queues.
I have been thinking about this recently and have found django-cron which seems as though it would do what you want.
Edit: Also if you are not specifically looking for Django based solution, I have recently used scheduler.py, which is a small single file script which works well and is simple to use.
I've had really good experiences with django-chronograph.
You need to set one crontab task: to call the chronograph python management command, which then runs other custom management commands, based on an admin-tweakable schedule
The problem you're describing is best solved using cron, not Django directly. Since it seems that you need to store data about your ftp uploads in your database (using Django to access it for logs or graphs or whatever), you can make a python script that uses Django which runs via cron.
James Bennett wrote a great article on how to do this which you can read in full here: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
The gist of it is that you can write standalone Django scripts that cron can launch and run periodically, and these scripts can fully utilize your Django database, models, and anything else they want to. This gives you the flexibility to run whatever code you need and populate your database, while not trying to make Django do something it wasn't meant to do (Django is a web framework, and is event-driven, not time-driven).
Best of luck!
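The core of such a cron-launched script is a check for which stored schedules are due. Here is a rough sketch of that check, with a plain dataclass standing in for the Django model; `UploadSchedule`, its fields and both helper functions are hypothetical names, and the actual FTP upload is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class UploadSchedule:
    name: str
    run_at: datetime                   # next time the upload should happen
    every: Optional[timedelta] = None  # None means a one-off upload


def due_schedules(schedules: List[UploadSchedule], now: datetime) -> List[UploadSchedule]:
    """Return the schedules whose run time has been reached."""
    return [s for s in schedules if s.run_at <= now]


def next_run(schedule: UploadSchedule, now: datetime) -> Optional[datetime]:
    """Return the first run time after now, or None for a one-off schedule."""
    if schedule.every is None:
        return None
    upcoming = schedule.run_at
    while upcoming <= now:  # assumes schedule.every is a positive interval
        upcoming += schedule.every
    return upcoming
```

The script would run from cron every minute, perform the upload for each due schedule, then save `next_run(...)` back to the model (or mark a one-off schedule as finished).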