Django - gunicorn - App-level variable (shared across workers)

So I have a toy Django + gunicorn project.
I want a statistical model, which is quite big, to be loaded into memory only once and then reused across the workers/threads.
How/where do I define an app-level variable?
I tried putting it in settings.py, and also in wsgi.py.

I don't think you can (nor should you). Workers are separate processes that are forked before they run any of your code.
You could put the "model" (what is it that makes it big?) in a Redis DB and access it from each worker there. The best option would probably be to create a separate service that you run as a single instance and communicate with over HTTP or RPC from your workers (have a look at nameko for an easy (micro)services framework).
Another option would be to use a single Celery worker and do the statistical calculations in a task.
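As a rough sketch of that last option (assuming a scikit-learn model serialized with joblib and Redis as the broker; the path, broker URL, and task name are made up), the Celery worker process can load the model once at import time and reuse it for every task:

# tasks.py - minimal sketch; model path, broker URL and task name are assumptions
import joblib
from celery import Celery

app = Celery("stats", broker="redis://localhost:6379/0")

# Loaded once per worker process, when Celery imports this module,
# then reused by every task executed in that process.
MODEL = joblib.load("/path/to/model.joblib")

@app.task
def predict(features):
    # Return a plain list so the result is JSON-serializable.
    return MODEL.predict([features]).tolist()

The Django views would then call predict.delay(features) (or apply_async) and read back the result, so only the single Celery worker keeps the big model in memory rather than every gunicorn worker.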

Related

How can I use Django with Docker?

So I have a basic Django blog application, which I want to dockerize. And one more thing: I am writing my question here because there are live people to answer.
You should use cookiecutter-django for dockerizing your Django application. You can read the docs here: https://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html
You need to run pip freeze first so that you know which packages your current Django application depends on, and use the output as requirements.txt (e.g. pip freeze > requirements.txt).
Benefits of using Django on Docker:
Your code runs on any operating system that supports Docker.
You save time by not needing to configure system dependencies on your host.
Your local and production environments can be exactly the same, eliminating errors that only happen in production.
You should start by reading the Docker docs in order to understand the whats and whys: https://docs.docker.com/
Long story short - containers (plural) let you start every service (Django, database, possibly a front-end, servers, etc.) in a separate container and, furthermore, start them up on any OS of your choice.
Those containers can communicate with each other through a separate Docker network on the host.
You will need a Dockerfile to set up each service, and possibly docker-compose (if you have multiple containers) to manage all the running containers. A minimal sketch follows.
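As a minimal sketch (assuming a requirements.txt at the project root and a project package called myproject, both placeholder names, and gunicorn as the app server), a Dockerfile for the Django service could look like this:

# Dockerfile - minimal sketch, not production-hardened
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the app with gunicorn; "myproject" is a placeholder for your project package
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000"]

A docker-compose.yml would then add the database (and any other services) as separate containers on a shared network.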
Here's an example Docker setup for Django: https://semaphoreci.com/community/tutorials/dockerizing-a-python-django-web-application

What is the best way to transfer large files through a Django app hosted on Heroku?

Heroku gives me an H12 error when transferring the file to an API from my Django application (I understand it's a long-running process and I guess there is some memory/worker tradeoff). I am on a single hobby dyno right now.
The function runs smoothly for files up to around 50 MB. The file itself is coming from a different source (fetched with the requests Python package).
The idea is to build a file transfer utility as a Django app on Heroku. The file does not get stored on my app's side; it is just fetched from point A and sent to point B.
I went through multiple discussions along with the standard Heroku documentation; however, I am struggling with some concepts:
Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
As mentioned in the standard docs, they recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
Some recommend using an extra worker besides the web worker we have by default. How does this relate to my problem?
Some say to add multiple workers; I'm not sure how that solves it. Let's say it starts working today for large files using background tasks; what if the load of simultaneous users increases? How will this impact my solution, and how should I plan mitigation around the risks?
If someone here has a strong understanding of the architecture, I am here to listen to your experiences and thoughts. Also, let me know if there is another option than Heroku that would make this easier for me from a solution standpoint.
Have you looked at using Celery to run this as a background task?
This is a very standard way of dealing with requests that take a long time to complete.
"Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)"
Yes, it can be solved by background tasks. If you use something like Celery, which has direct support for Django, you will be running another instance of your Django application, but with a different startup command for Celery. That instance keeps polling for new tasks, reads the task name from the Redis queue (or RabbitMQ, whichever you use as the broker), executes the task, and updates the status back to Redis (or whichever backend you use).
You can also use Flower along with Celery so that you have a dashboard to see how many tasks are being executed, what their statuses are, etc.
"As mentioned in the standard docs, they recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?"
To use background tasks with Celery, you will need to set up some sort of message broker like Redis or RabbitMQ.
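As a minimal sketch of that wiring (assuming the project package is called myproject and that the broker URL is exposed via a REDIS_URL environment variable, as Heroku Redis does), the Celery app lives next to settings.py:

# myproject/celery.py - minimal sketch
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery(
    "myproject",
    broker=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
)
# Read any CELERY_* settings from Django's settings module
app.config_from_object("django.conf:settings", namespace="CELERY")
# Discover tasks.py modules in the installed apps
app.autodiscover_tasks()

The worker dyno then runs something like celery -A myproject worker, while the web dyno keeps running gunicorn as before.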
"Some recommend using an extra worker besides the web worker we have by default. How does this relate to my problem?"
I don't think that would help for your use case.
"Some say to add multiple workers; I'm not sure how that solves it. Let's say it starts working today for large files using background tasks; what if the load of simultaneous users increases? How will this impact my solution, and how should I plan mitigation around the risks?"
When you use Celery, you will have to start a few workers for that Celery instance; these workers are the ones that execute your background tasks. The Celery documentation will help you calculate the exact number of workers based on your instance's CPU, memory, etc.
"If someone here has a strong understanding of the architecture, I am here to listen to your experiences and thoughts. Also, let me know if there is another option than Heroku that would make this easier for me from a solution standpoint."
I have worked on a few projects where we used Celery background tasks to upload large files. It has worked well for our use cases.
Here is my final take on this after full evaluation, trials, and the earlier recommendations made here; thanks #arun.
Heroku needs a web dyno to deliver the website runtime, which holds 512 MB of memory; operations you perform below this limit should be fine.
Beyond that, say you have a scenario like the one mentioned above, where a large file is coming from one source API and going into another target API through the Django app. You will have to do the following:
First, you will have to run the file transfer function as a background process, since it will take more than the 30 seconds within which Heroku expects a response; if not, the H12 error is waiting for you. The solution is implementing background tasks; Celery worked in my case. Here Celery is your same Django app functionality running as a background handler, which needs its own dyno (the worker). This can be scaled as needed in the future.
To make your Django WSGI process (the front-end app) talk to Celery (the background app), you need a message broker in between, which can be Heroku Redis, RabbitMQ, etc.
Second, the problem doesn't get solved here yet: even though you have a new worker dedicated to the Celery app, the memory limits still apply, as it is also a dyno with its own memory.
To overcome this, your Python requests call should download the file as a stream instead of downloading the complete file into a single memory buffer. Iterate over the stream and send the file chunks to the target endpoint (see the sketch after the notes on chunk size).
Even the chunk size plays an important role here. I will not put an exact number here since it depends on various factors:
It should not be too small, or the transfer will take more time.
It should not be too big for either the source or target endpoint servers to handle.
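A minimal sketch of that streaming transfer (the URLs, the 1 MB chunk size, and the task name are assumptions; requests sends a generator body with chunked transfer encoding, so the target API has to accept that):

# tasks.py - minimal sketch of a streamed point-A-to-point-B transfer
import requests
from celery import shared_task

CHUNK_SIZE = 1024 * 1024  # 1 MB; tune for your source/target endpoints

@shared_task
def transfer_file(source_url, target_url):
    with requests.get(source_url, stream=True, timeout=(10, 300)) as src:
        src.raise_for_status()
        # iter_content yields the body in chunks; passing the generator as
        # `data` streams it out without holding the whole file in memory.
        resp = requests.post(
            target_url,
            data=src.iter_content(chunk_size=CHUNK_SIZE),
            timeout=(10, 300),
        )
    resp.raise_for_status()
    return resp.status_code

With this as a Celery task, the web dyno only enqueues transfer_file.delay(source_url, target_url) and returns immediately, which avoids the 30-second H12 timeout.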

Catch externally added data to Postgres database in Django

I have a Django backend for my project using Postgres as the DB. Apart from that, I have a systemd service running that inserts new rows into a specific table in the DB. I need to detect the moment a new row is inserted into the table so I can run a callback function, which is defined in Django. The service takes varying amounts of time to complete, and I would also like to make this efficient. I thought of these options:
I used to use Celery in the project but don't anymore. It is kind of set up already, so I thought one option would be a PeriodicTask that checks whether something has been added. I dislike periodic tasks, though, and they are not quite precise (there could be a gap between the time the service finishes and the time the task runs). So EASY but UGLY.
If it were possible, I would like to use a Postgres TRIGGER to insert my callback task into the Celery queue. On paper, that sounds fast and clean. I have NO CLUE how to add something to the Celery task queue, though. So CLEAN and EFFICIENT but DIFFICULT.
I thought of implementing a separate service which listens for NOTIFY from Postgres, also using a TRIGGER. It would then cURL the backend to start the callback function. This seems clunky. So MEH.
If it seems like I do not know much about this, you are absolutely correct. I am learning as I go, but this is a must-have feature for me. Any help would be much appreciated.
You could add a Django management command and call it after the current execution in your service.
Just to add an example:
[Unit]
...
[Service]
...
# systemd does not run ExecStart through a shell, so wrap the '&&' chain in sh -c
ExecStart=/bin/sh -c 'java -jar file.jar && /.../python /.../manage.py <management_command>'
[Install]
...
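For completeness, a minimal sketch of what that management command could look like (the app name, file path, and callback function are placeholders):

# myapp/management/commands/run_row_callback.py - minimal sketch
from django.core.management.base import BaseCommand

from myapp.callbacks import handle_new_rows  # hypothetical callback defined in Django

class Command(BaseCommand):
    help = "Run the Django callback after the external service has inserted new rows."

    def handle(self, *args, **options):
        handle_new_rows()
        self.stdout.write(self.style.SUCCESS("Callback executed"))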

Access to Django ORM from remote Celery worker

I have a Django application and a Celery worker, each running on its own server.
Currently, the Django app uses SQLite to store the data.
I'd like to access the database using Django's ORM from the worker.
Unfortunately, it is not completely clear to me how; thus I have some questions.
Is it possible without hacks/workarounds? I'd like to have a simple solution (I would not like to implement a REST interface for object access). I imagine this could be achieved if I started using a PostgreSQL instance which is accessible from both servers.
Which project files (there's just Django + a tasks.py file) are required on the worker's machine?
Could you provide me with an example or tutorial? I tried looking it up but only found tutorials/answers dealing with local Celery workers.
I have been searching for ways to do this simply, but... your best option is to attach a kind of callback to the task function that will call another function on the Django server to carry out the database update.
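A minimal sketch of that callback idea (the broker URL, endpoint URL, and payload shape are all assumptions): the remote worker does the heavy work, then reports the result over HTTP to the Django server, which owns the database and performs the ORM update.

# tasks.py on the worker machine - minimal sketch
import requests
from celery import Celery

app = Celery("remote_worker", broker="redis://broker-host:6379/0")  # assumed broker

# Hypothetical endpoint on the Django server that saves the result via the ORM
DJANGO_CALLBACK_URL = "https://django-server.example.com/api/task-result/"

def do_heavy_work(item_id):
    # Placeholder for the real computation
    return {"item_id": item_id, "score": 42}

@app.task
def process_item(item_id):
    result = do_heavy_work(item_id)
    # The worker never touches the database directly; the Django view does.
    requests.post(DJANGO_CALLBACK_URL, json=result, timeout=30).raise_for_status()

The alternative the question hints at, pointing both machines' DATABASES setting at one shared PostgreSQL instance, also works, but then the worker machine needs the full Django project (settings and models) on disk, not just tasks.py.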

What is the best deployment configuration for Django?

I will be deploying my Django project on a server. For that purpose I plan on doing the following optimizations.
What I would like to know is: am I missing something?
How can I do it in a better manner?
Front-end:
Django-static (For compressing static media)
Running jQuery from a CDN
Cache-Control headers
Indexing the Django db (For certain models)
Server side:
uWSGI and nginx.
Memcached (For certain queries)
Putting the media and database on separate servers
These are some optimizations I use on a regular basis:
frontend:
Use a JS loading library like LABjs, RequireJS, or yepnope. You should still compress/merge your JS files, but in most use cases it seems better to make several requests to several JS files and run them in parallel than to have one huge JS file to run on each page. I always split them up into groups that make sense, to balance requests and parallel loading. Some also allow for conditional loading and failovers (i.e. if, for some reason, your CDN'd jQuery is not there anymore).
Use sprites where possible.
Backend:
configure django-compressor (django-static is fine)
Enable gzip compression in nginx.
If you are using postgresql (which is the recommended sql database), use something like pgbouncer or pgpool2.
Use and configure a cache (I use Redis).
(already mentioned - use celery for everything that might take longer)
Small database work: use indexes where needed, and look out for making too many queries (common when not using select_related where you are supposed to) or slow queries (enable slow-query logging in your DB). Always use select_related with explicit arguments (a minimal example follows this list).
If implementing search, I always use a standalone search engine (Elasticsearch/Solr).
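A minimal sketch of the select_related point above, with hypothetical Author/Book models:

# models.py of a hypothetical app
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

# N+1 queries: one for the books, plus one per book to fetch its author.
titles = [(b.title, b.author.name) for b in Book.objects.all()]

# One query with a JOIN: name the relation explicitly.
titles = [(b.title, b.author.name) for b in Book.objects.select_related("author")]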
Now comes profiling the app and looking for code-specific improvements; the points above are some things to keep an eye on.
An option may be installing Celery if you need to support asynchronous and periodic tasks. If you do so, consider installing Redis instead of Memcached. Using Redis, you can manage sessions and carry out Celery operations as well as caching (a minimal settings sketch follows the link below).
Take a look here: http://unfoldthat.com/2011/09/14/try-redis-instead.html
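A minimal settings sketch of that Redis setup (assuming the django-redis package and a local Redis instance; adjust LOCATION and the broker URL for your environment):

# settings.py - minimal sketch
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {"CLIENT_CLASS": "django_redis.client.DefaultClient"},
    }
}

# Keep sessions in the same Redis-backed cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"

# Celery can reuse the same Redis instance as its broker.
CELERY_BROKER_URL = "redis://127.0.0.1:6379/0"

This way caching, sessions, and the Celery broker all share one Redis instance, as the linked article suggests.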