I am very new to deploying Python (Django 4.1) applications on cPanel (shared hosting). I have successfully hosted one app with Celery, Redis, etc. The issue is that every time I start using the app, the total number of processes on cPanel hits the limit (120/120) and the app stops working. I have been trying to learn about the number of processes and how to optimize the application, with no success so far. Can somebody explain the following?
What is the "number of processes" limit in cPanel?
How does my app relate to this limit?
How can I optimize my application?
How do I keep the application running?
This Python (Django, DRF, Celery, Redis, Requests) app is supposed to serve about 5,000 users.
I tried to read up on "number of processes" and "entry processes" online, with no luck, and went through a few articles on optimizing Django apps.
So I have a basic Django blog application that I want to dockerize. And one more thing: I am writing my question here because there are live people to answer.
You could use Cookiecutter Django for dockerizing your Django application. You can read the docs here: https://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html
You need to run pip freeze first so that you know which packages your current Django application depends on, and use that output as your requirements.txt.
Benefits of using Django on Docker:
Your code runs on any operating system that supports Docker.
You save time by not needing to configure system dependencies on your host.
Your local and production environments can be exactly the same, eliminating errors that only happen in production.
You should start by reading the Docker docs in order to understand the what and why: https://docs.docker.com/
Long story short: containers (plural) let you start every service (Django, the database, a possible front-end, servers, etc.) in its own container and, furthermore, run them on any OS of your choice.
Those containers can communicate with each other through a Docker network that is isolated from the host.
You will need a Dockerfile to set up each service, and possibly Docker Compose (if you run multiple containers) to manage all the running containers.
Here's an example Docker setup for Django: https://semaphoreci.com/community/tutorials/dockerizing-a-python-django-web-application
My Django app generates a complex report that can take up to 5 minutes to create. Therefore it runs once a night using a scheduled management command.
That's been ok, except I now want the user to be able to select the date range for the report, which means the report needs to be created while the user waits.
What are my options for running the task in the background? So far I've found these:
Celery - might work but is complex
django-background-tasks looks like the right tool for the job but hasn't been updated for years; the last supported Django version is 2.2
The report/background task could be generated by AWS Lambda, basically as a microservice. Django calls the microservice, which executes the background task and then calls the Django app back once finished. This is what I did last time, but I'm not sure it would work now, as I'd need to send the microservice 10 MB of data to process.
Use subprocess.Popen, which someone here said worked for them, but other reports say it doesn't work from Django.
EDIT: Looks like Django 3.1 onwards supports async views, which may be the simple solution for this.
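For reference, a minimal sketch of what the Celery option could look like (the task name, arguments, and report logic below are illustrative placeholders, not my actual code):

    # tasks.py - illustrative sketch of running the report as a Celery task.
    import time
    from celery import shared_task

    @shared_task
    def build_report(start_date, end_date):
        # The slow report generation runs in the Celery worker process,
        # so the web request that enqueued it can return immediately.
        time.sleep(1)  # stand-in for the ~5 minute report computation
        return f"report for {start_date}..{end_date}"

    # In a view, enqueue the task instead of calling it inline:
    #   build_report.delay("2023-01-01", "2023-01-31")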
Heroku gives me an H12 error when transferring a file to an API from my Django application (I understand it's a long-running process, and I guess there is some memory/worker trade-off involved). I am on a single hobby dyno right now.
The function runs smoothly for files of around 50 MB. The file itself comes from a different source (fetched with the requests Python package).
The idea is to build a file transfer utility as a Django app on Heroku. The file does not get stored on my app's side; it is just fetched from point A and sent to point B.
I went through multiple discussions along with the standard Heroku documentation; however, I am still struggling with some concepts:
Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
The standard docs recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
Some recommend using an extra worker dyno besides the web dyno we have by default. How does this relate to my problem?
Some say to add multiple workers, but I am not sure how that solves it. Let's say today it starts working for large files using background tasks; what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
If someone here has a strong understanding of the architecture, I am keen to hear your experiences and thoughts. Also, let me know if there is another option besides Heroku that would make this easier for me.
Have you looked at using Celery to run this as a background task?
This is a very standard way of dealing with requests that take a long time to complete.
Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
Yes, it can be solved by background tasks. If you use something like Celery, which has direct support for Django, you will be running another instance of your Django application but with a different startup command for Celery. That worker keeps polling for new tasks, reads the task name and arguments from the Redis queue (or RabbitMQ, whichever you use as the broker), executes the task, and writes the status back to Redis (or whatever broker/result backend you use).
You can also use Flower alongside Celery so that you have a dashboard showing how many tasks are being executed, what their statuses are, and so on.
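A minimal sketch of that separate Celery entry point (the project name "myproject" is a placeholder, not from the question):

    # myproject/celery.py - minimal Celery bootstrap for a Django project.
    import os
    from celery import Celery

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

    app = Celery("myproject")
    # Read CELERY_* settings (broker URL, result backend, ...) from Django settings.
    app.config_from_object("django.conf:settings", namespace="CELERY")
    # Auto-discover tasks.py modules in installed apps.
    app.autodiscover_tasks()

    # The worker is then started with a different command than the web process, e.g.:
    #   celery -A myproject worker --loglevel=info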
The standard docs recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
To use background tasks with Celery you will need to set up some sort of message broker, like Redis or RabbitMQ.
Some recommend using an extra worker dyno besides the web dyno we have by default. How does this relate to my problem?
I don't think that would help for your use case.
Some say to add multiple workers, but I am not sure how that solves it. Let's say today it starts working for large files using background tasks; what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
When you use Celery, you will have to start a few workers for that Celery instance; these workers are the ones that execute your background tasks. The Celery documentation will help you calculate the exact worker count based on your instance's CPU, memory, and so on.
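For example (the value 4 below is purely illustrative; tune it to your dyno's CPU and memory as the Celery docs describe):

    # Illustrative only: cap the number of processes a Celery worker spawns.
    from celery import Celery

    app = Celery("myproject")
    app.conf.worker_concurrency = 4

    # Equivalent on the command line:
    #   celery -A myproject worker --concurrency=4 --loglevel=info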
If someone here has a strong understanding of the architecture, I am keen to hear your experiences and thoughts. Also, let me know if there is another option besides Heroku that would make this easier for me.
I have worked on a few projects where we used Celery background tasks to upload large files. It has worked well for our use cases.
Here is my final take on this after full evaluation, trials, and the earlier recommendations made here; thanks #arun.
Heroku needs a web dyno to serve the website; a hobby dyno holds 512 MB of memory, and operations you perform within that limit should be fine.
Beyond that, let's say you have a scenario like the one above, where a large file comes from one source API and goes to another target API through the Django app. Then you will have to do the following:
First, you will have to run the file transfer function as a background process, since it will take longer than the 30 seconds within which Heroku expects a response; otherwise an H12 error is waiting for you. The solution is to implement background tasks; Celery worked in my case. Here Celery is your same Django app functionality running as a background handler, which needs its own dyno (the worker). This can be scaled as needed in the future.
To make your Django WSGI process (the front-end app) talk to Celery (the background app), you need a message broker in between, which can be Heroku Redis, RabbitMQ, etc.
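As a sketch, assuming the Heroku Redis add-on (which exposes its connection string through the REDIS_URL environment variable), the broker can be wired up in settings.py roughly like this:

    # settings.py (excerpt) - point Celery at the Redis broker.
    # REDIS_URL is set by the Heroku Redis add-on; the localhost fallback
    # is only for local development.
    import os

    CELERY_BROKER_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
    CELERY_RESULT_BACKEND = CELERY_BROKER_URL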
Second, the problem doesn't get fully solved there: even though you have a new worker dedicated to the Celery app, the memory limits still apply, since it is also a dyno with its own memory.
To overcome this, your requests call should download the file as a stream instead of downloading the complete file into a single memory buffer. Iterate over the stream in chunks and send the chunks to the target endpoint (see the sketch after the list below).
Even the chunk size plays an important role here. I will not put an exact number here since it depends on various factors:
It should not be too small, or the transfer will take longer.
It should not be too big for either the source or target endpoint servers to handle.
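A minimal sketch of that streaming transfer (SOURCE_URL, TARGET_URL, and the chunk size are placeholders, and I'm assuming the target API accepts a streamed/chunked request body):

    # Streaming transfer sketch - URLs and chunk size are illustrative.
    import requests

    SOURCE_URL = "https://example.com/source/file"
    TARGET_URL = "https://example.com/target/upload"
    CHUNK_SIZE = 1024 * 1024  # 1 MB; tune it per the notes above

    def stream_transfer():
        # stream=True keeps requests from buffering the whole body in memory.
        with requests.get(SOURCE_URL, stream=True) as source:
            source.raise_for_status()
            chunks = source.iter_content(chunk_size=CHUNK_SIZE)
            # Passing a generator as `data` makes requests send a chunked upload,
            # so only one chunk is held in memory at a time.
            response = requests.post(TARGET_URL, data=chunks)
            response.raise_for_status()
        return response.status_code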
I am creating a Django application for a school project. I want to schedule jobs to run every workday at 9:00 and 17:00.
I am trying to do it with Celery right now, but I am very stuck on it, and as the deadline is in sight, I want to fall back to an alternative: a plain cron job. I think a cron job would work fine, but the user should be able to edit the times of the cron jobs through the Django web application (so without logging in over SSH and editing the crontab manually).
Is this possible? I can't find anything about it on the internet.
You need the django-celery-beat plugin, which adds new models to the Django admin under "Periodic tasks", where you can manage cron schedules for your tasks.
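The same schedules can also be created in code; a rough sketch using the models django-celery-beat provides (the task path myapp.tasks.send_report is hypothetical):

    # Sketch: a 9:00/17:00 weekday schedule via django-celery-beat models.
    from django_celery_beat.models import CrontabSchedule, PeriodicTask

    schedule, _ = CrontabSchedule.objects.get_or_create(
        minute="0",
        hour="9,17",
        day_of_week="1-5",   # Monday-Friday
        day_of_month="*",
        month_of_year="*",
    )

    PeriodicTask.objects.get_or_create(
        name="Workday job",
        task="myapp.tasks.send_report",  # hypothetical task path
        crontab=schedule,
    )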
As an alternative, if you really do not want to run a background worker, you can create Django management commands and use a library like python-crontab to add/modify/remove cron jobs on the system.
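A minimal sketch of the python-crontab route (the management command name and project path are placeholders; a Django view or form would pass in whatever times the user picked):

    # Sketch: rewrite the user's crontab entry from Django via python-crontab.
    from crontab import CronTab

    def set_job_schedule(cron_expression="0 9,17 * * 1-5"):
        cron = CronTab(user=True)  # current user's crontab
        # Drop any entry we created earlier, then add the new one.
        cron.remove_all(comment="django-scheduled-job")
        job = cron.new(
            command="cd /path/to/project && python manage.py run_scheduled_job",
            comment="django-scheduled-job",
        )
        job.setall(cron_expression)  # e.g. weekdays at 9:00 and 17:00
        cron.write()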
I am building a small financial web app with Django. The app requires that the database has a complete history of prices, regardless of whether someone is currently using the app. These prices are freely available online.
The way I am currently handling this is by running, alongside Django, a separate Python script that downloads the price data and records it in the Django database using the sqlite3 module.
My plan for deployment is to run the app on an AWS EC2 instance, change the permissions of the folder where the db file resides, and separately run the download script.
Is this a good way to deploy this sort of app? What are the downsides?
Is there a better way to handle the asynchronous downloads and the deployment? (PythonAnywhere?)
You can write the daemon code and follow this approach to push data to the DB as soon as you get it from the internet. Since your daemon would run independently of Django, you'd also need to take care of data synchronization issues. One possible solution is to use a DateTimeField in your Django model with auto_now_add=True, which will tell you when each row was entered in the DB. Hope this helps you or someone else looking for a similar answer.
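A small sketch of such a model (the field names are illustrative, not from the question):

    # models.py sketch - field names are illustrative.
    from django.db import models

    class Price(models.Model):
        symbol = models.CharField(max_length=16)
        value = models.DecimalField(max_digits=12, decimal_places=4)
        # auto_now_add records when the row was inserted, which lets the
        # independent download daemon and the app reason about freshness.
        created_at = models.DateTimeField(auto_now_add=True)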