Maintaining Gunicorn and Flask on Unix

I have a Flask app that I start with Gunicorn with 5 workers. So far I have two options to stop it: either grep for gunicorn and kill all 5 PIDs at once with the kill command, or use pkill.
Neither is what I'm looking for, especially pkill, since other applications are running under the same user ID.
Does anyone have a script I can use, or an idea how I could implement this?

Gunicorn can write its PID to a file.
gunicorn ... -p /path/to/your/file/gunicorn.pid ...
Then you can run something like this to kill the application:
kill `cat /path/to/your/file/gunicorn.pid`
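For example, a minimal stop script built on that PID file (a sketch; the path is whatever you passed to -p):
#!/bin/sh
# Signal the Gunicorn master; it will take its workers down with it
PIDFILE=/path/to/your/file/gunicorn.pid
if [ -f "$PIDFILE" ]; then
    kill "$(cat "$PIDFILE")"
fi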


Thousands of extraneous gunicorn workers

I'm using gunicorn 19.7.1 appserver with nginx reverse proxy for a Django project (Ubuntu 14.04 machine).
ps aux | grep gunicorn | grep -v grep | wc -l yields 3043 at the moment.
Whereas in /etc/init/gunicorn.conf, I've always had -w 33. Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
How do I kill the extraneous workers?
How did this happen?
The worker count of 33 has always been properly configured on my busy production system.
However a few hours ago, I was trying python's multiprocessing on the server and things went south. Gunicorn workers ate up all the memory and took out the resident redis instances as well.
I reverted the change and have managed to get everything back online, except the memory hasn't been released and I've had to cope with these legacy gunicorn workers. What's going on?
Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
service only manages service-initiated processes, so if you started Gunicorn workers outside of the service framework, those workers will keep running even after a sudo service gunicorn stop.
How do I kill the extraneous workers?
The fast way:
Kill every gunicorn process by name, then restart Gunicorn through the service:
$ pkill gunicorn
$ sudo service gunicorn start
The better way:
Identify your "desired" Gunicorn workers by finding the parent:
$ sudo service gunicorn status
Note the parent process ID. Let's say it's 123.
Save a list of all the "desired" workers' PIDs:
$ echo 123 > desired_workers
$ pgrep -P 123 >> desired_workers
Save a list of all workers' PIDs:
$ pgrep gunicorn > all_workers
Terminate the "undesired" workers:
$ cat desired_workers all_workers | sort | uniq -u | xargs kill
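Put together, the same cleanup can be scripted (a sketch, assuming 123 is the parent PID reported by the service):
#!/bin/sh
MASTER=123
# PIDs of the managed master and its children
{ echo "$MASTER"; pgrep -P "$MASTER"; } > desired_workers
# every gunicorn PID currently running
pgrep gunicorn > all_workers
# PIDs that appear only once are unmanaged; terminate them
sort desired_workers all_workers | uniq -u | xargs -r kill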

How do I restart airflow webserver?

I am using Airflow for my data pipeline project. I have configured my project in Airflow and start the Airflow webserver as a background process using the following command:
airflow webserver -p 8080 -D True
The server runs successfully in the background. Now I want to enable authentication in Airflow and have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the Airflow server on my local machine, it works.
So how can I restart my daemonized airflow webserver process on my server?
I advise running Airflow in a robust way, with auto-recovery via systemd,
so you can do:
- to start: systemctl start airflow
- to stop: systemctl stop airflow
- to restart: systemctl restart airflow
For this you'll need a systemd 'unit' file.
As a (working) example you can use the following:
put it in /lib/systemd/system/airflow.service
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
PIDFile=/run/airflow/webserver.pid
EnvironmentFile=/home/airflow/airflow.env
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'export AIRFLOW_HOME=/home/airflow ; airflow webserver --pid /run/airflow/webserver.pid'
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=42s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
P.S.: change AIRFLOW_HOME to wherever your airflow folder with the config lives.
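After creating the unit file, reload systemd and enable the unit so it starts at boot (standard systemd usage):
sudo systemctl daemon-reload
sudo systemctl enable airflow
sudo systemctl start airflow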
Can you check $AIRFLOW_HOME/airflow-webserver.pid for the process id of your webserver daemon?
Then send it a kill signal to terminate it:
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
Then clear the pid file
cat /dev/null > $AIRFLOW_HOME/airflow-webserver.pid
Then just run
airflow webserver -p 8080 -D True
to restart the daemon.
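Wrapped up, the three steps above can go into a small restart sketch (assuming $AIRFLOW_HOME is set and the webserver runs on port 8080):
#!/bin/sh
# Kill the daemonized webserver, reset its PID file, then start it again
kill -9 "$(cat "$AIRFLOW_HOME/airflow-webserver.pid")"
: > "$AIRFLOW_HOME/airflow-webserver.pid"
airflow webserver -p 8080 -D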
This worked for me (multiple times! :D )
Find the process ID (assuming 8080 is the port):
lsof -i tcp:8080
Kill it:
kill <pid>
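If you prefer a one-liner, lsof's -t flag prints just the PIDs, so the two steps can be combined (again assuming port 8080):
kill $(lsof -t -i tcp:8080)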
Use Airflow webserver's (gunicorn) signal handling
Airflow uses gunicorn as its HTTP server, so you can send it standard POSIX-style signals. A signal commonly used by daemons to restart is HUP.
You'll need to locate the PID file of the airflow webserver daemon in order to get the right process ID to send the signal to. This file could be in $AIRFLOW_HOME or in /var/run, which is where you'll find a lot of PID files.
Assuming the pid file is in /var/run, you could run the command:
cat /var/run/airflow-webserver.pid | xargs kill -HUP
gunicorn uses a preforking model, so it has master and worker processes. The HUP signal is sent to the master process, which performs these actions:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the preload_app option), Gunicorn will also load the new version of it.
More information in the gunicorn signal handling docs.
This is mostly an expanded version of captaincapsaicin's answer, but using HUP (SIGHUP) instead of KILL (SIGKILL) to reload the process instead of actually killing it and restarting it.
In my case, I wanted to kill the previous Airflow processes and start fresh.
The following command did the magic:
killall -9 airflow
As the question was related to webserver, this is something that worked in my case:
systemctl restart airflow-webserver
Just run:
airflow webserver -p 8080 -D
Find the PID with:
airflow webserver
which will give: "The webserver is already running under PID 21250."
Then kill the webserver process with:
kill 21250
None of these worked for me. I had to delete the $AIRFLOW_HOME/airflow-webserver.pid file and then running airflow webserver worked.
Create an init script and use the "daemon" command to run this as a service:
daemon --user="${USER}" --pidfile="${PID_FILE}" airflow webserver -p 8090 >> "${LOG_FILE}" 2>&1 &
The recommended approach is to create and enable the airflow webserver as a service. If you named the webserver as 'airflow-webserver', run the following command to restart the service:
systemctl restart airflow-webserver
You can use a ready-made AMI (namely, LightningFlow) from the AWS Marketplace, which provides Airflow services (webserver, scheduler, worker) enabled at startup.
Note: LightningFlow comes pre-integrated with all required libraries, Livy, custom operators, and local Spark cluster.
Link for AWS Marketplace: https://aws.amazon.com/marketplace/pp/Lightning-Analytics-Inc-LightningFlow-Integrated-o/B084BSD66V
Just by killing processes!!
Assuming the default airflow home directory is ~/airflow/
List the 3 parent processes running the airflow (PID):
cat ~/airflow/airflow-scheduler.pid
cat ~/airflow/airflow-webserver.pid
cat ~/airflow/airflow-webserver-monitor.pid
Get their PGID using:
ps -xjf
And finally, run a loop to kill the whole tree of each parent (PID). Note that shell variables are not expanded inside a single-quoted awk program, so pass them in with -v:
for child in $(ps x -o "%P %p %r" | awk -v pid="$your_first_PID" -v pgid="$your_first_PGID" '{ if ($1 == pid || $3 == pgid) { print $2 } }'); do kill $child; done
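Alternatively, since kill accepts a negative argument to target a whole process group, you can signal each group directly (a sketch, assuming 12345 is one of the PGIDs you noted from ps -xjf):
kill -- -12345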
To restart Airflow you need to restart Airflow webserver and Airflow scheduler.
Check if Airflow servers are running:
ps -aux | grep airflow
If you see entries like this in the list of running processes:
ubuntu 49601 0.1 1.6 266668 135520 ? S 12:19 0:00 [ready] gunicorn: worker [airflow-webserver]
This means that Airflow webserver is running.
If you see entries like this:
ubuntu 49653 0.6 2.3 308912 187596 ? S 12:19 0:00 airflow scheduler -- DagFileProcessorManager
That means that Airflow scheduler is running.
Stop Airflow servers (webserver and scheduler):
pkill -f "airflow scheduler"
pkill -f "airflow webserver"
Now run ps -aux | grep airflow again to check that they are really shut down.
Start Airflow servers in background (daemon):
airflow webserver -D
airflow scheduler -D
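The whole restart can also be wrapped into one small script (a sketch based on the steps above):
#!/bin/sh
# Stop webserver and scheduler, give them a moment to exit, then start both as daemons
pkill -f "airflow webserver"
pkill -f "airflow scheduler"
sleep 5
airflow webserver -D
airflow scheduler -D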

django/gunicorn app restart

I have 2 different projects running on the same server. They are both Django projects with Gunicorn as the WSGI server; the server on top is Apache. Currently there is a Jenkins job that updates the source code from the repo and restarts (kills and starts) Gunicorn. This worked fine while the server was only serving one site.
I killed Gunicorn as follows:
#!/bin/bash
ps -ef | grep gunicorn | grep -v grep | awk '{print $2}' | xargs kill -9
and then restarted it. However, this approach will not work with 2 sites, since killing Gunicorn this way kills all Gunicorn processes; whenever I run the build, only the Gunicorn for that site gets respawned.
I looked around and found that Supervisor is one utility I could use to prevent this and seamlessly restart Gunicorn.
Do you have other suggestions or best practices that I should follow?
Thanks
To only grab your project's gunicorn and restart it, you can use the following:
ps aux |grep gunicorn |grep yourappname | awk '{ print $2 }' |xargs kill -HUP
Other gunicorn processes will not be affected.
Gunicorn + Supervisor is a pretty standard stack. You could have your sites separated as different Supervisor programs and, instead of having Jenkins kill and restart Gunicorn directly, have it use Supervisor's own mechanism for restarting just one of your programs, and you're done.
Supervisor is also great if your site crashes and Gunicorn needs to be executed again.
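As a minimal sketch of that setup (the site names, paths and user here are hypothetical), each site gets its own Supervisor program section, e.g. /etc/supervisor/conf.d/site1.conf:
[program:site1]
command=/home/deploy/site1/env/bin/gunicorn site1.wsgi:application --bind 127.0.0.1:8001
directory=/home/deploy/site1
user=deploy
autorestart=true
Jenkins can then restart only the site it just deployed with supervisorctl restart site1, leaving the other site's Gunicorn untouched.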

Gunicorn sync workers spawning processes

We're using Django + Gunicorn + Nginx on our server. The problem is that after a while we see lots of Gunicorn worker processes that have become orphans, and a lot of other ones that have become zombies. We can also see that some Gunicorn worker processes spawn other Gunicorn workers. Our best guess is that these workers become orphans after their parent workers have died.
Why do Gunicorn workers spawn child workers? Why do they die? And how can we prevent this?
I should also mention that we've set Gunicorn's log level to debug and we still don't see anything significant, other than the periodic log of the worker count, which reports the number of workers we asked for.
UPDATE
This is the line we used to run gunicorn:
gunicorn --env DJANGO_SETTINGS_MODULE=proj.settings proj.wsgi --name proj --workers 10 --user proj --group proj --bind 127.0.0.1:7003 --log-level=debug --pid gunicorn.pid --timeout 600 --access-logfile /home/proj/access.log --error-logfile /home/proj/error.log
In my case I deploy on Ubuntu servers (LTS releases, now mostly 14.04 LTS) and I have never had problems with Gunicorn daemons. I create a gunicorn.conf.py and launch Gunicorn with this config from upstart, with a script like this in /etc/init/djangoapp.conf:
description "djangoapp website"
start on startup
stop on shutdown
respawn
respawn limit 10 5
script
cd /home/web/djangoapp
exec /home/web/djangoapp/bin/gunicorn -c gunicorn.conf.py -u web -g web djangoapp.wsgi
end script
I configure Gunicorn with a .py config file, set up some options (details below), and deploy my app (with virtualenv) in /home/web/djangoapp, and I have no problems with zombie or orphaned Gunicorn processes.
I checked your options: the timeout can be a problem, but another one is that you don't set max-requests in your config. By default it is 0, so there is no automatic worker restart in your daemon, which can lead to memory leaks (http://gunicorn-docs.readthedocs.org/en/latest/settings.html#max-requests).
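A minimal gunicorn.conf.py sketch along those lines (the numbers are illustrative, not recommendations):
# gunicorn.conf.py
bind = "127.0.0.1:7003"
workers = 10
timeout = 600
# recycle each worker after roughly this many requests to limit memory growth
max_requests = 1000
max_requests_jitter = 50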
We will use a .sh file to start the Gunicorn process, and then a supervisord configuration file. (What is supervisord? There are external how-tos covering how to install supervisord with Django, Nginx and Gunicorn.)
gunicorn_start.sh (remember to chmod +x the file):
#!/bin/sh
NAME="myDjango"
DJANGODIR="/var/www/html/myDjango"
NUM_WORKERS=3
echo "Starting myDjango -- Django Application"
cd $DJANGODIR
exec gunicorn -w $NUM_WORKERS $NAME.wsgi:application --bind 127.0.0.1:8001
mydjango_django.conf: remember to install supervisord on your OS, and
copy this into the configuration folder (typically /etc/supervisor/conf.d/):
[program:myDjango]
command=/var/www/html/myDjango/gunicorn_start.sh
user=root
autorestart=true
redirect_stderr=true
Later on, use these commands:
Reload the daemon’s configuration files, without add/remove (no restarts):
supervisorctl reread
Restart all processes (note: restart does not reread config files; for that, see reread and update):
supervisorctl restart all
Get all process status info:
supervisorctl status
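In practice, picking up a brand-new [program:...] section needs an update after the reread, and a single program can be restarted by name (a usage sketch for the myDjango program above):
supervisorctl reread
supervisorctl update
supervisorctl restart myDjango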
This sounds like a timeout issue.
You have multiple timeouts going on and they all need to be in a descending order. It seems they may not be.
For example:
Nginx has a default timeout of 60 seconds
Gunicorn has a default timeout of 30 seconds
Django has a default timeout of 300 seconds
Postgres' default timeout is complicated, but let's say 60 seconds for this example.
In this example, once 30 seconds have passed and Django is still waiting for Postgres to respond, Gunicorn tells Django to stop, which in turn should tell Postgres to stop. Gunicorn will wait a certain amount of time for this to happen before it kills Django, leaving the Postgres query running as an orphan. The user will re-initiate their request, and this time the query will take longer because the old one is still running.
I see that you have set your Gunicorn timeout to 600 seconds.
This would probably mean that Nginx gives up on Gunicorn after 60 seconds, while Gunicorn keeps waiting for Django, which waits for Postgres or some other underlying process; when Nginx gets tired of waiting it abandons the request, leaving Django hanging.
This is still just a theory, but it is a very common problem and hopefully leads you and any others experiencing similar problems, to the right place.
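As a sketch of what a descending order could look like in practice (illustrative values only), the proxy's timeout should be the largest, with Gunicorn's below it:
nginx site config: proxy_read_timeout 120s;
gunicorn command line: gunicorn ... --timeout 90 ...
That way Gunicorn gives up on a stuck worker before nginx gives up on Gunicorn.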

How do I restart gunicorn with HUP? I don't know the master PID or the location of the PID file

I want to restart a Django server which is running using gunicorn.
I know how to use Gunicorn on my own system, but now I need to restart a remote server that was not set up by me.
I don't know the master PID needed to restart the server.
Usually I HUP Gunicorn with sudo kill -s HUP masterpid.
I tried ps aux | grep gunicorn,
and I did not find a gunicorn.pid file anywhere.
How can I get the master PID?
The one-liner below gets the job done:
kill -HUP `ps -C gunicorn fch -o pid | head -n 1`
Explanation
ps -C gunicorn lists only the processes running the gunicorn command, i.e. the workers and the master process. The workers are children of the master, as can be seen with ps -C gunicorn fc -o ppid,pid,cmd. We only need the PID of the master, so the h flag is used to suppress the header line (the PID column title). Note that the f flag ensures the master is printed above its workers, which is why head -n 1 picks it.
The correct procedure is to send the HUP signal only to the master. That way Gunicorn is gracefully restarted: only the workers, not the master, are recreated.
You can run gunicorn with the '-p' option, so you can then get the PID of the master process from the PID file.
For example:
gunicorn -p app.pid your_app.wsgi.app
You can get the pid of the master by:
cat app.pid
This should also work to restart gunicorn:
ps aux |grep gunicorn |grep yourapp | awk '{ print $2 }' |xargs kill -HUP
Step 1:
Open /etc/systemd/system/gunicorn.service
and add the lines below (PIDFile= in the [Service] section, and --pid to the ExecStart command):
PIDFile=/run/gunicorn/gunicorn.pid
--pid /run/gunicorn/gunicorn.pid
Example:
[Service]
PIDFile=/run/gunicorn/gunicorn.pid
WorkingDirectory=/home/django/django_project
ExecStart=/usr/bin/gunicorn --pid /run/gunicorn/gunicorn.pid --name=django_project.....
User=django
Group=django
Step 2:
Go to /etc/tmpfiles.d/ and create a new file gunicorn.conf if it does not exist,
and add the line below:
d /run/gunicorn 0755 django django -
where django is the user and group name.
Step 3:
Reboot your server, or restart Gunicorn (e.g. /etc/init.d/gunicorn restart), for the changes to take effect.
Your PID file location is now /run/gunicorn/gunicorn.pid; check it.
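On a systemd host, the corresponding restart would typically be to reload the unit definitions and restart the service, after which you can HUP the master via the new PID file (a usage sketch):
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
kill -HUP "$(cat /run/gunicorn/gunicorn.pid)"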
Building on krizex's answer, when your master PID is stored in a file, you can gracefully reload your app in one command like this:
$ cat app.pid |xargs kill -HUP
I would have liked to comment on the answer itself but I don't have enough reputation to comment yet 😢.