Thousands of extraneous gunicorn workers - django

I'm using gunicorn 19.7.1 appserver with nginx reverse proxy for a Django project (Ubuntu 14.04 machine).
ps aux | grep gunicorn | grep -v grep | wc -l yields 3043 at the moment.
Whereas in /etc/init/gunicorn.conf, I've always had -w 33. Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
How do I kill the extraneous workers?
How did this happen?
The worker count of 33 has always been properly configured on my busy production system.
However a few hours ago, I was trying python's multiprocessing on the server and things went south. Gunicorn workers ate up all the memory and took out the resident redis instances as well.
I reverted the change and have managed to get everything back online, except the memory hasn't been released and I've had to cope with these legacy gunicorn workers. What's going on?

Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
service only manages service-initiated processes, so if you started Gunicorn workers outside of the service framework, these workers will continue to live even if you stop the service.
How do I kill the extraneous workers?
The fast way:
Run this command to terminate every process named gunicorn, and then restart Gunicorn:
$ pkill gunicorn
$ sudo service gunicorn start
The better way:
Identify your "desired" Gunicorn workers by finding the parent:
$ sudo service gunicorn status
Note the parent process ID. Let's say it's 123.
Save a list of all the "desired" workers' PIDs:
$ echo 123 > desired_workers
$ pgrep -P 123 >> desired_workers
Save a list of all workers' PIDs:
$ pgrep gunicorn > all_workers
Terminate the "undesired" workers:
$ cat desired_workers all_workers | sort | uniq -u | xargs kill
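The sort | uniq -u step above is a set difference, and it can be sketched in Python without the temporary files. This is only an illustration: the function name undesired_pids and the sample PIDs are made up, and in practice the inputs would come from pgrep gunicorn and pgrep -P <parent>.

```python
def undesired_pids(all_pids, parent_pid, child_pids):
    """Return the PIDs in all_pids that are neither the parent nor its children."""
    desired = {parent_pid, *child_pids}
    return sorted(pid for pid in set(all_pids) if pid not in desired)


if __name__ == "__main__":
    # Hypothetical values: 123 is the service's master, 124 and 125 its workers.
    print(undesired_pids([123, 124, 125, 900, 901], 123, [124, 125]))
```

Only the PIDs returned by this function would be passed to kill; the master and its own workers are left alone.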

Related

Maintaining Gunicorn and flask on Unix

I have a Flask app and start it with gunicorn with 5 threads. I have two options so far to stop it: either grep for gunicorn and kill all 5 pids at once with the kill command, or use the pkill command.
But neither is what I am looking for, especially pkill, since there are other applications running under the same user id.
Does anyone have a script I can use, or an idea of how I can implement this?
Gunicorn can write its pid to a file.
gunicorn ... -p /path/to/your/file/gunicorn.pid ...
Then you can run something like this to kill the application:
kill `cat /path/to/your/file/gunicorn.pid`

How do I restart airflow webserver?

I am using Airflow for my data pipeline project. I have configured my project in Airflow and started the Airflow server as a background process using the following command:
airflow webserver -p 8080 -D True
The server is running successfully in the background. Now I want to enable authentication in Airflow, and I have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the Airflow server on my local machine, it works.
So how can I restart my daemon Airflow webserver process on my server?
I advise running Airflow in a robust way, with auto-recovery, using systemd,
so you can do:
- to start systemctl start airflow
- to stop systemctl stop airflow
- to restart systemctl restart airflow
For this you'll need a systemd 'unit' file.
As a (working) example, you can use the following.
Put it in /lib/systemd/system/airflow.service:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
PIDFile=/run/airflow/webserver.pid
EnvironmentFile=/home/airflow/airflow.env
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'export AIRFLOW_HOME=/home/airflow ; airflow webserver --pid /run/airflow/webserver.pid'
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=42s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
P.S.: change AIRFLOW_HOME to the folder that holds your Airflow config.
Can you check $AIRFLOW_HOME/airflow-webserver.pid for the process id of your webserver daemon?
Then pass it a kill signal to kill it
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
Then clear the pid file
cat /dev/null > $AIRFLOW_HOME/airflow-webserver.pid
Then just run
airflow webserver -p 8080 -D True
to restart the daemon.
This worked for me (multiple times! :D )
find the process id: (assuming 8080 is the port)
lsof -i tcp:8080
kill it
kill <pid>
Use Airflow webserver's (gunicorn) signal handling
Airflow uses gunicorn as its HTTP server, so you can send it standard POSIX-style signals. A signal commonly used by daemons to restart is HUP.
You'll need to locate the pid file for the airflow webserver daemon in order to get the right process id to send the signal to. This file could be in $AIRFLOW_HOME or also /var/run, which is where you'll find a lot of pids.
Assuming the pid file is in /var/run, you could run the command:
cat /var/run/airflow-webserver.pid | xargs kill -HUP
gunicorn uses a preforking model, so it has master and worker processes. The HUP signal is sent to the master process, which performs these actions:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the preload_app option), Gunicorn will also load the new version of it.
More information in the gunicorn signal handling docs.
This is mostly an expanded version of captaincapsaicin's answer, but using HUP (SIGHUP) instead of KILL (SIGKILL) to reload the process instead of actually killing it and restarting it.
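For a deployment script, the same cat ... | xargs kill -HUP step can be sketched in Python. This is a minimal sketch, not part of Airflow or gunicorn: the function name signal_master and its send parameter are made up for illustration, and the pid file location is whatever your installation uses.

```python
import os
import signal


def signal_master(pid_file, sig=signal.SIGHUP, send=os.kill):
    """Read the master PID from pid_file, send it sig, and return the PID.

    send defaults to os.kill; it is a parameter only so the function can be
    exercised without signalling a real process.
    """
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    send(pid, sig)
    return pid
```

Calling signal_master("/var/run/airflow-webserver.pid") would ask the gunicorn master to reload, assuming the pid file really is at that path. Passing signal.SIGTERM as sig requests a shutdown instead.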
In my case I wanted to kill the previous airflow process and start a new one.
For that, the following command did the magic:
killall -9 airflow
As the question was related to webserver, this is something that worked in my case:
systemctl restart airflow-webserver
Just run:
airflow webserver -p 8080 -D
Find pid with:
airflow webserver
will give: "The webserver is already running under PID 21250."
Then kill the web server process with:
kill 21250
None of these worked for me. I had to delete the $AIRFLOW_HOME/airflow-webserver.pid file, and then running airflow webserver worked.
Create an init script and use the "daemon" command to run this as a service.
daemon --user="${USER}" --pidfile="${PID_FILE}" airflow webserver -p 8090 >> "${LOG_FILE}" 2>&1 &
The recommended approach is to create and enable the airflow webserver as a service. If you named the webserver as 'airflow-webserver', run the following command to restart the service:
systemctl restart airflow-webserver
You can use a ready-made AMI (namely, LightningFlow) from AWS Marketplace which provides Airflow services (webserver, scheduler, worker) which are enabled at startup.
Note: LightningFlow comes pre-integrated with all required libraries, Livy, custom operators, and local Spark cluster.
Link for AWS Marketplace: https://aws.amazon.com/marketplace/pp/Lightning-Analytics-Inc-LightningFlow-Integrated-o/B084BSD66V
Just by killing processes!!
Assuming the default airflow home directory is ~/airflow/
List the 3 parent processes running the airflow (PID):
cat ~/airflow/airflow-scheduler.pid
cat ~/airflow/airflow-webserver.pid
cat ~/airflow/airflow-webserver-monitor.pid
Get their PGID using:
ps -xjf
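And finally run a loop to kill the whole tree of each parent (PID). The shell variables are passed to awk with -v, because awk does not expand shell variables inside single quotes:
for child in $(ps x -o "%P %p %r" | awk -v pid="$your_first_PID" -v pgid="$your_first_PGID" '$1 == pid || $3 == pgid { print $2 }'); do kill "$child"; done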
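The selection rule in that loop (keep a row if its parent is the given PID or its process group is the given PGID) can be sketched as a small Python function. The name select_targets and the sample numbers are assumptions for illustration; the input is the text that ps x -o "%P %p %r" prints.

```python
def select_targets(ps_output, parent_pid, pgid):
    """Return PIDs whose parent is parent_pid or whose process group is pgid.

    ps_output is the text printed by `ps x -o "%P %p %r"` (PPID, PID, PGID).
    """
    targets = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue  # skip the header row and anything malformed
        ppid, pid, row_pgid = map(int, fields)
        if ppid == parent_pid or row_pgid == pgid:
            targets.append(pid)
    return targets


if __name__ == "__main__":
    # Hypothetical ps output: 100 is a parent with two children, 200 is unrelated.
    sample = " PPID   PID  PGID\n    1   100   100\n  100   101   100\n  100   102   100\n    1   200   200\n"
    print(select_targets(sample, 100, 100))
```

Each returned PID could then be passed to os.kill with signal.SIGTERM.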
To restart Airflow you need to restart Airflow webserver and Airflow scheduler.
Check if Airflow servers are running:
ps -aux | grep airflow
if you see in list of running processes entries like:
ubuntu 49601 0.1 1.6 266668 135520 ? S 12:19 0:00 [ready] gunicorn: worker [airflow-webserver]
This means that Airflow webserver is running.
If you see entries like this:
ubuntu 49653 0.6 2.3 308912 187596 ? S 12:19 0:00 airflow scheduler -- DagFileProcessorManager
That means that Airflow scheduler is running.
Stop Airflow servers (webserver and scheduler):
pkill -f "airflow scheduler"
pkill -f "airflow webserver"
Now use again ps -aux | grep airflow to check if they are really shut down.
Start Airflow servers in background (daemon):
airflow webserver -D
airflow scheduler -D

django/gunicorn app restart

I have 2 different projects running on the same server. They are both Django projects with Gunicorn as the WSGI server. The server on top is Apache. Currently there is a Jenkins job that updates the source code from the repo and restarts (kills and starts) gunicorn. This worked fine as long as the server was only serving 1 site.
I killed gunicorn as follows:
#!/bin/bash
ps -ef | grep gunicorn | grep -v grep | awk '{print $2}' | xargs kill -9
and then restarted it. However, this approach will not work with 2 sites, since it kills all Gunicorn processes. Whenever I run the build, only the gunicorn for that site gets respawned.
I looked around and found that Supervisor is one utility I could use to prevent this and seamlessly restart Gunicorn.
Do you have other suggestions or best practices that I should follow?
Thanks
To only grab your project's gunicorn and restart it, you can use the following:
ps aux |grep gunicorn |grep yourappname | awk '{ print $2 }' |xargs kill -HUP
Other gunicorn processes will not be affected.
Gunicorn + Supervisor is pretty standard stack, you could have your sites separated as different Supervisor tasks and instead of telling Jenkins to restart Supervisor, use the Supervisor method for restarting just one of your tasks, and you're done.
Supervisor is also great if your site crashes and Gunicorn needs to be executed again.

How do I restart gunicorn hup , i dont know masterpid or location of PID file

I want to restart a Django server which is running using gunicorn.
I know how to use gunicorn in my system. But now I need to restart a remote server which is not set up by me.
I don't know the master pid needed to restart the server. How can I get it?
Usually I HUP gunicorn with sudo kill -s HUP masterpid.
I tried with ps aux|grep gunicorn
and I did not find the gunicorn.pid file anywhere.
How can I get the masterpid?
The one-liner below gets the job done perfectly:
kill -HUP `ps -C gunicorn fch -o pid | head -n 1`
Explanation
ps -C gunicorn only lists the processes with the gunicorn command, i.e., the workers and the master process. Workers are children of the master, as can be seen using ps -C gunicorn fc -o ppid,pid,cmd. We only need the pid of the master, so the h flag is used to remove the header line, which is the text PID. Note that the f flag ensures that the master is printed above the workers.
The correct procedure is to send HUP signal only to the master. In this way gunicorn is gracefully restarted, only the workers, not master, are recreated.
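The one-liner relies on ps printing the master first. An alternative sketch is to pick the master by its parent: it is the gunicorn process whose parent is not itself a gunicorn process. The function name find_masters is made up for illustration, and the (pid, ppid) pairs are assumed to come from something like ps -C gunicorn -o pid=,ppid=.

```python
def find_masters(processes):
    """Return the PIDs of gunicorn masters.

    processes is an iterable of (pid, ppid) pairs covering gunicorn
    processes only. A master is a process whose parent is not in that set.
    """
    processes = list(processes)
    gunicorn_pids = {pid for pid, _ in processes}
    return sorted(pid for pid, ppid in processes if ppid not in gunicorn_pids)


if __name__ == "__main__":
    # Hypothetical listing: 500 is the master, 501 and 502 are its workers.
    print(find_masters([(500, 1), (501, 500), (502, 500)]))
```

If more than one PID comes back, several gunicorn instances are running and you still have to decide which master to signal.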
You can run gunicorn with option '-p', so you can get the pid of the master process from the pid file.
For example:
gunicorn -p app.pid your_app.wsgi.app
You can get the pid of the master by:
cat app.pid
This should also work to restart gunicorn:
ps aux |grep gunicorn |grep yourapp | awk '{ print $2 }' |xargs kill -HUP
Step 1:
Open the file /etc/systemd/system/gunicorn.service
and add the lines below: PIDFile goes in the [Service] section, and --pid is appended to the ExecStart command.
PIDFile=/run/gunicorn/gunicorn.pid
--pid /run/gunicorn/gunicorn.pid
Example:
[Service]
PIDFile=/run/gunicorn/gunicorn.pid
WorkingDirectory=/home/django/django_project
ExecStart=/usr/bin/gunicorn --pid /run/gunicorn/gunicorn.pid --name=django_project.....
User=django
Group=django
Step 2:
Go to /etc/tmpfiles.d/ and create a new file gunicorn.conf if it does not exist.
Add the line below:
d /run/gunicorn 0755 django django -
where django = user and group name
Step 3:
Reboot your server, or run /etc/init.d/gunicorn restart, so that the change takes effect.
Your pid file is now located at /run/gunicorn/gunicorn.pid.
Building on krizex's answer, when your master pid is stored in a file, you can gracefully reload your app in one command like this:
$ cat app.pid |xargs kill -HUP
I would have liked to comment on the answer itself but I don't have enough reputation to comment yet 😢.

How to stop gunicorn properly

I'm starting gunicorn with the Django command python manage.py run_gunicorn. How can I stop gunicorn properly?
Note: I have a semi-automated server deployment with fabric. Thus using something like ps aux | grep gunicorn to kill the process manually by pid is not an option.
To see the processes, run ps ax|grep gunicorn; to stop gunicorn_django, run pkill gunicorn.
One option would be to use Supervisor to manage Gunicorn.
Then again, I don't see why you can't kill the process via Fabric.
Assuming you let Gunicorn write a pid file you could easily read that file in a Fabric command.
Something like this should work:
run("kill `cat /path/to/your/file/gunicorn.pid`")
pkill gunicorn
or
pkill -P1 gunicorn
should kill all running gunicorn processes
pkill gunicorn stops all gunicorn daemons. So if you are running multiple instances of gunicorn with different ports, try this shell script.
#!/bin/bash
Port=5000
pid=`ps ax | grep gunicorn | grep $Port | awk '{split($0,a," "); print a[1]}' | head -n 1`
if [ -z "$pid" ]; then
    echo "no gunicorn daemon on port $Port"
else
    kill $pid
    echo "killed gunicorn daemon on port $Port"
fi
ps ax | grep gunicorn | grep $Port shows the daemons with specific port.
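One caveat with a plain grep $Port is that it matches the number anywhere on the line, including a PID that happens to be 5000. A stricter filter can be sketched in Python by matching only command-line arguments that end in :<port>. This is an illustration under assumptions: the function name gunicorn_pids_on_port is made up, the input is the lines printed by ps ax, and it assumes the port appears in a bind argument such as -b 0.0.0.0:5000.

```python
def gunicorn_pids_on_port(ps_lines, port):
    """Return PIDs of gunicorn processes whose command line binds to port.

    ps_lines are lines printed by `ps ax` (PID TTY STAT TIME COMMAND).
    """
    suffix = f":{port}"
    pids = []
    for line in ps_lines:
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header row or malformed line
        command = fields[4]
        if "gunicorn" not in command:
            continue
        # Match arguments such as "0.0.0.0:5000" or "--bind=127.0.0.1:5000".
        if any(arg.endswith(suffix) for arg in command.split()):
            pids.append(int(fields[0]))
    return pids
```

Unlike the shell script, this returns every matching PID rather than only the first one, so workers that share the master's command line are included too.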
Here is the command which worked for me :
pkill -f gunicorn
It will kill any process whose command line contains gunicorn.
Start:
gunicorn --pid PID_FILE APP:app
Stop:
kill $(cat PID_FILE)
The --pid flag of gunicorn requires a single parameter: a file where the process id will be stored. This file is also automatically deleted when the service is stopped.
I have used PID_FILE for simplicity but you should use something like /tmp/MY_APP_PID as file name.
If the PID file exists it means the service is running. If it is not there, the service is not running. To stop the service just kill it as mentioned.
You may also want to include the --daemon flag in order to detach the process from the current shell.
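The "pid file exists means the service is running" check can go stale if gunicorn is killed in a way that leaves the file behind. A minimal sketch of a stricter check in Python follows; the function name is_running is made up for illustration. It reads the pid file and then probes the process with signal 0, which tests for existence without delivering a signal.

```python
import os


def is_running(pid_file):
    """Return True if pid_file exists and names a live process."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no pid file, or its contents are not a PID
    if pid <= 0:
        return False  # not a valid single-process PID
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False  # stale pid file: the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Note that a live PID only shows that some process has that ID. After a reboot or PID reuse it may not be gunicorn.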
To start the service which is running on gunicorn
sudo systemctl enable myproject
sudo systemctl start myproject
or
sudo systemctl restart myproject
But to stop the service running on gunicorn
sudo systemctl stop myproject
to know more about python application hosting using gunicorn please refer here
kill -9 `ps -eo pid,command | grep 'gunicorn.*${moduleName:appName}' | grep -v grep | sort | head -1 | awk '{print $1}'`
ps -eo pid,command will only fetch process id, command and args out
grep -v grep to get rid of output like 'grep --color=auto xxx'
sort | head -1 to do ascending sort and get first line
awk '{print $1}' to get pid back
One more thing you may need to pay attention to: where is gunicorn installed, and which one are you using?
Ubuntu 16 has gunicorn installed by default; the executable is gunicorn3, located at /usr/bin/gunicorn3. If you installed it with pip, it's located at /usr/local/bin/gunicorn. Use which gunicorn and gunicorn -v to find out.
In your terminal, do:
ps ax|grep gunicorn
Then, to kill the Gunicorn process, run:
kill -9 <gunicorn pid number>
In my case I dealt with many processes
For example: kill -9 398 399 4225 4772
The above solutions do not remove the pid file when the process is killed.
cat <pid-file> | xargs kill -2
This solution reads the pid file and sends an interrupt signal. This closes gunicorn properly, and the pid file is also removed.
PID file can be generated by
gunicorn --pid PID-FILE
or by adding the following in config file
pidfile = "pid_file"
If we run:
pkill gunicorn
we stop all gunicorn services. To restart a single gunicorn instance, we only need to stop the parent process of the service listening on the port where gunicorn will run.
The following script searches for said process (pid), if it exists it kills this process:
#!/bin/bash
# ---------------------
stop_unicorn_on_port() {
    pid=$(lsof -w -t -i "TCP:${1}" | head -1)
    if [ -z "${pid}" ]; then
        echo "🦄 no service daemon on port ${1}"
    else
        kill -9 "${pid}"
        echo "🦄 killed service daemon(${pid}) on port ${1}"
    fi
}
# Example/Testing
stop_unicorn_on_port 5000
stop_unicorn_on_port 5001
stop_unicorn_on_port 5002
For more info, check: man lsof
-t specifies that lsof should produce terse output with process identifiers only and no header - e.g., so that the output may be piped to kill(1). -t selects the -w option.
-i selects the listing of files any of whose Internet address matches the address specified in i. If no address is specified, this option selects the listing of all Internet and x.25 (HP-UX) network files...
Here are some sample addresses:
-i6 - IPv6 only
TCP:25 - TCP and port 25
@1.2.3.4 - Internet IPv4 host address 1.2.3.4
I built upon @David's recommendation to use --pid (PID_FILE) to fix the problem I faced because killing the parent pid didn't kill worker processes.
import os
import sys
import psutil

def stop_pid(pid):
    if sys.platform == 'win32':
        p = psutil.Process(pid)
        p.terminate() # or p.kill()
    else:
        os.system('kill -9 {0}'.format(pid))

def get_child_pids(ppid):
    pid_list = []
    for process in psutil.process_iter():
        _ppid = process.ppid()
        if _ppid == ppid:
            _pid = process.pid
            pid_list.append(_pid)
    return pid_list

def send_kill_cmd(ppid, cpids):
    stop_pid(ppid) # Killing the parent proc first
    for pid in cpids:
        stop_pid(pid)

if __name__ == '__main__':
    parent_pid = int(sys.argv[1])
    child_pids = get_child_pids(parent_pid)
    send_kill_cmd(parent_pid, child_pids)
Then I finally executed the above Python script with the commands below:
#!/bin/bash
FILE_NAME=PID_FILE
if [ -f "$FILE_NAME" ]; then
    pypy stop_gunicorn.py "$(cat PID_FILE)"
    echo "killed - $(cat PID_FILE) and its child processes."
    sleep 2
fi
echo 'Starting gunicorn'
nohup gunicorn --workers 1 --bind 0.0.0.0:5050 app:app --thread 50 --worker-class eventlet --reload --pid PID_FILE > nohup_outs/nohup_process.out &