How do I restart airflow webserver? - python-2.7

I am using airflow for my data pipeline project. I have configured my project in airflow and start the airflow server as a backend process using following command
airflow webserver -p 8080 -D True
Server running successfully in backend. Now I want to enable authentication in airflow and done configuration changes in airflow.cfg, but authentication functionality is not reflected in server. when I stop and start airflow server in my local machine it works.
So How can I restart my daemon airflow webserver process in my server??

I advice running airflow in a robust way, with auto-recovery with systemd
so you can do:
- to start systemctl start airflow
- to stop systemctl stop airflow
- to restart systemctl restart airflow
For this you'll need a systemd 'unit' file.
As a (working) example you can use the following:
put it in /lib/systemd/system/airflow.service
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
PIDFile=/run/airflow/webserver.pid
EnvironmentFile=/home/airflow/airflow.env
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'export AIRFLOW_HOME=/home/airflow ; airflow webserver --pid /run/airflow/webserver.pid'
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=42s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
P.S: change AIRFLOW_HOME to where your airflow folder with the config

Can you check $AIRFLOW_HOME/airflow-webserver.pid for the process id of your webserver daemon?
Then pass it a kill signal to kill it
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
Then clear the pid file
cat /dev/null > $AIRFLOW_HOME/airflow-webserver.pid
Then just run
airflow webserver -p 8080 -D True
to restart the daemon.

This worked for me (multiple times! :D )
find the process id: (assuming 8080 is the port)
lsof -i tcp:8080
kill it
kill <pid>

Use Airflow webserver's (gunicorn) signal handling
Airflow uses gunicorn as it's HTTP server, so you can send it standard POSIX-style signals. A signal commonly used by daemons to restart is HUP.
You'll need to locate the pid file for the airflow webserver daemon in order to get the right process id to send the signal to. This file could be in $AIRFLOW_HOME or also /var/run, which is where you'll find a lot of pids.
Assuming the pid file is in /var/run, you could run the command:
cat /var/run/airflow-webserver.pid | xargs kill -HUP
gunicorn uses a preforking model, so it has master and worker processes. The HUP signal is sent to the master process, which performs these actions:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the preload_app option), Gunicorn will also load the new version of it.
More information in the gunicorn signal handling docs.
This is mostly an expanded version of captaincapsaicin's answer, but using HUP (SIGHUP) instead of KILL (SIGKILL) to reload the process instead of actually killing it and restarting it.

In my case i want to kill previous airflow process and start.
for that following command did the magic
killall -9 airflow

As the question was related to webserver, this is something that worked in my case:
systemctl restart airflow-webserver

Just run:
airflow webserver -p 8080 -D

Find pid with:
airflow webserver
will give: "The webserver is already running under PID 21250."
Than kill web server process with:
kill 21250

None of these worked for me. I had to delete the $AIRFLOW_HOME/airflow-webserver.pid file and then running airflow webserver worked.

Create a init script and use the command "daemon" to run this as service.
daemon --user="${USER}" --pidfile="${PID_FILE}" airflow webserver -p 8090 >> "${LOG_FILE}" 2>&1 &

The recommended approach is to create and enable the airflow webserver as a service. If you named the webserver as 'airflow-webserver', run the following command to restart the service:
systemctl restart airflow-webserver
You can use a ready-made AMI (namely, LightningFLow) from AWS Marketplace which provides Airflow services (webserver, scheduler, worker) which are enabled at startup.
Note: LightningFlow comes pre-integrated with all required libraries, Livy, custom operators, and local Spark cluster.
Link for AWS Marketplace: https://aws.amazon.com/marketplace/pp/Lightning-Analytics-Inc-LightningFlow-Integrated-o/B084BSD66V

Just by killing processes!!
Assuming the default airflow home directory is ~/airflow/
List the 3 parent processes running the airflow (PID):
cat ~/airflow/airflow-scheduler.pid
cat ~/airflow/airflow-webserver.pid
cat ~/airflow/airflow-webserver-monitor.pid
Get their PGID using:
ps -xjf
And finally run loop to kill all tree of each parent (PID):
for child in $(ps x -o "%P %p %r"| awk '{ if ( $1 == $your_first_PID || $3 == $your_first_PGID) { print $2 }}'); do kill $child; done

To restart Airflow you need to restart Airflow webserver and Airflow scheduler.
Check if Airflow servers are running:
ps -aux | grep airflow
if you see in list of running processes entries like:
ubuntu 49601 0.1 1.6 266668 135520 ? S 12:19 0:00 [ready] gunicorn: worker [airflow-webserver]
This means that Airflow webserver is running.
If you see entries like this:
ubuntu 49653 0.6 2.3 308912 187596 ? S 12:19 0:00 airflow scheduler -- DagFileProcessorManager
That means that Airflow scheduler is running.
Stop Airflow servers (webserver and scheduler):
pkill -f "airflow scheduler"
pkill -f "airflow webserver"
Now use again ps -aux | grep airflow to check if they are really shut down.
Start Airflow servers in background (daemon):
airflow webserver -D
airflow scheduler -D

Related

Django redis docker: port is already allocated [duplicate]

When I run docker-compose up in my Docker project it fails with the following message:
Error starting userland proxy: listen tcp 0.0.0.0:3000: bind: address already in use
netstat -pna | grep 3000
shows this:
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN -
I've already tried docker-compose down, but it doesn't help.
In your case it was some other process that was using the port and as indicated in the comments, sudo netstat -pna | grep 3000 helped you in solving the problem.
While in other cases (I myself encountered it many times) it mostly is the same container running at some other instance. In that case docker ps was very helpful as often I left the same containers running in other directories and then tried running again at other places, where same container names were used.
How docker ps helped me:
docker rm -f $(docker ps -aq) is a short command which I use to remove all containers.
Edit: Added how docker ps helped me.
This helped me:
docker-compose down # Stop container on current dir if there is a docker-compose.yml
docker rm -fv $(docker ps -aq) # Remove all containers
sudo lsof -i -P -n | grep <port number> # List who's using the port
and then:
kill -9 <process id> (macOS) or sudo kill <process id> (Linux).
Source: comment by user Rub21.
I had the same problem. I fixed this by stopping the Apache2 service on my host.
You can kill the process listening on that port easily with one command below :
kill -9 $(lsof -t -i tcp:<port#>)
ex :
kill -9 $(lsof -t -i tcp:<port#>)
or for ubuntu:
sudo kill -9 `sudo lsof -t -i:8000`
Man page for lsof : https://man7.org/linux/man-pages/man8/lsof.8.html
-9 is for hard kill without checking any deps.
(Not related, but might be useful if its PORT 5000 mystery) - the culprit process is due to Mac OS monterery.
The port 5000 is commonly used to serve local development servers. When updating to the latest macOS operating system, I was unable the docker to bind to port 5000, because it was already in use. (You may find a message along the lines of Port 5000 already in use.)
By running lsof -i :5000, I found out the process using the port was named ControlCenter, which is a native macOS application. If this is happening to you, even if you use brute force (and kill) the application, it will restart itself. In my laptop, lsof -i :5000 returns that Control Center is being used by process id 433. I could do killall -p 433, but macOS keeps restarting the process.
The process running on this port turns out to be an AirPlay server. You can deactivate it in
System Preferences › Sharing, and unchecking AirPlay Receiver to release port 5000.
I had same problem,
docker-compose down --rmi all (in the same directory where you run docker-compose up)
helps
UPD: CAUTION - this will also delete the local docker images you've pulled (from comment)
For Linux/Unix:
Simple search for linux utility using following command
netstat -nlp | grep 8888
It'll show processing running at this port, then kill that process using PID (look for a PID in row) of that process.
kill PID
In some cases it is critical to perform a more in-depth debugging to the problem before stopping a container or killing a process.
Consider following the checklist below:
1) Check you current docker compose environment
Run docker-compose ps. If port is in use by another container, stop it with docker-compose stop <service-name-in-compose-file> or remove it by replacing stop with rm.
2) Check the containers running outside your current workspace
Run docker ps to see list of all containers running under your host.
If you find the port is in use by another container, you can stop it with docker stop <container-id>.
(*) Because you're not under the scope of the origin compose environment - it is a good practice first to use docker inspect to gather more information about the container that you're about to stop.
3) Check if port is used by other processes running on the host
For example if the port is 6379 run:
$ sudo netstat -ltnp | grep ':6379'
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 915/redis-server 12
tcp6 0 0 ::1:6379 :::* LISTEN 915/redis-server 12
(*) You can also use the lsof command which is mainly used to retrieve information about files that are opened by various processes (I suggest running netstat before that).
So, In case of the output above the PID is 915. Now you can run:
$ ps j 915
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 915 915 915 ? -1 Ssl 123 0:11 /usr/bin/redis-server 127.0.0.1:6379
And see the ID of the parent process (PPID) and the execution command.
You can also run: $ pstree -s <PID> to a visual display of the process and its related processes.
In our case we can see that the process probably is a daemon (PPID is 1) - In that case consider running: A) $ cat /proc/<PID>/status in order to get a more in-depth information about the process like the number of threads spawned by the process, its capabilities, etc'.
B) $ systemctl status <PID> in order to see the systemd unit that caused the creation of a specific process. If the service is not critical - you can stop and disable the service.
4) Restart Docker service
Run: sudo service docker restart.
5) You reached this point and..
Only if its not placing your system at risk - consider restarting the server.
In my case it was
Error starting userland proxy: listen tcp 0.0.0.0:9000: bind: address already in use
And all that I need is turn off debug listening in php storm
Most probably this is because you are already running a web server on your host OS, so it conflicts with the web server that Docker is attempting to start.
So try this one-liner before trying anything else:
sudo service apache2 stop; sudo service nginx stop; sudo nginx -s stop;
I had apache running on my ubuntu machine. I used this command to kill it!
sudo /etc/init.d/apache2 stop
I was getting the below error when i was trying to launch a new container -
listen tcp 0.0.0.0:8080: bind: address already in use.
To check which process is running on port 8080, run below command:
netstat -tulnp | grep 8080
i got the output below
[root#ip-112-x6x-2x-xxx.xxxxx.compute.internal (aws_main) ~]# netstat -tulnp | grep 8080 tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN **12749**/java [root#ip-112-x6x-2x-xxx.xxxxx.compute.internal (aws_main) ~]#
run
kill -9 12749
Then try to relaunch the container it should work
If redis server is started as a service, it will restart itself when you using kill -9 <process_id> or sudo kill -9 `sudo lsof -t -i:<port_number>` . In that case you will need to stop the redis service using following command.
sudo service redis-server stop
I upgraded my docker this afternoon and ran into the same problem. I tried restarting docker but no luck.
Finally, I had to restart my computer and it worked. Definitely a bug.
Check docker-compose.yml, it might be the case that the port is specified twice.
version: '3'
services:
registry:
image: mysql:5.7
ports:
- "3306:3306" <--- remove either this line or next
- "127.0.0.1:3306:3306"
Changing network_mode: "bridge" to "host" did it for me.
This with
version: '2.2'
services:
bind:
image: sameersbn/bind:latest
dns: 127.0.0.1
ports:
- 172.17.42.1:53:53/udp
- 172.17.42.1:10000:10000
volumes:
- "/srv/docker/bind:/data"
environment:
- 'ROOT_PASSWORD=secret'
network_mode: "host"
I ran into the same issue several times. Restarting docker seems to do the trick
A variation of #DmitrySandalov's answer: I had tomcat/java running on 8080, which needed to keep going. Looked at the docker-compose.yml file and altered the entry for 8080 to another of my choosing.
nginx:
build: nginx
ports:
#- '8080:80' <-- original entry
- '8880:80'
- '8443:443'
Worked perfectly. (The only wrinkle is the change will be wiped if I ever update the project, since it's coming from an external repo.)
At first, make sure which service you are running in your specific port. In your case, you are already using port number 3000.
netstat -aof | findstr :3000
now stop that process which is running on specific port
lsof -i tcp:3000
I resolve the issue by restarting Docker.
It makes more sense to change the port of the docker update instead of shutting down other services that use port 80.
Just a side note if you have the same issue and is with Windows:
In my case the process in my way is just grafana-server.exe. Because I first downloaded the binary version and double click the executable, and it now starts as a service by user SYSTEM which I cannot taskkill (no permission)
I have to go to "Service manager" of Windows and search for service "Grafana", and stop it. After that port 3000 is no longer occupied.
Hope that helps.
The one that was using the port 8888 was Jupiter and I had to change the configuration file of Jupiter notebook to run on another port.
to list who is using that specific port.
sudo lsof -i -P -n | grep 9
You can specify the port you want Jupyter to run uncommenting/editing the following line in ~/.jupyter/jupyter_notebook_config.py:
c.NotebookApp.port = 9999
In case you don't have a jupyter_notebook_config.py try running jupyter notebook --generate-config. See this for further details on Jupyter configuration.
Before it was running on :docker run -d --name oracle -p 1521:1521 -p 5500:5500 qa/oracle
I just changed the port to docker run -d --name oracle -p 1522:1522 -p 5500:5500 qa/oracle
it worked fine for me !
On my machine a PID was not being shown from this command netstat -tulpn for the in-use port (8080), so i could not kill it, killing the containers and restarting the computer did not work. So service docker restart command restarted docker for me (ubuntu) and the port was no longer in use and i am a happy chap and off to lunch.
maybe it is too rude, but works for me. restart docker service itself
sudo service docker restart
hope it works for you also!
I have run the container with another port, like... 8082 :-)
I came across this problem. My simple solution is to remove the mongodb from the system
Commands to remove mongodb in Ubuntu:
sudo apt-get purge mongodb mongodb-clients mongodb-server mongodb-dev
sudo apt-get purge mongodb-10gen
sudo apt-get autoremove
Let me add one more case, because I had the same error and none of the solutions listed so far works:
serv1:
...
networks:
privnet:
ipv4_address: 10.10.100.2
...
serv2:
...
# no IP assignment, no dependencies
networks:
privnet:
ipam:
driver: default
config:
- subnet: 10.10.100.0/24
depending on the init order, serv2 may get assigned the IP 10.10.100.2 before serv1 is started, so I just assign IPs manually for all containers to avoid the error. Maybe there are other more elegant ways.
I have the same problem and by stopping docker container it was resolved.
sudo docker container stop <container-name>
i solved with this sudo service redis-server stop

Apache Graceful restart with Ansible

What is ideal ansible way to do a apache graceful restart?
- name: Restart Apache gracefully
command: apachectl -k graceful
Ansible systemd module does the same? If not, what is the difference? Thanks !
- name: Restart apache service.
systemd:
name: apache2
daemon_reload: yes
state: restarted
What you can do with Ansible is to ensure that all established connections to Apache are closed (drained in Ansible lingo).
Use the wait_for module with the condition to wait for drained connections on the particular host and port, with the state set to drained. See below:
- name: wait until apache2 connections are drained.
wait_for:
host: 0.0.0.0
port: 80
state: drained
Note: You can use this for all your Linux network services, which becomes very handy if you want to shutdown services in a particular order in your Ansible playbook.
The wait_for directive is useful to ensuring that Ansible does not run your playbook until specific steps are completed.
There is not support of graceful state at this moment in service or systemd modules because this is quite specific to certain services, status is limited to started, stopped, restarted reloaded and running.
So now you need to use a command module as you wrote in the question to perform a graceful restart, this is the only proper solution.
However there is an issue to support custom status, perhaps someone will implement that soon.
The documentation for the Ansible service module is not clearly stating what "reloaded" state does but, I found that for standard Red Hat 7 install using service module "reloaded" state results in a graceful restart.
I was led to this solution by this Server Fault QA
You can verify by getting process list of the httpd processes prior running your playbook which triggers your handler.
ps -ef | grep httpd | grep -v grep
After your playbook runs and handler reloaded state for httpd service shows "changed", re-examine the process list again.
You should see the start times for all the child httpd (non-root) processes have updated while the root owned parent process's start time has stayed the same.
If you also look in the error log you should see an entry containing:
"... configured -- resuming normal operations ... "
And, finally, you can see this by examining the output of systemctl status for the httpd.service and see the apachectl graceful option was called:
sudo systemctl status httpd.service
My handler now looks like:
- name: "{{ service_name }} restart handler"
become: yes
ansible.builtin.service:
service: "{{ service_name }}"
# state: restarted
state: reloaded

Thousands of extraneous gunicorn workers

I'm using gunicorn 19.7.1 appserver with nginx reverse proxy for a Django project (Ubuntu 14.04 machine).
ps aux | grep gunicorn | grep -v grep | wc -l yields 3043 at the moment.
Whereas in /etc/init/gunicorn.conf, I've always had -w 33. Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
How do I kill the extraneous workers?
How did this happen?
The worker count of 33 has always been properly configured on my busy production system.
However a few hours ago, I was trying python's multiprocessing on the server and things went south. Gunicorn workers ate up all the memory and took out the resident redis instances as well.
I reverted the change and have managed to get everything back online, except the memory hasn't been released and I've had to cope with these legacy gunicorn workers. What's going on?
Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
service only manages service-initiated processes, so if you started Gunicorn workers outside of the service framework, these workers will continue to live even if you stop.
How do I kill the extraneous workers?
The fast way:
Run this command to list all gunicorn process IDs and terminate them, and then restart Gunicorn:
$ pkill gunicorn
$ sudo service gunicorn start
The better way:
Identify your "desired" Gunicorn workers by finding the parent:
$ sudo service gunicorn status
Note the parent process ID. Let's say it's 123.
Save a list of all the "desired" workers' PIDs:
$ echo 123 > desired_workers
$ pgrep -P 123 >> desired_workers
Save a list of all workers' PIDs:
$ pgrep gunicorn > all_workers
Terminate the "undesired" workers:
$ cat desired_workers all_workers | sort | uniq -u | xargs kill

EC2, 16.04, Systemd, Supervisord, & Python

I have a service written in python 2.7 and managed by supervisord on an Ubuntu 16.04 EC2 spot instance.
On system startup I have a number of systemd tasks that need to take place and finish prior to supervisord starting the service.
When the instance is about to shutdown, I need supervisord to capture the event and tell the service to gracefully halt. The service will need to stop processing and return any workloads to the queue prior to exiting gracefully.
What would be the optimal way to manage system startup in this scenario?
What would be the optimal way to manage system shutdown in this scenario?
How do I best handle the interaction between supervisord and the service?
First, we need to install a systemd task that we want to run prior to supervisor starting up. Let's create a script, /usr/bin/pre-supervisor.sh, that will handle performing that work for us and create the /lib/systemd/system/pre-supervisor.service for systemd.
[Unit]
Description=Task to run prior to supervisor Starting up
After=cloud-init.service
Before=supervisor.service
Requires=cloud-init.service
[Service]
Type=oneshot
WorkingDirectory=/usr/bin
ExecStart=/usr/bin/pre-supervisor.sh
RemainAfterExit=no
TimeoutSec=90
User=ubuntu
# Output needs to appear in instance console output
StandardOutput=journal+console
[Install]
WantedBy=multi-user.target
As you can see, this will run after the ec2 cloud-init.service completes, and prior to the supervisor.service.
Next, let us modify the /lib/systemd/system/supervisor.service to run After the pres-supervisor.service completes, instead of after network.target.
[Unit]
Description=Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=pre-supervisor.service
[Service]
ExecStart=/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
ExecStop=/usr/bin/supervisorctl $OPTIONS shutdown
ExecReload=/usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf $OPTIONS reload
KillMode=process
Restart=on-failure
RestartSec=50s
[Install]
WantedBy=multi-user.target
That will ensure that our pre-supervisor tasks run prior to supervisor starting up.
Because these are spot instances, AWS has exposed the termination notice in the meta-data url, I simply need to inject something like:
if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
into my python service, have it check every five seconds or so, and gracefully shutdown as soon as the termination notice appears.

How do I restart gunicorn hup , i dont know masterpid or location of PID file

I want to restart a Django server which is running using gunicorn.
I know how to use gunicorn in my system. But now I need to restart a remote server which is not set up by me.
I don't know masterpid to restart the server how can I get the masterPID.
Usually I HUP gunicorn with sudo kill -s HUP masterpid.
I tried with ps aux|grep gunicorn
and I did not find the gunicorn.pid file anywhere.
How can I get the masterpid?
the one liner below, gets the job perfectly done:
kill -HUP `ps -C gunicorn fch -o pid | head -n 1`
Explanation
pc -C gunicorn only lists the processes with gunicorn command, i.e., workers and master process. Workers are children of master as can be seen using ps -C gunicorn fc -o ppid,pid,cmd. We only need the pid of the master, therefore h flag is used to remove the first line which is PID text. Note that, f flag assures that master is printed above workers.
The correct procedure is to send HUP signal only to the master. In this way gunicorn is gracefully restarted, only the workers, not master, are recreated.
You can run gunicorn with option '-p', so you can get the pid of the master process from the pid file.
For example:
gunicorn -p app.pid your_app.wsgi.app
You can get the pid of the master by:
cat app.pid
This should also work to restart gunicorn:
ps aux |grep gunicorn |grep yourapp | awk '{ print $2 }' |xargs kill -HUP
Step 1:
Go to /etc/systemd/system/gunicorn.service and open file
add bellow line
PIDFile=/run/gunicorn/gunicorn.pid
--pid /run/gunicorn/gunicorn.pid
Example:
[Service]
PIDFile=/run/gunicorn/gunicorn.pid
WorkingDirectory=/home/django/django_project
ExecStart=/usr/bin/gunicorn --pid /run/gunicorn/gunicorn.pid --name=django_project.....
User=django
Group=django
Step 2:
Go to /etc/tmpfiles.d/ and create new file gunicorn.conf if not exist
add Bellow line
d /run/gunicorn 0755 django django -
where django = user and group name
Step 3:
Reboot your server or /etc/init.d/gunicorn restart to restart gunicorn to take effect
your pid file location is /run/gunicorn/gunicorn.pid check now..
Building on krizex's answer answer, when your master pid is stored in a file, you can gracefully reload your app in one command like this
$ cat app.pid |xargs kill -HUP
I would have liked to comment on the answer itself but I don't have enough reputation to comment yet 😢.