When I run docker-compose up in my Docker project it fails with the following message:
Error starting userland proxy: listen tcp 0.0.0.0:3000: bind: address already in use
netstat -pna | grep 3000
shows this:
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN -
I've already tried docker-compose down, but it doesn't help.
In your case it was some other process that was using the port and as indicated in the comments, sudo netstat -pna | grep 3000 helped you in solving the problem.
While in other cases (I myself encountered it many times) it mostly is the same container running at some other instance. In that case docker ps was very helpful as often I left the same containers running in other directories and then tried running again at other places, where same container names were used.
How docker ps helped me:
docker rm -f $(docker ps -aq) is a short command which I use to remove all containers.
Edit: Added how docker ps helped me.
This helped me:
docker-compose down # Stop container on current dir if there is a docker-compose.yml
docker rm -fv $(docker ps -aq) # Remove all containers
sudo lsof -i -P -n | grep <port number> # List who's using the port
and then:
kill -9 <process id> (macOS) or sudo kill <process id> (Linux).
Source: comment by user Rub21.
I had the same problem. I fixed this by stopping the Apache2 service on my host.
You can kill the process listening on that port easily with one command below :
kill -9 $(lsof -t -i tcp:<port#>)
ex :
kill -9 $(lsof -t -i tcp:<port#>)
or for ubuntu:
sudo kill -9 `sudo lsof -t -i:8000`
Man page for lsof : https://man7.org/linux/man-pages/man8/lsof.8.html
-9 is for hard kill without checking any deps.
(Not related, but might be useful if its PORT 5000 mystery) - the culprit process is due to Mac OS monterery.
The port 5000 is commonly used to serve local development servers. When updating to the latest macOS operating system, I was unable the docker to bind to port 5000, because it was already in use. (You may find a message along the lines of Port 5000 already in use.)
By running lsof -i :5000, I found out the process using the port was named ControlCenter, which is a native macOS application. If this is happening to you, even if you use brute force (and kill) the application, it will restart itself. In my laptop, lsof -i :5000 returns that Control Center is being used by process id 433. I could do killall -p 433, but macOS keeps restarting the process.
The process running on this port turns out to be an AirPlay server. You can deactivate it in
System Preferences › Sharing, and unchecking AirPlay Receiver to release port 5000.
I had same problem,
docker-compose down --rmi all (in the same directory where you run docker-compose up)
helps
UPD: CAUTION - this will also delete the local docker images you've pulled (from comment)
For Linux/Unix:
Simple search for linux utility using following command
netstat -nlp | grep 8888
It'll show processing running at this port, then kill that process using PID (look for a PID in row) of that process.
kill PID
In some cases it is critical to perform a more in-depth debugging to the problem before stopping a container or killing a process.
Consider following the checklist below:
1) Check you current docker compose environment
Run docker-compose ps. If port is in use by another container, stop it with docker-compose stop <service-name-in-compose-file> or remove it by replacing stop with rm.
2) Check the containers running outside your current workspace
Run docker ps to see list of all containers running under your host.
If you find the port is in use by another container, you can stop it with docker stop <container-id>.
(*) Because you're not under the scope of the origin compose environment - it is a good practice first to use docker inspect to gather more information about the container that you're about to stop.
3) Check if port is used by other processes running on the host
For example if the port is 6379 run:
$ sudo netstat -ltnp | grep ':6379'
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 915/redis-server 12
tcp6 0 0 ::1:6379 :::* LISTEN 915/redis-server 12
(*) You can also use the lsof command which is mainly used to retrieve information about files that are opened by various processes (I suggest running netstat before that).
So, In case of the output above the PID is 915. Now you can run:
$ ps j 915
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 915 915 915 ? -1 Ssl 123 0:11 /usr/bin/redis-server 127.0.0.1:6379
And see the ID of the parent process (PPID) and the execution command.
You can also run: $ pstree -s <PID> to a visual display of the process and its related processes.
In our case we can see that the process probably is a daemon (PPID is 1) - In that case consider running: A) $ cat /proc/<PID>/status in order to get a more in-depth information about the process like the number of threads spawned by the process, its capabilities, etc'.
B) $ systemctl status <PID> in order to see the systemd unit that caused the creation of a specific process. If the service is not critical - you can stop and disable the service.
4) Restart Docker service
Run: sudo service docker restart.
5) You reached this point and..
Only if its not placing your system at risk - consider restarting the server.
In my case it was
Error starting userland proxy: listen tcp 0.0.0.0:9000: bind: address already in use
And all that I need is turn off debug listening in php storm
Most probably this is because you are already running a web server on your host OS, so it conflicts with the web server that Docker is attempting to start.
So try this one-liner before trying anything else:
sudo service apache2 stop; sudo service nginx stop; sudo nginx -s stop;
I had apache running on my ubuntu machine. I used this command to kill it!
sudo /etc/init.d/apache2 stop
I was getting the below error when i was trying to launch a new container -
listen tcp 0.0.0.0:8080: bind: address already in use.
To check which process is running on port 8080, run below command:
netstat -tulnp | grep 8080
i got the output below
[root#ip-112-x6x-2x-xxx.xxxxx.compute.internal (aws_main) ~]# netstat -tulnp | grep 8080 tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN **12749**/java [root#ip-112-x6x-2x-xxx.xxxxx.compute.internal (aws_main) ~]#
run
kill -9 12749
Then try to relaunch the container it should work
If redis server is started as a service, it will restart itself when you using kill -9 <process_id> or sudo kill -9 `sudo lsof -t -i:<port_number>` . In that case you will need to stop the redis service using following command.
sudo service redis-server stop
I upgraded my docker this afternoon and ran into the same problem. I tried restarting docker but no luck.
Finally, I had to restart my computer and it worked. Definitely a bug.
Check docker-compose.yml, it might be the case that the port is specified twice.
version: '3'
services:
registry:
image: mysql:5.7
ports:
- "3306:3306" <--- remove either this line or next
- "127.0.0.1:3306:3306"
Changing network_mode: "bridge" to "host" did it for me.
This with
version: '2.2'
services:
bind:
image: sameersbn/bind:latest
dns: 127.0.0.1
ports:
- 172.17.42.1:53:53/udp
- 172.17.42.1:10000:10000
volumes:
- "/srv/docker/bind:/data"
environment:
- 'ROOT_PASSWORD=secret'
network_mode: "host"
I ran into the same issue several times. Restarting docker seems to do the trick
A variation of #DmitrySandalov's answer: I had tomcat/java running on 8080, which needed to keep going. Looked at the docker-compose.yml file and altered the entry for 8080 to another of my choosing.
nginx:
build: nginx
ports:
#- '8080:80' <-- original entry
- '8880:80'
- '8443:443'
Worked perfectly. (The only wrinkle is the change will be wiped if I ever update the project, since it's coming from an external repo.)
At first, make sure which service you are running in your specific port. In your case, you are already using port number 3000.
netstat -aof | findstr :3000
now stop that process which is running on specific port
lsof -i tcp:3000
I resolve the issue by restarting Docker.
It makes more sense to change the port of the docker update instead of shutting down other services that use port 80.
Just a side note if you have the same issue and is with Windows:
In my case the process in my way is just grafana-server.exe. Because I first downloaded the binary version and double click the executable, and it now starts as a service by user SYSTEM which I cannot taskkill (no permission)
I have to go to "Service manager" of Windows and search for service "Grafana", and stop it. After that port 3000 is no longer occupied.
Hope that helps.
The one that was using the port 8888 was Jupiter and I had to change the configuration file of Jupiter notebook to run on another port.
to list who is using that specific port.
sudo lsof -i -P -n | grep 9
You can specify the port you want Jupyter to run uncommenting/editing the following line in ~/.jupyter/jupyter_notebook_config.py:
c.NotebookApp.port = 9999
In case you don't have a jupyter_notebook_config.py try running jupyter notebook --generate-config. See this for further details on Jupyter configuration.
Before it was running on :docker run -d --name oracle -p 1521:1521 -p 5500:5500 qa/oracle
I just changed the port to docker run -d --name oracle -p 1522:1522 -p 5500:5500 qa/oracle
it worked fine for me !
On my machine a PID was not being shown from this command netstat -tulpn for the in-use port (8080), so i could not kill it, killing the containers and restarting the computer did not work. So service docker restart command restarted docker for me (ubuntu) and the port was no longer in use and i am a happy chap and off to lunch.
maybe it is too rude, but works for me. restart docker service itself
sudo service docker restart
hope it works for you also!
I have run the container with another port, like... 8082 :-)
I came across this problem. My simple solution is to remove the mongodb from the system
Commands to remove mongodb in Ubuntu:
sudo apt-get purge mongodb mongodb-clients mongodb-server mongodb-dev
sudo apt-get purge mongodb-10gen
sudo apt-get autoremove
Let me add one more case, because I had the same error and none of the solutions listed so far works:
serv1:
...
networks:
privnet:
ipv4_address: 10.10.100.2
...
serv2:
...
# no IP assignment, no dependencies
networks:
privnet:
ipam:
driver: default
config:
- subnet: 10.10.100.0/24
depending on the init order, serv2 may get assigned the IP 10.10.100.2 before serv1 is started, so I just assign IPs manually for all containers to avoid the error. Maybe there are other more elegant ways.
I have the same problem and by stopping docker container it was resolved.
sudo docker container stop <container-name>
i solved with this sudo service redis-server stop
Having this Dockerfile:
FROM fedora:30
ENV LANG C.UTF-8
RUN dnf upgrade -y \
&& dnf install -y \
openssh-clients \
openvpn \
slirp4netns \
&& dnf clean all
CMD ["openvpn", "--config", "/vpn/ovpn.config", "--auth-user-pass", "/vpn/ovpn.auth"]
Building the image with:
podman build -t peque/vpn .
If I try to run it with (note $(pwd), where the VPN configuration and credentials are stored):
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
I get the following error:
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Any ideas on how could I fix this? I would not mind changing the base image if that could help (i.e.: to Alpine or anything else as long as it allows me to use openvpn for the connection).
System information
Using Podman 1.4.4 (rootless) and Fedora 30 distribution with kernel 5.1.19.
/dev/net/tun permissions
Running the container with:
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then, from the container, I can:
# ls -l /dev/ | grep net
drwxr-xr-x. 2 root root 60 Jul 23 07:31 net
I can also list /dev/net, but will get a "permission denied error":
# ls -l /dev/net
ls: cannot access '/dev/net/tun': Permission denied
total 0
-????????? ? ? ? ? ? tun
Trying --privileged
If I try with --privileged:
podman run -v $(pwd):/vpn:Z --privileged --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then instead of the permission-denied error (errno=13), I get a no-such-file-or-directory error (errno=2):
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
I can effectively verify there is no /dev/net/ directory when using --privileged, even if I pass the --cap-add=NET_ADMIN --device=/dev/net/tun parameters.
Verbose log
This is the log I get when configuring the client with verb 3:
OpenVPN 2.4.7 x86_64-redhat-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
library versions: OpenSSL 1.1.1c FIPS 28 May 2019, LZO 2.08
Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
TCP/UDP: Preserving recently used remote address: [AF_INET]xx.xx.xx.xx:1194
Socket Buffers: R=[212992->212992] S=[212992->212992]
UDP link local (bound): [AF_INET][undef]:0
UDP link remote: [AF_INET]xx.xx.xx.xx:1194
TLS: Initial packet from [AF_INET]xx.xx.xx.xx:1194, sid=3ebc16fc 8cb6d6b1
WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
VERIFY OK: depth=1, C=ES, ST=XXX, L=XXX, O=XXXXX, emailAddress=email#domain.com, CN=internal-ca
VERIFY KU OK
Validating certificate extended key usage
++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
VERIFY EKU OK
VERIFY OK: depth=0, C=ES, ST=XXX, L=XXX, O=XXXXX, emailAddress=email#domain.com, CN=ovpn.server.address
Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
[ovpn.server.address] Peer Connection Initiated with [AF_INET]xx.xx.xx.xx:1194
SENT CONTROL [ovpn.server.address]: 'PUSH_REQUEST' (status=1)
PUSH: Received control message: 'PUSH_REPLY,route xx.xx.xx.xx 255.255.255.0,route xx.xx.xx.0 255.255.255.0,dhcp-option DOMAIN server.net,dhcp-option DNS xx.xx.xx.254,dhcp-option DNS xx.xx.xx.1,dhcp-option DNS xx.xx.xx.1,route-gateway xx.xx.xx.1,topology subnet,ping 10,ping-restart 60,ifconfig xx.xx.xx.24 255.255.255.0,peer-id 1'
OPTIONS IMPORT: timers and/or timeouts modified
OPTIONS IMPORT: --ifconfig/up options modified
OPTIONS IMPORT: route options modified
OPTIONS IMPORT: route-related options modified
OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
OPTIONS IMPORT: peer-id set
OPTIONS IMPORT: adjusting link_mtu to 1624
Outgoing Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Outgoing Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Incoming Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
ROUTE_GATEWAY xx.xx.xx.xx/255.255.255.0 IFACE=tap0 HWADDR=0a:38:ba:e6:4b:5f
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Exiting due to fatal error
Error number may change depending on whether I run the command with --privileged or not.
It turns out that you are blocked by SELinux: after running the client container and trying to access /dev/net/tun inside it, you will get the following AVC denial in the audit log:
type=AVC msg=audit(1563869264.270:833): avc: denied { getattr } for pid=11429 comm="ls" path="/dev/net/tun" dev="devtmpfs" ino=15236 scontext=system_u:system_r:container_t:s0:c502,c803 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=0
To allow your container configuring the tunnel while staying not fully privileged and with SELinux enforced, you need to customize SELinux policies a bit. However, I did not find an easy way to do this properly.
Luckily, there is a tool called udica, which can generate SELinux policies from container configurations. It does not provide the desired policy on its own and requires some manual intervention, so I will describe how I got the openvpn container working step-by-step.
First, install the required tools:
$ sudo dnf install policycoreutils-python-utils policycoreutils udica
Create the container with required privileges, then generate the policy for this container:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --name ovpn peque/vpn
$ podman inspect ovpn | sudo udica -j - ovpn_container
Policy ovpn_container created!
Please load these modules using:
# semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
Restart the container with: "--security-opt label=type:ovpn_container.process" parameter
Here is the policy which was generated by udica:
$ cat ovpn_container.cil
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
)
Let's try this policy (note the --security-opt option, which tells podman to run the container in newly created domain):
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
<...>
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Ugh. Here is the problem: the policy generated by udica still does not know about specific requirements of our container, as they are not reflected in its configuration (well, probably, it is possible to infer that you want to allow operations on tun_tap_device_t based on the fact that you requested --device /dev/net/tun, but...). So, we need to customize the policy by extending it with few more statements.
Let's disable SELinux temporarily and run the container to collect the expected denials:
$ sudo setenforce 0
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
These are:
$ sudo grep denied /var/log/audit/audit.log
type=AVC msg=audit(1563889218.937:839): avc: denied { read write } for pid=3272 comm="openvpn" name="tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:840): avc: denied { open } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:841): avc: denied { ioctl } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 ioctlcmd=0x54ca scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.947:842): avc: denied { nlmsg_write } for pid=3273 comm="ip" scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tclass=netlink_route_socket permissive=1
Or more human-readable:
$ sudo grep denied /var/log/audit/audit.log | audit2allow
#============= ovpn_container.process ==============
allow ovpn_container.process self:netlink_route_socket nlmsg_write;
allow ovpn_container.process tun_tap_device_t:chr_file { ioctl open read write };
OK, let's modify the udica-generated policy by adding the advised allows to it (note, that here I manually translated the syntax to CIL):
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
; This is our new stuff.
(allow process tun_tap_device_t ( chr_file ( ioctl open read write )))
(allow process self ( netlink_route_socket ( nlmsg_write )))
)
Now we enable SELinux back, reload the module and check that the container works correctly when we specify our custom domain:
$ sudo setenforce 1
$ sudo semodule -r ovpn_container
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
<...>
Initialization Sequence Completed
Finally, check that other containers still have no these privileges:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z peque/vpn
<...>
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Yay! We stay with SELinux on, and allow the tunnel configuration only to our specific container.
I'm attempting to create a ssh tunnel, when deploying an application to aws beanstalk. I want to put the tunnel as a background process, that is always connected on application deploy. The script is hanging forever on the deployment and I can't see why.
"/home/ec2-user/eclair-ssh-tunnel.sh":
mode: "000500" # u+rx
owner: root
group: root
content: |
cd /root
eval $(ssh-agent -s)
DISPLAY=":0.0" SSH_ASKPASS="./askpass_script" ssh-add eclair-test-key </dev/null
# we want this command to keep running in the backgriund
# so we add & at then end
nohup ssh -L 48682:localhost:8080 ubuntu#[host...] -N &
and here is the output I'm getting from /var/log/eb-activity.log:
[2019-06-14T14:53:23.268Z] INFO [15615] - [Application update suredbits-api-root-0.37.0-testnet-ssh-tunnel-fix-port-9#30/AppDeployStage1/AppDeployPostHook/01_eclair-ssh-tunnel.sh] : Starting activity...
The ssh tunnel is spawned, and I can find it by doing:
[ec2-user#ip-172-31-25-154 ~]$ ps aux | grep 48682
root 16047 0.0 0.0 175560 6704 ? S 14:53 0:00 ssh -L 48682:localhost:8080 ubuntu#ec2-34-221-186-19.us-west-2.compute.amazonaws.com -N
If I kill that process, the deployment continues as expected, which indicates that the bug is in the tunnel script. I can't seem to find out where though.
You need to add -n option to ssh when run it in background to avoid reading from stdin.
The default on FS /dev/mapper/docker-XXX is 10GB. I followed other instructions to edit /etc/sysconfig/docker-storage and add --storage-opt dm.basesize=50G. Next I do:
sudo service docker restart
sudo service ecs restart
I can see
# ps -ef | grep docker | grep stor
root 5966 1 0 21:45 pts/0 00:00:01 /usr/bin/dockerd --default-ulimit nofile=1024:4096 --storage-driver devicemapper --storage-opt dm.basesize=50G --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.fs=ext4
So it looks like it took effect, however when I look into the running docker container it is stll 10GB:
# docker exec -it 601f6a9e9418 bash
root#601f6a9e9418:/# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/docker-202:1-263443-880571d796b21f307753d4f4ecca2141b85119985fac00001ea2622ce643b45f 10190136 7295128 2354336 76% /
Any help is greatly appreciated.
try this :
link : How to increase Docker container default size?
(optional) If you have already downloaded any image via docker pull you need to clean them first - otherwise they won't be resized
docker rmi your_image_name
Edit the storage config
vi /etc/sysconfig/docker-storage
There should be something like DOCKER_STORAGE_OPTIONS="...", change it to DOCKER_STORAGE_OPTIONS="... --storage-opt dm.basesize=100G"
Restart the docker deamon
service docker restart
Pull the image
docker pull your_image_name
(optional) verification
docker run -i -t your_image_name /bin/bash
df -h
I was struggling with this a lot until I found out this link http://www.projectatomic.io/blog/2016/03/daemon_option_basedevicesize/ turns out you have to remove/pull image after enlarging the basesize.
I'm using gunicorn 19.7.1 appserver with nginx reverse proxy for a Django project (Ubuntu 14.04 machine).
ps aux | grep gunicorn | grep -v grep | wc -l yields 3043 at the moment.
Whereas in /etc/init/gunicorn.conf, I've always had -w 33. Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
How do I kill the extraneous workers?
How did this happen?
The worker count of 33 has always been properly configured on my busy production system.
However a few hours ago, I was trying python's multiprocessing on the server and things went south. Gunicorn workers ate up all the memory and took out the resident redis instances as well.
I reverted the change and have managed to get everything back online, except the memory hasn't been released and I've had to cope with these legacy gunicorn workers. What's going on?
Yet these extra workers persist even if I do sudo service gunicorn stop and sudo service gunicorn start.
service only manages service-initiated processes, so if you started Gunicorn workers outside of the service framework, these workers will continue to live even if you stop.
How do I kill the extraneous workers?
The fast way:
Run this command to list all gunicorn process IDs and terminate them, and then restart Gunicorn:
$ pkill gunicorn
$ sudo service gunicorn start
The better way:
Identify your "desired" Gunicorn workers by finding the parent:
$ sudo service gunicorn status
Note the parent process ID. Let's say it's 123.
Save a list of all the "desired" workers' PIDs:
$ echo 123 > desired_workers
$ pgrep -P 123 >> desired_workers
Save a list of all workers' PIDs:
$ pgrep gunicorn > all_workers
Terminate the "undesired" workers:
$ cat desired_workers all_workers | sort | uniq -u | xargs kill