Ubuntu RabbitMQ - Error: unable to connect to node 'rabbit@somename': nodedown - Django

I am using Celery for Django, which needs RabbitMQ. Four or five months ago it worked well. I tried using it again for a new project and got the error below from RabbitMQ while listing queues.
Listing queues ...
Error: unable to connect to node 'rabbit@somename': nodedown
diagnostics:
- nodes and their ports on 'somename': [{rabbitmqctl23014,44910}]
- current node: 'rabbitmqctl23014@somename'
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: XfMxei3DuB8GOZUm1vdUsg==
What's the solution? If there is no good solution, can I uninstall and reinstall RabbitMQ?

I had apparently installed RabbitMQ as a service, and the
sudo rabbitmqctl force_reset
command was not working.
sudo service rabbitmq-server restart
did exactly what I needed.
P.S. I made sure I was the root user before running the previous command:
sudo su

If you need to change the hostname:
sudo aptitude remove rabbitmq-server
sudo rm -fr /var/lib/rabbitmq/
Set the new hostname:
sudo hostname newhost
In the file /etc/hostname, set the new hostname value.
Add to the file /etc/hosts:
127.0.0.1 newhost
Install RabbitMQ:
sudo aptitude install rabbitmq-server
Done.
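RabbitMQ's default node name is rabbit@ followed by the machine's short hostname, and the node's data under /var/lib/rabbitmq is kept under that name, which is why a hostname change leaves the old data pointing at a node that no longer exists. A minimal Python sketch that prints the node name to expect on this machine; default_node_name is a hypothetical helper, and it assumes no RABBITMQ_NODENAME override is set:

```python
from typing import Optional
import socket


def default_node_name(hostname: Optional[str] = None) -> str:
    """Node name RabbitMQ uses by default: 'rabbit@' + short hostname.

    Assumption: RABBITMQ_NODENAME is not overridden in rabbitmq-env.conf.
    """
    host = hostname if hostname is not None else socket.gethostname()
    # The short hostname is everything before the first dot.
    return "rabbit@" + host.split(".", 1)[0]
```

After the rename, default_node_name() should match the node reported by sudo rabbitmqctl status.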

Check if the server is running by using this command:
sudo service rabbitmq-server status
If it says
Status of all running nodes...
Node 'rabbit#ubuntu' with Pid 26995:
running done.
It's running.
In my case, I accidentally ran the rabbitmqctl command with a different user and got the error you mentioned.
You might have installed it as root; try running
sudo rabbitmqctl stop_app
and see what the response is.
(If everything's fine, run
sudo rabbitmqctl start_app
afterwards).

Double-check that your Erlang cookie file is the same for the server and for the user running rabbitmqctl.
Double-check that your machine name (uname -n) is the same as the one stated in your configuration. This one can be tricky.
And double-check that you start RabbitMQ with the same user you installed it with. Just using 'sudo' won't do the trick.
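For the cookie check, a minimal Python sketch that compares two Erlang cookie files without printing the secret. The paths in the example comment are assumptions: /var/lib/rabbitmq/.erlang.cookie is the usual server location on Ubuntu, and ~/.erlang.cookie belongs to whichever user runs rabbitmqctl. The short SHA-256 fingerprint is only for comparing by eye; it is not the "cookie hash" that rabbitmqctl prints.

```python
import hashlib
from pathlib import Path


def read_cookie(path: Path) -> str:
    # Compare stripped contents, so a trailing newline is not counted as a difference.
    return path.read_text().strip()


def cookies_match(server_cookie: Path, cli_cookie: Path) -> bool:
    """True when both files contain the same Erlang cookie."""
    return read_cookie(server_cookie) == read_cookie(cli_cookie)


def fingerprint(path: Path) -> str:
    """First 12 hex chars of the cookie's SHA-256, safe to print side by side."""
    return hashlib.sha256(read_cookie(path).encode()).hexdigest()[:12]


# Example (run as root, because the server cookie is readable only by its owner):
#   cookies_match(Path("/var/lib/rabbitmq/.erlang.cookie"), Path.home() / ".erlang.cookie")
```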


Django redis docker: port is already allocated [duplicate]

When I run docker-compose up in my Docker project it fails with the following message:
Error starting userland proxy: listen tcp 0.0.0.0:3000: bind: address already in use
netstat -pna | grep 3000
shows this:
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN -
I've already tried docker-compose down, but it doesn't help.
In your case, some other process was using the port, and, as indicated in the comments, sudo netstat -pna | grep 3000 helped you solve the problem.
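The last column of the question's netstat line is "-", which is what netstat shows when it cannot see the owning process (typically because it was run without sudo). The conflict itself can still be confirmed by trying to bind the address Docker wanted. A minimal Python sketch; port_in_use is a hypothetical helper, and it checks 0.0.0.0 by default because that is the address in Docker's error message:

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """True if a TCP socket is already bound to host:port ("address already in use")."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES when binding a port below 1024 without root
    return False
```

If port_in_use(3000) returns True, docker-compose up will keep failing until the port's owner is stopped or the mapping is changed.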
In other cases (I have run into it many times myself), it is usually the same container already running as another instance. There, docker ps was very helpful: I often left containers running from other directories and then tried to start them again elsewhere with the same container names.
How docker ps helped me:
docker rm -f $(docker ps -aq) is a short command I use to remove all containers (careful: it force-removes every container on the host, running or stopped).
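Instead of removing every container, docker ps can point at the one holding the port. A Python sketch that parses tab-separated Names and Ports columns, as produced by docker ps with the Go-template format string in docker_ps_ports below; the helper names are hypothetical, and host port ranges such as 8000-8001 are expanded:

```python
from __future__ import annotations

import subprocess


def host_ports(ports_field: str) -> set[int]:
    """Host ports in one Ports cell, e.g. '0.0.0.0:3000->3000/tcp, :::3000->3000/tcp'."""
    published: set[int] = set()
    for mapping in ports_field.split(","):
        mapping = mapping.strip()
        if "->" not in mapping:
            continue  # exposed only (e.g. '5432/tcp'): nothing bound on the host
        host_side = mapping.split("->", 1)[0]    # '0.0.0.0:3000' or ':::3000'
        port_spec = host_side.rsplit(":", 1)[1]  # '3000' or '8000-8001'
        start, _, end = port_spec.partition("-")
        published.update(range(int(start), int(end or start) + 1))
    return published


def containers_on_port(ps_output: str, port: int) -> list[str]:
    """Names of containers publishing `port`, given 'Names<TAB>Ports' lines."""
    hits = []
    for line in ps_output.splitlines():
        name, _, ports = line.partition("\t")
        if port in host_ports(ports):
            hits.append(name)
    return hits


def docker_ps_ports() -> str:
    """Only running containers can hold a host port, so -a is not needed here."""
    return subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Ports}}"],
        capture_output=True, text=True, check=True,
    ).stdout
```

containers_on_port(docker_ps_ports(), 3000) gives the names to pass to docker stop, leaving unrelated containers alone.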
This helped me:
docker-compose down # Stop container on current dir if there is a docker-compose.yml
docker rm -fv $(docker ps -aq) # Remove all containers
sudo lsof -i -P -n | grep <port number> # List who's using the port
and then:
kill -9 <process id> (macOS) or sudo kill <process id> (Linux).
Source: comment by user Rub21.
I had the same problem. I fixed this by stopping the Apache2 service on my host.
You can easily kill the process listening on that port with this one command:
kill -9 $(lsof -t -i tcp:<port#>)
Or, on Ubuntu:
sudo kill -9 `sudo lsof -t -i:8000`
Man page for lsof : https://man7.org/linux/man-pages/man8/lsof.8.html
-9 sends SIGKILL, a hard kill: the process cannot catch or ignore it, so it gets no chance to clean up.
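Because SIGKILL gives the process no chance to shut down cleanly, a gentler variant sends SIGTERM first and escalates only if the process is still alive after a grace period. A Python sketch of that idea; the helper names and the 5-second grace period are arbitrary choices, and it narrows lsof to listening sockets with -sTCP:LISTEN so that clients merely connected to the port are not killed:

```python
from __future__ import annotations

import os
import signal
import subprocess
import time


def parse_pids(lsof_t_output: str) -> list[int]:
    """lsof -t prints bare PIDs, one per line; de-duplicate and sort them."""
    return sorted({int(tok) for tok in lsof_t_output.split() if tok.isdigit()})


def pids_listening_on(port: int) -> list[int]:
    result = subprocess.run(
        ["lsof", "-t", f"-iTCP:{port}", "-sTCP:LISTEN"],
        capture_output=True, text=True,
    )
    # lsof exits with status 1 when nothing matches, so the return code is not checked.
    return parse_pids(result.stdout)


def stop(pid: int, grace: float = 5.0) -> None:
    """SIGTERM first; SIGKILL (what kill -9 sends) only if the process outlives `grace`."""
    try:
        os.kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            os.kill(pid, 0)  # signal 0 only checks existence; raises once the process is gone
            time.sleep(0.1)
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process has already exited
```

Run it with sudo for processes owned by other users; otherwise os.kill raises PermissionError.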
(Not directly related, but it might be useful if you have a port 5000 mystery:) the culprit process comes with macOS Monterey.
Port 5000 is commonly used for local development servers. After updating to the latest macOS, I was unable to get Docker to bind to port 5000 because it was already in use. (You may see a message along the lines of Port 5000 already in use.)
By running lsof -i :5000, I found out that the process using the port was named ControlCenter, a native macOS application. If this is happening to you, the application will restart itself even if you use brute force and kill it. On my laptop, lsof -i :5000 shows ControlCenter, with process ID 433, holding the port. I could run kill 433, but macOS keeps restarting the process.
The process listening on this port turns out to be an AirPlay server. You can deactivate it in System Preferences › Sharing by unchecking AirPlay Receiver, which releases port 5000.
I had the same problem.
docker-compose down --rmi all (in the same directory where you run docker-compose up)
helped.
Update: CAUTION, this also deletes the local Docker images you've pulled (from a comment).
For Linux/Unix:
Search with the following command:
netstat -nlp | grep 8888
It shows the process running on this port; then kill that process using its PID (look for the PID in the row).
kill PID
In some cases it is important to debug the problem more thoroughly before stopping a container or killing a process.
Consider following the checklist below:
1) Check your current docker compose environment
Run docker-compose ps. If the port is in use by another container, stop it with docker-compose stop <service-name-in-compose-file> or remove it by replacing stop with rm.
2) Check the containers running outside your current workspace
Run docker ps to see the list of all containers running on your host.
If you find the port is in use by another container, you can stop it with docker stop <container-id>.
(*) Because you're outside the scope of the original compose environment, it is good practice to first use docker inspect to gather more information about the container you're about to stop.
3) Check if port is used by other processes running on the host
For example if the port is 6379 run:
$ sudo netstat -ltnp | grep ':6379'
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 915/redis-server 12
tcp6 0 0 ::1:6379 :::* LISTEN 915/redis-server 12
(*) You can also use the lsof command, which is mainly used to retrieve information about files opened by various processes (I suggest running netstat first).
So, in the output above, the PID is 915. Now you can run:
$ ps j 915
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 915 915 915 ? -1 Ssl 123 0:11 /usr/bin/redis-server 127.0.0.1:6379
And see the ID of the parent process (PPID) and the command that started it.
You can also run $ pstree -s <PID> to get a visual display of the process and its parent processes.
In our case we can see that the process is probably a daemon (its PPID is 1). In that case, consider running: A) $ cat /proc/<PID>/status to get more in-depth information about the process, such as the number of threads it has spawned, its capabilities, etc.
B) $ systemctl status <PID> in order to see the systemd unit that caused the creation of a specific process. If the service is not critical - you can stop and disable the service.
4) Restart Docker service
Run: sudo service docker restart.
5) You reached this point and...
Only if it's not placing your system at risk, consider restarting the server.
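The PID lookup in step 3 and the /proc/<PID>/status check can be scripted. A Python sketch, assuming Linux and the column layout of sudo netstat -ltnp shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); listeners and proc_status are hypothetical helpers:

```python
from __future__ import annotations

import re
from pathlib import Path

# "PID/Program name" column of netstat -p output, e.g. "915/redis-server".
_PID_PROG = re.compile(r"^(\d+)/(\S+)")


def listeners(netstat_output: str, port: int) -> list[tuple[int, str]]:
    """(pid, program) pairs for TCP listeners on `port` in sudo netstat -ltnp output."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines or other protocols
        if not fields[3].endswith(f":{port}"):  # fields[3] is the local address
            continue
        match = _PID_PROG.match(fields[6])
        if match:  # the column is "-" when netstat cannot see the owner (run it with sudo)
            found.append((int(match.group(1)), match.group(2)))
    return found


def proc_status(pid: int) -> dict[str, str]:
    """Fields of /proc/<pid>/status (Linux only), e.g. Name, State, PPid, Threads."""
    status = {}
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        key, _, value = line.partition(":")
        status[key] = value.strip()
    return status
```

For each (pid, program) returned, proc_status(pid)["PPid"] == "1" is the daemon hint from step 3, and systemctl status <PID> then names the unit.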
In my case it was
Error starting userland proxy: listen tcp 0.0.0.0:9000: bind: address already in use
And all I needed to do was turn off debug listening in PhpStorm.
Most probably this is because you are already running a web server on your host OS, so it conflicts with the web server that Docker is attempting to start.
So try this one-liner before trying anything else:
sudo service apache2 stop; sudo service nginx stop; sudo nginx -s stop;
I had Apache running on my Ubuntu machine. I used this command to stop it:
sudo /etc/init.d/apache2 stop
I was getting the error below when trying to launch a new container:
listen tcp 0.0.0.0:8080: bind: address already in use.
To check which process is running on port 8080, run below command:
netstat -tulnp | grep 8080
I got the output below:
[root@ip-112-x6x-2x-xxx.xxxxx.compute.internal (aws_main) ~]# netstat -tulnp | grep 8080
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 12749/java
Then run
kill -9 12749
and relaunch the container; it should work.
If the Redis server is started as a service, it will restart itself when you use kill -9 <process_id> or sudo kill -9 `sudo lsof -t -i:<port_number>`. In that case, you need to stop the Redis service with the following command:
sudo service redis-server stop
I upgraded Docker this afternoon and ran into the same problem. I tried restarting Docker, but no luck.
Finally, I had to restart my computer, and it worked. Definitely a bug.
Check docker-compose.yml: the port might be specified twice.
version: '3'
services:
  registry:
    image: mysql:5.7
    ports:
      - "3306:3306"  # remove either this line or the next
      - "127.0.0.1:3306:3306"
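A rough Python check for this mistake, working on the short-syntax strings from a ports: list (loading the YAML is left out, and IPv6 host addresses and port ranges are not handled). Two entries clash when they publish the same host port and protocol and either use the same host IP or one of them binds all interfaces, which is exactly the "3306:3306" plus "127.0.0.1:3306:3306" case. The function names are hypothetical:

```python
from __future__ import annotations


def parse_port(spec: str) -> tuple[str, int, str] | None:
    """Split a short-syntax ports entry into (host_ip, host_port, protocol).

    Returns None when no fixed host port is published ("3306", "127.0.0.1::3306").
    """
    mapping, _, proto = spec.partition("/")
    parts = mapping.split(":")
    if len(parts) == 2:    # "HOST:CONTAINER" binds all interfaces
        host_ip, host_port = "0.0.0.0", parts[0]
    elif len(parts) == 3:  # "IP:HOST:CONTAINER"
        host_ip, host_port = parts[0], parts[1]
    else:                  # "CONTAINER" only
        return None
    if not host_port:
        return None
    return host_ip, int(host_port), proto or "tcp"


def conflicting_entries(specs: list[str]) -> list[tuple[str, str]]:
    """Pairs of entries that would try to bind the same host socket."""
    parsed = [(spec, parse_port(spec)) for spec in specs]
    clashes = []
    for i, (spec_a, a) in enumerate(parsed):
        for spec_b, b in parsed[i + 1:]:
            if a is None or b is None:
                continue
            same_port = a[1:] == b[1:]  # same host port and protocol
            overlapping_ip = a[0] == b[0] or "0.0.0.0" in (a[0], b[0])
            if same_port and overlapping_ip:
                clashes.append((spec_a, spec_b))
    return clashes
```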
Changing network_mode: "bridge" to "host" did it for me.
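This was with:
version: '2.2'
services:
  bind:
    image: sameersbn/bind:latest
    dns: 127.0.0.1
    ports:
      - 172.17.42.1:53:53/udp
      - 172.17.42.1:10000:10000
    volumes:
      - "/srv/docker/bind:/data"
    environment:
      - 'ROOT_PASSWORD=secret'
    network_mode: "host"
Note that with network_mode: "host" the container uses the host's network stack directly, so port publishing does not apply.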
I ran into the same issue several times. Restarting Docker seems to do the trick.
A variation of @DmitrySandalov's answer: I had Tomcat/Java running on 8080, which needed to keep running. I looked at the docker-compose.yml file and changed the entry for 8080 to another port of my choosing.
nginx:
  build: nginx
  ports:
    #- '8080:80'  <-- original entry
    - '8880:80'
    - '8443:443'
Worked perfectly. (The only wrinkle is the change will be wiped if I ever update the project, since it's coming from an external repo.)
First, find out which service is using that specific port. In your case, port 3000 is already in use.
On Windows:
netstat -aof | findstr :3000
On macOS/Linux:
lsof -i tcp:3000
Then stop the process that is running on that port.
I resolved the issue by restarting Docker.
It makes more sense to change the port the Docker container publishes than to shut down other services that use port 80.
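If you go this route, a small Python sketch for picking a host port nobody has bound yet; first_free_port is a hypothetical helper, and its answer is only a snapshot, since another process can still grab the port before docker run does:

```python
from __future__ import annotations

import socket


def first_free_port(start: int, end: int, host: str = "0.0.0.0") -> int | None:
    """First port in [start, end] that can be bound on `host` right now, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # taken (or not permitted): try the next one
            return port
    return None
```

For example, first_free_port(8880, 8899) suggests a replacement for the host side of a "8080:80" mapping.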
Just a side note if you have the same issue on Windows:
In my case, the process in my way was grafana-server.exe. I had first downloaded the binary version and double-clicked the executable, and it now starts as a service under the SYSTEM user, which I cannot taskkill (no permission).
I had to go to the Windows Services manager, search for the service "Grafana", and stop it. After that, port 3000 was no longer occupied.
Hope that helps.
In my case, port 8888 was being used by Jupyter, and I had to change the Jupyter Notebook configuration file to run it on another port.
To list who is using that specific port:
sudo lsof -i -P -n | grep 9
You can specify the port Jupyter runs on by uncommenting/editing the following line in ~/.jupyter/jupyter_notebook_config.py:
c.NotebookApp.port = 9999
In case you don't have a jupyter_notebook_config.py try running jupyter notebook --generate-config. See this for further details on Jupyter configuration.
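Before, it was running with: docker run -d --name oracle -p 1521:1521 -p 5500:5500 qa/oracle
I just changed the port: docker run -d --name oracle -p 1522:1522 -p 5500:5500 qa/oracle
It worked fine for me! (-p maps HOST:CONTAINER, so if the database inside the container still listens on 1521, -p 1522:1521 is the mapping that moves only the host side.)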
On my machine, netstat -tulpn did not show a PID for the in-use port (8080), so I could not kill it, and killing the containers and restarting the computer did not help. Running service docker restart restarted Docker (Ubuntu), the port was no longer in use, and I am a happy chap and off to lunch.
Maybe it is too crude, but it works for me: restart the Docker service itself.
sudo service docker restart
Hope it works for you too!
I ran the container on another port, like 8082 :-)
I came across this problem. My simple solution was to remove MongoDB from the system.
Commands to remove MongoDB on Ubuntu:
sudo apt-get purge mongodb mongodb-clients mongodb-server mongodb-dev
sudo apt-get purge mongodb-10gen
sudo apt-get autoremove
Let me add one more case, because I had the same error and none of the solutions listed so far works:
serv1:
  ...
  networks:
    privnet:
      ipv4_address: 10.10.100.2
  ...
serv2:
  ...
  # no IP assignment, no dependencies
networks:
  privnet:
    ipam:
      driver: default
      config:
        - subnet: 10.10.100.0/24
Depending on the start-up order, serv2 may be assigned the IP 10.10.100.2 before serv1 is started, so I just assign IPs manually to all containers to avoid the error. There may be more elegant ways.
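The manual assignment can be generated rather than typed. A sketch using Python's ipaddress module; assign_static_ips is a hypothetical helper, and it assumes the first host address of the subnet (10.10.100.1 here) is left for the network's gateway, so services start at .2 as serv1 does above:

```python
from __future__ import annotations

import ipaddress
from itertools import islice


def assign_static_ips(services: list[str], subnet: str, reserved: int = 1) -> dict[str, str]:
    """Map each service to a fixed ipv4_address inside `subnet`.

    The first `reserved` host addresses are skipped (assumption: .1 is the gateway).
    """
    hosts = ipaddress.ip_network(subnet).hosts()
    picked = list(islice(hosts, reserved, reserved + len(services)))
    if len(picked) < len(services):
        raise ValueError(f"{subnet} has room for only {len(picked)} services")
    return dict(zip(services, map(str, picked)))
```

Each resulting pair goes under that service's networks: privnet: ipv4_address: entry, so no container is left to dynamic allocation.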
I had the same problem, and stopping the Docker container resolved it.
sudo docker container stop <container-name>
I solved it with: sudo service redis-server stop

How to configure Cassandra in GCP to remotely connect?

I am following the steps below to install and configure Cassandra in GCP.
It works perfectly as long as I work with Cassandra from within GCP.
$java -version
$echo "deb http://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
$curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
$sudo apt install apt-transport-https
$sudo apt-get update
$sudo apt-get install cassandra
$sudo systemctl status cassandra
//Active: active (running)
$nodetool status
//Datacenter: datacenter1
$tail -f /var/log/cassandra/system.log
$find /usr/lib/ -name cqlshlib
##/usr/lib/python3/dist-packages/cqlshlib
$export PYTHONPATH=/usr/lib/python3/dist-packages
$sudo nano ~/.bashrc
//Add
export PYTHONPATH=/usr/lib/python3/dist-packages
//save
$source ~/.bashrc
$python --version
$cqlsh
//it opens cqlsh shell
But I want to configure Cassandra to remotely connect.
I tried the following seven different solutions, but I am still getting the error.
1. In GCP,
VPC network -> firewall -> create
IP 0.0.0.0/0
port tcp=9000,9042,8088,9870,8123,8020, udp=9000
tag = hadoop
Add this tag in VMs
2. rm -Rf ~/.cassandra
3. sudo nano ~/.cassandra/cqlshrc
[connection]
hostname = 34.72.70.173
port = 9042
4. cqlsh 34.72.70.173 -u cassandra -p cassandra
5. firewall - open ports
https://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used
9000,9042,8088,9870,8123,8020,7199,7000,7001,9160
6. Get rid of this line: JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost"
Try restarting the service: sudo service cassandra restart
If you have a cluster, make sure that ports 7000 and 9042 are open within your security group.
7. Set the environment variable CQLSH_HOST=1.2.3.4, then simply type cqlsh.
https://stackoverflow.com/questions/20575640/datastax-devcenter-fails-to-connect-to-the-remote-cassandra-database/20598599#20598599
sudo nano /etc/cassandra/cassandra.yaml
listen_address: localhost
rpc_address: 34.72.70.173
broadcast_rpc_address: 34.72.70.173
sudo service cassandra restart
sudo nano ~/.bashrc
export CQLSH_HOST=34.72.70.173
source ~/.bashrc
sudo systemctl restart cassandra
sudo service cassandra restart
sudo systemctl status cassandra
nodetool status
Please suggest how to get rid of the following error:
Connection error: ('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionRefusedError(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
This indicates that when you ran cqlsh, you didn't specify the public IP:
Connection error: ('Unable to connect to any servers', \
{'127.0.0.1:9042': ConnectionRefusedError(111, "Tried connecting to [('127.0.0.1', 9042)]. \
Last error: Connection refused")})
When running Cassandra nodes on public clouds, you need to configure cassandra.yaml with the following:
listen_address: private_IP
rpc_address: public_IP
The listen address is what Cassandra nodes use to communicate with each other privately, e.g. via the gossip protocol.
The RPC address is what clients/apps/drivers use to connect to nodes on the CQL port (9042) so it needs to be set to the nodes' public IP address.
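These settings can be sanity-checked before restarting the node. A rough Python sketch; check_cassandra_addresses is hypothetical and takes the values as a dict instead of parsing cassandra.yaml. The 0.0.0.0 rule reflects cassandra.yaml's own note that a wildcard rpc_address requires broadcast_rpc_address to be set:

```python
from __future__ import annotations

import ipaddress


def _is_loopback(value: str) -> bool:
    if value == "localhost":
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:  # some other hostname, not a literal IP
        return False


def check_cassandra_addresses(settings: dict[str, str]) -> list[str]:
    """Flag address settings that keep remote CQL clients (or other nodes) out.

    Both addresses default to "localhost", as in the stock cassandra.yaml.
    """
    problems = []
    rpc = settings.get("rpc_address", "localhost")
    listen = settings.get("listen_address", "localhost")
    if _is_loopback(rpc):
        problems.append("rpc_address is loopback: only clients on the node itself can use port 9042")
    if rpc == "0.0.0.0" and not settings.get("broadcast_rpc_address"):
        problems.append("rpc_address is 0.0.0.0, so broadcast_rpc_address must also be set")
    if _is_loopback(listen):
        problems.append("listen_address is loopback: other nodes cannot gossip with this one")
    return problems
```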
To connect to a node with cqlsh (a client), you need to specify the node's public IP:
$ cqlsh <public_IP>
Cheers!

Django Application running on Ubuntu VPS: This site can’t be reached

I am running an Ubuntu 16.04 cloud VPS server. I've set up and activated a venv and installed Django.
I run the server with
python3 manage.py runserver 0.0.0.0:8000
I am trying to access this application from a remote computer (not on the same LAN); I'm trying to make the application visible to the world outside the VPS and VPLAN. When I try to access the site in my home computer's browser at xx.xx.xxx.xxx:8000, I get the error:
This site can’t be reached. http://xx.xx.xxx.xxx:8000/ is unreachable.
Now I've tried a traceroute and it seems to reach the server ok. I also did
sudo ufw enable
sudo ufw 8000 allow
sudo iptables -S | grep 8000 (and see the proper entries)
In the settings file I have:
ALLOWED_HOSTS = ["*", "0.0.0.0", "localhost", "xx.xx.xxx.xxx","xxx.temporary.link"]
If I wget localhost:8000 I get a response fine. I have tried doing all of the above as root and as another dedicated user but it makes no difference.
I ran through this guide
https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-16-04
and I still have the same issue.
Does anyone have any other ideas? Thanks in advance
Try:
sudo ufw allow 8000
Not:
sudo ufw 8000 allow
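Once the firewall rule is in place, it helps to tell "nothing reachable" apart from "Django answered with an error" when testing from the home computer. A standard-library Python sketch (probe is a hypothetical helper): a refused connection or timeout points at the firewall or at runserver not listening on 0.0.0.0, while any HTTP status, even a 400 for a Host header missing from ALLOWED_HOSTS, means the network path works.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Describe what happens when `url` is fetched from this machine."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, e.g. Django's 400 for a host not in ALLOWED_HOSTS.
        return f"reachable: HTTP {exc.code}"
    except OSError as exc:
        # URLError (refused, unresolvable) and timeouts are OSError subclasses.
        return f"unreachable: {getattr(exc, 'reason', exc)}"
```

Usage: probe("http://xx.xx.xxx.xxx:8000/") from the home computer, with the real server address filled in.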

Postgresql 9.3 on ubuntu for Django web application Error: cluster_port_ready: could not find psql binary

I'm having some issues with postgresql 9.3 running on ubuntu 14.04.5
I had a Django web application running with NGINX/Gunicorn for a few weeks; then one day I went to use it and got an OperationalError:
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
After researching, it became apparent that the postgresql server wasn't running. I then attempted to start it, and got the following error:
* No PostgreSQL clusters exist; see "man pg_createcluster"
Alarming, but my data is backed up just fine, so I did:
pg_createcluster 9.3 main --start
and then I got:
Error: cluster configuration already exists
No problem. I'll drop it and create a new one, I thought:
pg_dropcluster 9.3 main --stop
that ran fine, and so I ran:
pg_createcluster 9.3 main --start
again, and now it created the cluster apparently, but would not start:
Creating new cluster 9.3/main ...
config /etc/postgresql/9.3/main
data /var/lib/postgresql/9.3/main
locale en_US.UTF-8
port 5432
Error: cluster_port_ready: could not find psql binary
Does anyone have any advice on how to address this? I have run apt-get remove and install for postgresql, and I still get the same results.
Thanks in advance!
UPDATE:
So, I tried the suggestion below to run the following:
/usr/lib/postgresql/9.3/bin/initdb -D /var/lib/postgresql/data -W -A md5
which returned:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
Enter new superuser password:
Enter it again:
setting password ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/data
or
/usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/data -l logfile start
So I ran:
user$ /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/data
and got:
LOG: could not bind IPv6 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "localhost"
FATAL: could not create any TCP/IP sockets
So I was unable to recover the Postgres instance I was trying to run, and I used this Stack Overflow article on
How to purge and thoroughly uninstall postgres.
As for why this happened, I am still at a loss. I read that my issue could have been caused by my locale environment variables LANGUAGE and LC_ALL. Or it could have been caused by an update, because I didn't specify which version of Postgres I wanted to install when I initially ran apt-get. Regardless, here are the commands I ran that got me back in business:
apt-get --purge remove postgresql\*
After this command, I got some errors complaining about an invalid data directory again, so, on the advice of a post in the link above that is not marked as the answer, I also ran:
apt-get autoremove postgresql*
That seemed to go well, so I returned to the instructions as laid out by the accepted answer:
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres
That appeared to complete without errors, except that the postgres user had already been removed by the previous commands.
I then ensured that my locale environment variables were set, based on a related article titled: Solving "no PostgresSQL Clusters found" error
I ran:
#dpkg-reconfigure locales
Once that was all finished, I did a fresh install, this time specifying the version number:
apt-get install postgresql-9.3 postgresql-contrib-9.3 postgresql-doc-9.3
after the install completed postgres started automatically, and it seems to be fine now.
Create a PostgreSQL database cluster:
$ sudo su
# mkdir /var/lib/postgresql/data
# chown -R postgres:postgres /var/lib/postgresql/data
# su postgres
$ /usr/lib/postgresql/9.3/bin/initdb -D /var/lib/postgresql/data -W -A md5
If you get this error message:
LOG: could not bind IPv4 socket: Address already in use
it means PostgreSQL is already running, and you need to restart it.
You can restart the postgresql service on Ubuntu 14.04 with this command:
$ sudo sh /etc/init.d/postgresql restart
On newer Ubuntu releases with systemd, restart PostgreSQL with this command:
$ sudo systemctl restart postgresql
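The "Address already in use" hint asks whether another postmaster is already running on port 5432. Each running server keeps a postmaster.pid file in its data directory, so one way to check is to read it. A Python sketch, assuming the PostgreSQL 9.x layout in which line 1 holds the postmaster's PID and line 4 its port; read_postmaster_pid is a hypothetical helper:

```python
from __future__ import annotations

import os
from pathlib import Path


def _alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user (e.g. postgres)
    return True


def read_postmaster_pid(data_dir: str) -> dict[str, object] | None:
    """Summarise <data_dir>/postmaster.pid, or None if no such file exists."""
    pid_file = Path(data_dir) / "postmaster.pid"
    if not pid_file.exists():
        return None
    lines = pid_file.read_text().splitlines()
    pid = int(lines[0])
    port = int(lines[3]) if len(lines) > 3 and lines[3].strip() else None
    return {"pid": pid, "port": port, "alive": _alive(pid)}
```

Calling it for both /var/lib/postgresql/9.3/main and /var/lib/postgresql/data shows which cluster holds port 5432; a file whose PID is no longer alive is typically left over from an unclean shutdown.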

Django Celery cannot connect to remote RabbitMQ on EC2

I created a RabbitMQ cluster on two instances on EC2. My Django app uses Celery for async tasks, which in turn uses RabbitMQ as its message queue.
Whenever I start celery with the command:
python manage.py celery worker --loglevel=INFO
OR
python manage.py celeryd --loglevel=INFO
I keep getting following error message related to remote RabbitMQ:
[2015-05-19 08:58:47,307: ERROR/MainProcess] consumer: Cannot connect to amqp://myuser:**@<ip-address>:25672/myvhost/: Socket closed.
Trying again in 2.00 seconds...
I set permissions using:
sudo rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
and then restarted rabbitmq-server on both the cluster nodes. However, it didn't help.
In the log file, I see a few entries like the ones below:
=INFO REPORT==== 19-May-2015::08:14:41 ===
accepting AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672)
=ERROR REPORT==== 19-May-2015::08:14:44 ===
closing AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672):
{handshake_error,opening,0,
{amqp_error,access_refused,
"access to vhost 'myvhost' refused for user 'myuser'",
'connection.open'}}
The file /usr/local/etc/rabbitmq/rabbitmq-env.conf contains a NODE_IP_ADDRESS entry that binds RabbitMQ only to localhost. Removing the NODE_IP_ADDRESS entry from the config binds the port to all network interfaces.
Source: https://superuser.com/questions/464311/open-port-5672-tcp-for-access-to-rabbitmq-on-mac
It turns out I had not created the appropriate configuration files. In my case (Ubuntu 14.04), I had to create the two configuration files below:
$ cat /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODE_IP_ADDRESS=<ip_of_ec2_instance>
<ip_of_ec2_instance> has to be the internal IP that EC2 uses, not the public IP you use to SSH into the instance. It can be obtained with the ip a command.
$ cat /etc/rabbitmq/rabbitmq.config
[
{mnesia, [{dump_log_write_threshold, 1000}]},
{rabbit, [{tcp_listeners, [25672]}]},
{rabbit, [{loopback_users, []}]}
].
I think the line {rabbit, [{tcp_listeners, [25672]}]} was one of the most important pieces of configuration I was missing.
Thanks @dgil for the initial troubleshooting help.
The question has been answered, but I'm leaving notes on a similar issue I faced, in case anybody else finds them useful.
I have a Flask app running on EC2 with AMQP as the broker on port 5672 and EC2 ElastiCache memcached as the backend. The AMQP broker had trouble picking up the tasks that were being fired, so I resolved it as follows.
Assuming you have rabbitmq-server installed (sudo apt-get install rabbitmq-server), add the user and set its permissions:
sudo rabbitmqctl add_user username password
sudo rabbitmqctl set_permissions username ".*" ".*" ".*"
Restart the server: sudo service rabbitmq-server restart
In your Flask app's Celery configuration:
broker_url=amqp://username:password@localhost:5672// (set as above)
backend=cache+memcached://(ec2 cache url):11211/
(The cache+memcached:// prefix tripped me up; without it I kept getting an import error, "cannot import module".)
Open port 5672 on your EC2 instance in its security group.
Now if you fire up your Celery worker, it should pick up the tasks that get fired and store the results on your memcached server.
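To confirm that the security-group change took effect before starting the worker, a plain TCP connection from the worker machine to the broker is enough. A standard-library Python sketch (tcp_reachable is a hypothetical helper; 5672 is RabbitMQ's default AMQP port): a refusal or timeout points at the security group, a firewall, or RabbitMQ listening only on localhost, while a successful connection means any remaining failure is at the AMQP level, such as credentials or vhost permissions.

```python
from __future__ import annotations

import socket


def tcp_reachable(host: str, port: int = 5672, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a plain TCP connection to host:port and report the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # refused, timed out, unreachable, unresolvable
        return False, f"{type(exc).__name__}: {exc}"
```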