Django Celery cannot connect to remote RabbitMQ on EC2

I created a RabbitMQ cluster on two EC2 instances. My Django app uses Celery for async tasks, which in turn uses RabbitMQ as the message queue.
Whenever I start celery with the command:
python manage.py celery worker --loglevel=INFO
OR
python manage.py celeryd --loglevel=INFO
I keep getting the following error message related to the remote RabbitMQ:
[2015-05-19 08:58:47,307: ERROR/MainProcess] consumer: Cannot connect to amqp://myuser:**@<ip-address>:25672/myvhost/: Socket closed.
Trying again in 2.00 seconds...
I set permissions using:
sudo rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
and then restarted rabbitmq-server on both the cluster nodes. However, it didn't help.
In the log file, I see a few entries like the ones below:
=INFO REPORT==== 19-May-2015::08:14:41 ===
accepting AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672)
=ERROR REPORT==== 19-May-2015::08:14:44 ===
closing AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672):
{handshake_error,opening,0,
{amqp_error,access_refused,
"access to vhost 'myvhost' refused for user 'myuser'",
'connection.open'}}

The file /usr/local/etc/rabbitmq/rabbitmq-env.conf contains an entry for NODE_IP_ADDRESS that binds it only to localhost. Removing the NODE_IP_ADDRESS entry from the config binds the port to all network interfaces.
Source: https://superuser.com/questions/464311/open-port-5672-tcp-for-access-to-rabbitmq-on-mac
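Once the binding is changed, a quick way to confirm that the broker port is actually reachable from the Django/Celery host is a plain TCP connect from Python. This is just a minimal sketch; the host and port below are placeholders for your own values:

import socket

# Placeholders: replace with the RabbitMQ host's internal EC2 IP and the
# port the broker actually listens on (5672 by default, 25672 in the
# question's setup).
BROKER_HOST = "10.0.0.10"
BROKER_PORT = 5672

try:
    # Attempt a plain TCP connection with a short timeout.
    with socket.create_connection((BROKER_HOST, BROKER_PORT), timeout=5):
        print("TCP connection succeeded - the port is reachable.")
except OSError as exc:
    print(f"TCP connection failed: {exc}")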

It turns out I had not created the appropriate configuration files. In my case (Ubuntu 14.04), I had to create the two configuration files below:
$ cat /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODE_IP_ADDRESS=<ip_of_ec2_instance>
<ip_of_ec2_instance> has to be the internal IP that EC2 uses, not the public IP that one uses to ssh into the instance. It can be obtained with the ip a command.
$ cat /etc/rabbitmq/rabbitmq.config
[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {rabbit, [{tcp_listeners, [25672]},
            {loopback_users, []}]}
].
I think the line {rabbit, [{tcp_listeners, [25672]}]} was one of the most important pieces of configuration I was missing.
Thanks @dgil for the initial troubleshooting help.
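For completeness, the Celery side then has to point at the same non-default port and vhost. Here is a minimal sketch of the Django setting, assuming the user, vhost and port from the question and a placeholder internal IP and password:

# settings.py (django-celery / Celery 3.x style setting)
# Placeholder IP and password: substitute your instance's internal EC2 IP
# and the real password for myuser.
BROKER_URL = "amqp://myuser:mypassword@10.0.0.10:25672/myvhost"

Without the explicit :25672, Celery would keep trying the default 5672 and never reach the listener configured above.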

The question has already been answered, but I'm leaving notes on a similar issue I faced in case anybody else finds them useful.
I have a Flask app running on EC2 with AMQP as the broker on port 5672 and EC2 ElastiCache memcached as the result backend. The broker had trouble picking up the tasks that were getting fired, and I resolved it as follows.
Assuming you have rabbitmq-server installed (sudo apt-get install rabbitmq-server), add the user and set the permissions:
sudo rabbitmqctl add_user username password
sudo rabbitmqctl set_permissions username ".*" ".*" ".*"
Restart the server: sudo service rabbitmq-server restart
In your Flask app, configure Celery as follows (a sketch is shown after these steps):
broker_url=amqp://username:password@localhost:5672// (set as above)
backend=cache+memcached://(ec2 cache url):11211/
(The cache+memcached:// prefix tripped me up - without it I kept getting an import error (cannot import module).)
Open up the port 5672 on your ec2 instance in the security group.
Now if you fire up your celery worker, it should pick up the tasks that get fired and store the results on your memcached server.
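Here is a minimal sketch of that Celery configuration in Python. The credentials, ElastiCache endpoint and task are placeholders rather than values from the original setup, and the cache+memcached backend needs a memcached client library (e.g. pylibmc or python-memcached) installed:

from celery import Celery

# Placeholder broker/backend URLs: substitute your RabbitMQ credentials
# and your ElastiCache endpoint.
app = Celery(
    "tasks",
    broker="amqp://username:password@localhost:5672//",
    backend="cache+memcached://my-cache.example.cache.amazonaws.com:11211/",
)

@app.task
def add(x, y):
    # Trivial example task; its result lands in memcached via the backend above.
    return x + y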

Related

Apache Graceful restart with Ansible

What is the ideal Ansible way to do an Apache graceful restart?
- name: Restart Apache gracefully
  command: apachectl -k graceful
Does the Ansible systemd module do the same? If not, what is the difference? Thanks!
- name: Restart apache service.
  systemd:
    name: apache2
    daemon_reload: yes
    state: restarted
What you can do with Ansible is to ensure that all established connections to Apache are closed (drained in Ansible lingo).
Use the wait_for module to wait until connections on the particular host and port are drained, with the state set to drained. See below:
- name: wait until apache2 connections are drained.
  wait_for:
    host: 0.0.0.0
    port: 80
    state: drained
Note: You can use this for all your Linux network services, which becomes very handy if you want to shut down services in a particular order in your Ansible playbook.
The wait_for directive is useful for ensuring that Ansible does not proceed with your playbook until specific steps are completed.
There is no support for a graceful state at the moment in the service or systemd modules, because it is quite specific to certain services; state is limited to started, stopped, restarted, reloaded and running.
So for now you need to use the command module, as you wrote in the question, to perform a graceful restart; this is the only proper solution.
However, there is an open issue to support custom states, so perhaps someone will implement that soon.
The documentation for the Ansible service module does not clearly state what the "reloaded" state does, but I found that on a standard Red Hat 7 install, using the service module's "reloaded" state results in a graceful restart.
I was led to this solution by this Server Fault Q&A.
You can verify this by getting a process list of the httpd processes prior to running the playbook that triggers your handler.
ps -ef | grep httpd | grep -v grep
After your playbook runs and the handler's reloaded state for the httpd service shows "changed", re-examine the process list.
You should see the start times for all the child httpd (non-root) processes have updated while the root owned parent process's start time has stayed the same.
If you also look in the error log you should see an entry containing:
"... configured -- resuming normal operations ... "
And, finally, you can see this by examining the output of systemctl status for httpd.service and confirming that the apachectl graceful option was called:
sudo systemctl status httpd.service
My handler now looks like:
- name: "{{ service_name }} restart handler"
become: yes
ansible.builtin.service:
service: "{{ service_name }}"
# state: restarted
state: reloaded

Cannot connect Celery to RabbitMQ on Windows Server

I am trying to set up RabbitMQ to use as a message broker for Celery, on Windows Server 2012 R2. After I start the RabbitMQ server using the RabbitMQ start service in the applications menu, I try to start the Celery app with the command:
celery -A proj worker -l info
I get the following error after the above command.
[2018-01-09 10:03:02,515: ERROR/MainProcess] consumer: Cannot connect to amqp://
guest:**@127.0.0.1:5672//: [WinError 10042] An unknown, invalid, or unsupported
option or level was specified in a getsockopt or setsockopt call.
Trying again in 2.00 seconds...
So I tried debugging by checking the status of the RabbitMQ server: I went into the RabbitMQ command prompt and typed rabbitmqctl status.
These are the services that I used to start RabbitMQ and the RabbitMQ command line
Here are my Django settings for Celery. I tried putting ports and usernames before and after the hosts, but got the same error.
CELERY_BROKER_URL = 'amqp://localhost//'
CELERY_RESULT_BACKEND = 'amqp://localhost//'
What is the issue here? How do I check whether the RabbitMQ service started or not? What settings do I need to put in the Django settings file?
I was fighting the same issue. I ended up downgrading amqp to 2.1.3 based on the open issue in py-amqp:
https://github.com/celery/py-amqp/issues/130
Uninstall amqp using pip uninstall amqp
Install amqp using pip install -Iv amqp==2.1.3
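As a quick sanity check after the downgrade, you can confirm which version your Python environment actually picks up. A minimal sketch, assuming the amqp package imports cleanly:

import amqp

# Should print 2.1.3 after the downgrade described above.
print(amqp.__version__)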

Apache Spark - Connection refused for worker

Hi, I am new to Apache Spark and I am trying to learn it.
While creating a new standalone cluster, I ran into this error.
I started my master and it is active on port 7077; I can see that in the UI (port 8080).
While starting the worker using the command
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.56:7077
I get a connection refused error:
14/07/22 13:18:30 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node-physical:55124] -> [akka.tcp://sparkMaster@192.168.0.56:7077]: Error [Association failed with [akka.tcp://sparkMaster@192.168.0.56:7077]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.0.56:7077]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.0.56:7077
Please help me with this error; I have been stuck here for a long time.
I hope the information is enough. Please help.
In my case, I went to /etc/hosts and:
removed the line with 127.0.1.1
added the line "MASTER_IP MACHINE_NAME"
and it worked.
Try "./sbin/start-master -h ". It works, when I specify the host name as IP address.
Change SPARK_MASTER_HOST=<ip> in the spark-env.sh of the master node.
Then restart the master; if you grep the process, you will see it changes from
java -cp /spark/conf/:/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host <HOST NAME> --port 7077 --webui-port 8080
to
java -cp /spark/conf/:/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host <HOST IP> --port 7077 --webui-port 8080
Check whether your firewall is blocking the worker connection, and rule it out by turning the firewall off either temporarily:
$ sudo service iptables stop
or permanently:
$ sudo chkconfig iptables off
It seems like Spark is very picky about IPs and machine names. When starting your master, it will use your machine name to register the Spark master. If that name is not reachable from your workers, it will be almost impossible to reach.
A way to solve it is to start your master like this:
SPARK_MASTER_IP=YOUR_SPARK_MASTER_IP ${SPARK_HOME}/sbin/start-master.sh
Then you will be able to connect your slaves like this:
${SPARK_HOME}/sbin/start-slave.sh spark://YOUR_SPARK_MASTER_IP:PORT
I hope it helps!
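If the driver side is a Python application, the same principle applies there: point it at the master by IP rather than by a hostname the workers cannot resolve. A minimal PySpark sketch, with the master IP as a placeholder:

from pyspark.sql import SparkSession

# Placeholder: use the same IP the master was started/registered with.
spark = (
    SparkSession.builder
    .master("spark://192.168.0.56:7077")
    .appName("connectivity-check")
    .getOrCreate()
)

# Trivial job to confirm the executors can run work and return results.
print(spark.sparkContext.parallelize(range(10)).sum())
spark.stop()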
Did you add the entries for the master and worker nodes in /etc/hosts? If not, add every machine's IP and host name mapping on all the machines.
For Windows: spark-class org.apache.spark.deploy.master.Master -h [Interface IP to bind to]
I had a similar problem in a Docker container; I solved it by setting the IP for the master and driver to localhost, specifically:
set('spark.master.hostname' ,'localhost')
set('spark.driver.hostname', 'localhost')
I do not have DNS, so on the master node I added entries in /etc/hosts for the IPs and hostnames of all master and worker nodes. On the worker nodes, I added the IP and hostname of the master node to /etc/hosts.

Can't stop service in Vesta Control Panel

Hi everyone,
I have a problem.
I stopped the named, exim and dovecot services, but after a period of time these services started again automatically. I still don't know why this happens, even though I searched for this issue and couldn't find anything. Please help me solve this problem.
Thank you so much!
This works for me:
Log in as root on your server and force-stop the services:
service named stop
service exim stop
service dovecot stop
Next, configure VestaCP not to start the services when the server is being rebooted:
chkconfig named off
chkconfig exim off
chkconfig dovecot off
And you're done. You can check by rebooting the server. You can also do this with other services:
clamd, spamassassin (if you installed the high-RAM VestaCP version and don't need the mail services)
httpd, nginx, mysqld and vsftpd (if you are making a DNS-only server)
You get the point, hope this works. Good luck
This happens when you create a web domain with the DNS support and mail support options checked, so Vesta starts the named and dovecot services. You can just create a cronjob with these commands:
sudo /usr/local/vesta/bin/v-stop-service dovecot
sudo /usr/local/vesta/bin/v-stop-service named
sudo /usr/local/vesta/bin/v-stop-service exim
or, on the server, add these cron lines:
JOB='8' MIN='0' HOUR='*/6' DAY='*' MONTH='*' WDAY='*' CMD='sudo /usr/local/vesta/bin/v-stop-service exim' SUSPENDED='no' TIME='12:32:31' DATE='2014-05-22'
JOB='9' MIN='0' HOUR='*/6' DAY='*' MONTH='*' WDAY='*' CMD='sudo /usr/local/vesta/bin/v-stop-service named' SUSPENDED='no' TIME='12:32:05' DATE='2014-05-22'
JOB='10' MIN='0' HOUR='*/6' DAY='*' MONTH='*' WDAY='*' CMD='sudo /usr/local/vesta/bin/v-stop-service dovecot' SUSPENDED='no' TIME='12:31:50' DATE='2014-05-22'
If you have any issue, send me a message :)

ubuntu rabbitmq - Error: unable to connect to node 'rabbit@somename': nodedown

I am using Celery for Django, which needs RabbitMQ. Some 4 or 5 months back it used to work well. I tried using it again for a new project and got the error below from RabbitMQ while listing queues.
Listing queues ...
Error: unable to connect to node 'rabbit@somename': nodedown
diagnostics:
- nodes and their ports on 'somename': [{rabbitmqctl23014,44910}]
- current node: 'rabbitmqctl23014@somename'
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: XfMxei3DuB8GOZUm1vdUsg==
What's the solution? If there is no good solution, can I uninstall and reinstall RabbitMQ?
I had apparently installed RabbitMQ as a service, and the
sudo rabbitmqctl force_reset
command was not working.
sudo service rabbitmq-server restart
did exactly what I needed.
P.S. I made sure I was the root user before running the previous command:
sudo su
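Once the server is back up, you can confirm from Python that the broker is reachable again. A minimal sketch, assuming kombu is installed (it ships with Celery) and the default guest account is usable from localhost:

from kombu import Connection

# Placeholder URL: the default guest account only works from localhost.
with Connection("amqp://guest:guest@localhost:5672//") as conn:
    # Raises an error if the node is down or refuses the connection.
    conn.ensure_connection(max_retries=3)
    print("Broker is reachable.")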
If you need to change the hostname:
sudo aptitude remove rabbitmq-server
sudo rm -fr /var/lib/rabbitmq/
Set the new hostname:
hostname newhost
In the file /etc/hostname, set the new hostname value.
Add to the file /etc/hosts:
127.0.0.1 newhost
Install rabbitmq:
sudo aptitude install rabbitmq-server
done
Check if the server is running by using this command:
sudo service rabbitmq-server status
If it says
Status of all running nodes...
Node 'rabbit@ubuntu' with Pid 26995:
running done.
It's running.
In my case, I accidentally ran the rabbitmqctl command with a different user and got the error you mentioned.
You might have installed it as root; try running
sudo rabbitmqctl stop_app
and see what the response is.
(If everything's fine, run
sudo rabbitmqctl start_app
afterwards).
Double check that your cookie hash file is the same
Double check that your machine name (uname) is the same as the one stated in your configuration — this one can be tricky
And double check that you start RabbitMQ as the same user you installed it with. Just using 'sudo' won't do the trick.