Using Ray from Flask - init() fails (with core dump) - flask

I'm trying to use Ray from a Flask web application.
The whole thing runs in Docker container.
Ray version is 0.8.6, Flask is 1.1.2.
When I start the web application, Ray seems to try to init twice, and then the process crashes. I added the memory limitations later on because there were some warnings about insufficient shared memory size (the docker-compose setting is "shm_size: '4gb'").
If I start Ray in the same container without using Flask, it runs fine.
import os
import flask
import ray
from flask import Flask

def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    app.config.from_mapping(
        SECRET_KEY='dev',
        DEBUG=True
    )

    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass

    if ray.is_initialized() == False:
        ray.init(ignore_reinit_error=True,
                 include_webui=False,
                 object_store_memory=1*1024*1014*1024,
                 redis_max_memory=2*1024*1014*1024)
    ray.worker.global_worker.run_function_on_all_workers(setup_ray_logger)

    @app.route('/api/GetAccountRatings', methods=['GET'])
    def GetAccountRatings():
        return ...

    return app
When I start the flask web app with:
export FLASK_APP="mifad.api:create_app()"
export FLASK_ENV=development
flask run --host=0.0.0.0 --port=8084
I get the following error messages:
* Serving Flask app "mifad.api:create_app()" (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://0.0.0.0:8084/ (Press CTRL+C to quit)
* Restarting with stat
Failed to set SIGTERM handler, processes mightnot be cleaned up properly on exit.
* Debugger is active!
* Debugger PIN: 331-620-174
Failed to set SIGTERM handler, processes mightnot be cleaned up properly on exit.
2020-07-06 07:38:10,382 INFO resource_spec.py:212 -- Starting Ray with 59.18 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,610 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:10,675 INFO resource_spec.py:212 -- Starting Ray with 59.13 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,781 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,043 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,479 ERROR import_thread.py:93 -- ImportThread: Error 111 connecting to 172.29.0.2:44946. Connection refused.
2020-07-06 07:38:11,481 ERROR worker.py:949 -- print_logs: Connection closed by server.
2020-07-06 07:38:11,488 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,899 ERROR import_thread.py:93 -- ImportThread: Error while reading from socket: (104, 'Connection reset by peer')
2020-07-06 07:38:11,901 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,908 ERROR worker.py:949 -- print_logs: Connection closed by server.
F0706 07:38:17.390182 4555 4659 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7ff84ae8061d google::LogMessage::Fail()
# 0x7ff84ae81a8c google::LogMessage::SendToLog()
# 0x7ff84ae802f9 google::LogMessage::Flush()
# 0x7ff84ae80511 google::LogMessage::~LogMessage()
# 0x7ff84ae5dde9 ray::RayLog::~RayLog()
# 0x7ff84ac39cea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7ff84ac39f37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7ff84ac6ffb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7ff84ac71da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7ff84ac4251d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7ff84ab96870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7ff84b0b80df boost::asio::detail::scheduler::do_run_one()
# 0x7ff84b0b8cf1 boost::asio::detail::scheduler::run()
# 0x7ff84b0b9c42 boost::asio::io_context::run()
# 0x7ff84ab7db10 ray::CoreWorker::RunIOService()
# 0x7ff84a7763e7 execute_native_thread_routine_compat
# 0x7ff84deed6db start_thread
# 0x7ff84dc1688f clone
F0706 07:38:17.804720 4553 4703 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7fedd65e261d google::LogMessage::Fail()
# 0x7fedd65e3a8c google::LogMessage::SendToLog()
# 0x7fedd65e22f9 google::LogMessage::Flush()
# 0x7fedd65e2511 google::LogMessage::~LogMessage()
# 0x7fedd65bfde9 ray::RayLog::~RayLog()
# 0x7fedd639bcea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7fedd639bf37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7fedd63d1fb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7fedd63d3da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7fedd63a451d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7fedd62f8870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7fedd681a0df boost::asio::detail::scheduler::do_run_one()
# 0x7fedd681acf1 boost::asio::detail::scheduler::run()
# 0x7fedd681bc42 boost::asio::io_context::run()
# 0x7fedd62dfb10 ray::CoreWorker::RunIOService()
# 0x7fedd5ed83e7 execute_native_thread_routine_compat
# 0x7fedd968f6db start_thread
# 0x7fedd93b888f clone
Aborted (core dumped)
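One hypothesis (an assumption, not a verified diagnosis): the "* Restarting with stat" line means the Werkzeug reloader imports the module a second time in a child process, so create_app() and with it ray.init() run twice. A quick way to test this would be to start Flask with the reloader disabled:
export FLASK_APP="mifad.api:create_app()"
export FLASK_ENV=development
flask run --host=0.0.0.0 --port=8084 --no-reload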
What am I doing wrong?
Best regards,
Bernd

Related

Flask-sqlalchemy / uwsgi: DB connection problem when more than one process is used

I have a Flask app running on Heroku with uwsgi server in which each user connects to his own database. I have implemented the solution reported here for a very similar situation. In particular, I have implemented the connection registry as follows:
class DBSessionRegistry():
    _registry = {}

    def get(self, URI, **kwargs):
        if URI not in self._registry:
            current_app.logger.info(f'INFO - CREATING A NEW CONNECTION')
            try:
                engine = create_engine(URI,
                                       echo=False,
                                       pool_size=5,
                                       max_overflow=5)
                session_factory = sessionmaker(bind=engine)
                Session = scoped_session(session_factory)
                a_session = Session()
                self._registry[URI] = a_session
            except ArgumentError:
                raise Exception('Error')
        current_app.logger.info(f'SESSION ID: {id(self._registry[URI])}')
        current_app.logger.info(f'REGISTRY ID: {id(self._registry)}')
        current_app.logger.info(f'REGISTRY SIZE: {len(self._registry.keys())}')
        current_app.logger.info(f'APP ID: {id(current_app)}')
        return self._registry[URI]
In my create_app() I assign a registry to the app:
app.DBregistry = DBSessionRegistry()
and whenever I need to talk to the DB I call:
current_app.DBregistry.get(URI)
where the URI is dependent on the user. This works nicely if I use uwsgi with one single process. With more processes,
[uwsgi]
processes = 4
threads = 1
sometimes it gets stuck on some requests, returning a 503 error code. I have found that the problem appears when the requests are handled by different processes in uwsgi. This is an excerpt of the log, which I commented to illustrate the issue:
# ... EVERYTHING OK UP TO HERE.
# ALL PREVIOUS REQUESTS HANDLED BY PROCESS pid = 12
INFO in utils: SESSION ID: 139860361716304
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
INFO in utils: APP ID: 139860526857584
# NOTE THE pid IN THE NEXT LINE...
[pid: 12|app: 0|req: 1/1] POST /manager/_save_task =>
generated 154 bytes in 3457 msecs (HTTP/1.1 200) 4 headers in 601
bytes (1 switches on core 0)
# PREVIOUS REQUEST WAS MANAGED BY PROCESS pid = 12
# THE NEXT REQUEST IS FROM THE SAME USER AND TO THE SAME URL.
# SO THERE IS NO NEED FOR CREATING A NEW CONNECTION, BUT INSTEAD...
INFO - CREATING A NEW CONNECTION
# TO THIS POINT, I DON'T UNDERSTAND WHY IT CREATED A NEW CONNECTION.
# THE SESSION ID CHANGES, AS IT IS A NEW SESSION
INFO in utils: SESSION ID: 139860363793168 # <<--- CHANGED
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
# THE APP AND THE REGISTRY ARE UNIQUE
INFO in utils: APP ID: 139860526857584
# uwsgi GIVES UP...
*** HARAKIRI ON WORKER 4 (pid: 11, try: 1) ***
# THE FAILED REQUEST WAS MANAGED BY PROCESS pid = 11
# I ASSUME THIS IS WHY IT CREATED A NEW CONNECTION
HARAKIRI: -- syscall> 7 0x7fff4290c6d8 0x1 0xffffffff 0x4000 0x0 0x0
0x7fff4290c6b8 0x7f33d6e3cbc4
HARAKIRI: -- wchan> poll_schedule_timeout
HARAKIRI !!! worker 4 status !!!
HARAKIRI [core 0] - POST /manager/_save_task since 1587660997
HARAKIRI !!! end of worker 4 status !!!
heroku[router]: at=error code=H13 desc="Connection closed without
response" method=POST path="/manager/_save_task"
DAMN ! worker 4 (pid: 11) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 4 (new pid: 14)
# FROM HERE ON, NOTHING WORKS ANYMORE
This behavior is consistent over several attempts: when the pid changes, the request fails. Even with pool_size = 1 in the create_engine call the issue persists. No issue arises if uwsgi is used with a single process.
I am pretty sure it is my fault; there is something I don't know or don't understand about how uwsgi and/or sqlalchemy work. Could you please help me?
Thanks
What is happening is that you are trying to share memory between processes.
There are some explanations in these posts:
(is it possible to share memory between uwsgi processes running flask app?)
(https://stackoverflow.com/a/45383617/11542053)
You can use an extra layer to store your sessions outside of the app.
For that, you can use uWSGI's SharedArea (https://uwsgi-docs.readthedocs.io/en/latest/SharedArea.html), which is very low level, or you can use other approaches like uWSGI's caching (https://uwsgi-docs.readthedocs.io/en/latest/Caching.html).
Hope it helps.
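To give an idea of what that extra layer looks like, here is a minimal sketch of the uWSGI caching API (assuming uWSGI is started with a cache, e.g. --cache2 name=mycache,items=100; the cache holds plain bytes, so it suits small shared state such as connection URIs rather than live session objects):
import uwsgi  # only importable when the app runs under uWSGI

def remember(key, value):
    # write to the shared cache; every worker process sees the same entry
    uwsgi.cache_update(key, value.encode(), 0, "mycache")

def recall(key):
    raw = uwsgi.cache_get(key, "mycache")  # None if the key is missing
    return raw.decode() if raw else None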

Celery task blocked in Django view with an AWS SQS broker

I am trying to run a celery task in a Django view using my_task.delay(). However, the task is never executed and the code is blocked on that line and the view never renders. I am using AWS SQS as a broker with an IAM user with full access to SQS.
What am I doing wrong?
Running celery and Django
I am running celery like this:
celery -A app worker -l info
And I am starting my Django server locally in another terminal using:
python manage.py runserver
The celery command outputs:
-------------- celery@LAPTOP-02019EM6 v4.1.0 (latentcall)
---- **** -----
--- * *** * -- Windows-10-10.0.16299 2018-02-07 13:48:18
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: app:0x6372c18
- ** ---------- .> transport: sqs://**redacted**:**@localhost//
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF
--- ***** -----
-------------- [queues]
.> my-queue exchange=my-queue(direct) key=my-queue
[tasks]
. app.celery.debug_task
. counter.tasks.my_task
[2018-02-07 13:48:19,262: INFO/MainProcess] Starting new HTTPS connection (1): sa-east-1.queue.amazonaws.com
[2018-02-07 13:48:19,868: INFO/SpawnPoolWorker-1] child process 20196 calling self.run()
[2018-02-07 13:48:19,918: INFO/SpawnPoolWorker-4] child process 19984 calling self.run()
[2018-02-07 13:48:19,947: INFO/SpawnPoolWorker-3] child process 16024 calling self.run()
[2018-02-07 13:48:20,004: INFO/SpawnPoolWorker-2] child process 19572 calling self.run()
[2018-02-07 13:48:20,815: INFO/MainProcess] Connected to sqs://**redacted**:**@localhost//
[2018-02-07 13:48:20,930: INFO/MainProcess] Starting new HTTPS connection (1): sa-east-1.queue.amazonaws.com
[2018-02-07 13:48:21,307: WARNING/MainProcess] c:\users\nicolas\anaconda3\envs\djangocelery\lib\site-packages\celery\fixups\django.py:202: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2018-02-07 13:48:21,311: INFO/MainProcess] celery@LAPTOP-02019EM6 ready.
views.py
from .tasks import my_task

def index(request):
    print('New request')  # This is called
    my_task.delay()
    # Never reaches here
    return HttpResponse('test')
tasks.py
...
@shared_task
def my_task():
    print('Task ran successfully')  # never prints anything
settings.py
My configuration is the following:
import djcelery
djcelery.setup_loader()
CELERY_BROKER_URL = 'sqs://'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': 'sa-east-1',
}
CELERY_BROKER_USER = '****************'
CELERY_BROKER_PASSWORD = '***************************'
CELERY_TASK_DEFAULT_QUEUE = 'my-queue'
Versions:
I use the following version of Django and Celery:
Django==2.0.2
django-celery==3.2.2
celery==4.1.0
Thanks for your help!
A bit late, but maybe you are still interested. I got Celery with Django and SQS running and don't see any errors in your code. Maybe you missed something in the celery.py file? Here is my code for comparison.
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'djangoappname.settings')
# do not use namespace because default amqp broker would be called
app = Celery('lsaweb')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks()
Have you also checked if SQS is getting messages (try polling in the SQS administration area)?
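If you'd rather check programmatically than in the console, a minimal sketch with boto3 (assuming boto3 and AWS credentials are available; the queue name and region are taken from the question, the script itself is not part of the original setup):
import boto3

sqs = boto3.client('sqs', region_name='sa-east-1')
queue_url = sqs.get_queue_url(QueueName='my-queue')['QueueUrl']
attrs = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=['ApproximateNumberOfMessages', 'ApproximateNumberOfMessagesNotVisible'],
)
print(attrs['Attributes'])  # non-zero counts mean messages are reaching the queue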

Rabbitmq message queues pile up until system crash (all queues are "ready")

I have a simple Raspberry pi + Django + Celery + Rabbitmq setup that I use to send and receive data from Xbee radios while users interact with the web app.
For the life of me I can't get RabbitMQ (or Celery?) under control: after only a single day (sometimes a little longer) the whole system crashes due to some kind of memory leak.
What I am suspecting is that the queues are piling up and never being removed.
Here's a picture of what I see after only a few minutes of run time:
Seems that all of the queues are in the "ready" state.
What's strange is that it would appear that the workers do in fact receive the message and run the task.
The task is very small and shouldn't take longer than 1 second.
I have verified the tasks do execute to the last line and should be returning ok.
I'm no expert and have no clue what I'm actually looking at, so I'm unsure whether that is normal behavior and my issue lies elsewhere.
I have everything set to run daemonized; however, even when running in development mode I get the same results.
I have spent the last four hours debugging with Google searches, but they kept taking me in circles and brought no clarity.
[CONFIGS AND CODE]
In /etc/default/celeryd I have set the following:
CELERY_APP="MyApp"
CELERYD_NODES="w1"
# Python interpreter from environment.
ENV_PYTHON="/home/pi/.virtualenvs/myappenv/bin/python"
# Where to chdir at start.
CELERYD_CHDIR="/home/pi/django_projects/MyApp"
# Virtual Environment Setup
ENV_MY="/home/pi/.virtualenvs/myappenv"
CELERYD="$ENV_MY/bin/celeryd"
CELERYD_MULTI="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryd_multi"
CELERYCTL="$ENV_MY/bin/celeryctl"
CELERYD_OPTS="--app=MyApp --concurrency=1 --loglevel=FATAL"
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_USER="celery"
CELERYD_GROUP="celery"
tasks.py
@celery.task
def sendStatus(modelContext, ignore_result=True, *args, **kwargs):
    node = modelContext  # EndNodes.objects.get(node_addr_lg=node_addr_lg)
    # check age of message and proceed to send status update if it is fresh, otherwise we'll skip it
    if not current_task.request.eta == None:
        now_date = datetime.now().replace(tzinfo=None)  # the time now
        eta_date = dateutil.parser.parse(current_task.request.eta).replace(tzinfo=None)  # the time this was supposed to run, remove timezone from message eta datetime
        delta_seconds = (now_date - eta_date).total_seconds()  # seconds from when this task was supposed to run
        if delta_seconds >= node.status_timeout:  # if the message was queued more than delta_seconds ago this message is too old to process
            return
    # now that we know the message is fresh we can proceed to process the contents and send status to xbee
    hostname = current_task.request.hostname  # the name/key in the schedule that might have related xbee sessions
    app = Celery('app')  # create a new instance of app (because documented methods didnt work)
    i = app.control.inspect()
    scheduled_tasks = i.scheduled()  # the schedule of tasks in the queue
    for task in scheduled_tasks[hostname]:  # iterate through each task
        xbee_session = ast.literal_eval(task['request']['kwargs'])  # the request data in the message (converts unicode to dict)
        if xbee_session['xbee_addr'] == node.node_addr_lg:  # get any session data for this device that we may have set from model's save override
            if xbee_session['type'] == 'STAT':  # because we are responding with status update we look for status sessions
                app.control.revoke(task['request']['id'], terminate=True)  # revoke this task because it is redundant and we are sending update now
    page_mode = chr(node.page_mode)  # the paging mode to set on the remote device
    xbee_global.tx(dest_addr_long=bytearray.fromhex(node.node_addr_lg),
                   frame_id='A',
                   dest_addr='\xFF\xFE',
                   data=page_mode)
celery splash:
-------------- celery@raspberrypi v3.1.23 (Cipater)
---- **** -----
--- * *** * -- Linux-4.4.11-v7+-armv7l-with-debian-8.0
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: MyApp:0x762efe10
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: amqp://
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. MyApp.celery.debug_task
. clone_app.tasks.nodeInterval
. clone_app.tasks.nodePoll
. clone_app.tasks.nodeState
. clone_app.tasks.resetNetwork
. clone_app.tasks.sendData
. clone_app.tasks.sendStatus
[2016-10-11 03:41:12,863: WARNING/Worker-1] Got signal worker_process_init for task id None
[2016-10-11 03:41:12,913: WARNING/Worker-1] JUST OPENED
[2016-10-11 03:41:12,915: WARNING/Worker-1] /dev/ttyUSB0
[2016-10-11 03:41:12,948: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2016-10-11 03:41:13,101: INFO/MainProcess] mingle: searching for neighbors
[2016-10-11 03:41:14,206: INFO/MainProcess] mingle: all alone
[2016-10-11 03:41:14,341: WARNING/MainProcess] celery@raspberrypi ready.
[2016-10-11 03:41:16,223: WARNING/Worker-1] RAW DATA
[2016-10-11 03:41:16,225: WARNING/Worker-1] {'source_addr_long': '\x00\x13\xa2\x00#\x89\xe9\xd7', 'rf_data': '...^%:STAT:`', 'source_addr': '[*', 'id': 'rx', 'options': '\x01'}
[2016-10-11 03:41:16,458: INFO/MainProcess] Received task: clone_app.tasks.sendStatus[6e1a74ec-dca5-495f-a4fa-906a5c657b26] eta:[2016-10-11 03:41:17.307421+00:00]
I can provide additional details if required!!
And thank you for any help resolving this.
Wow, almost immediately after posting my question I found this post and it has completely resolved my issue.
As I expected, ignore_result=True was required; I just was not sure where it belonged.
Now I see no queues except maybe for the instant a worker is running a task. :)
Here's the change in tasks.py:
# From
@celery.task
def sendStatus(modelContext, ignore_result=True, *args, **kwargs):
    # Some code here

# To
@celery.task(ignore_result=True)
def sendStatus(modelContext, *args, **kwargs):
    # Some code here
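As a side note, if task results are never needed at all, the same effect can be had globally rather than per task; a sketch for the Celery 3.1/djcelery-style settings used here (an assumption, adjust to your settings module):
# settings.py -- ignore results for every task so no per-result queues accumulate
CELERY_IGNORE_RESULT = True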

Django/Celery Quickstart example not working (worker is not executing any tasks)

I'm using "Django/Celery Quickstart... or, how I learned to stop using cron and love celery", and it seems the jobs are getting queued but never run.
tasks.py:
from celery.task.schedules import crontab
from celery.decorators import periodic_task

# this will run every minute, see http://celeryproject.org/docs/reference/celery.task.schedules.html#celery.task.schedules.crontab
@periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*"))
def test():
    print "firing test task"
So I run celery:
bash-3.2$ sudo manage.py celeryd -v 2 -B -s celery -E -l INFO
/scratch/software/python/lib/celery/apps/worker.py:166: RuntimeWarning: Running celeryd with superuser privileges is discouraged!
'Running celeryd with superuser privileges is discouraged!'))
-------------- celery@myserver v3.0.12 (Chiastic Slide)
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: django://localhost//
- ** ---------- . app: default:0x12120290 (djcelery.loaders.DjangoLoader)
- ** ---------- . concurrency: 2 (processes)
- ** ---------- . events: ON
- ** ----------
- *** --- * --- [Queues]
-- ******* ---- . celery: exchange:celery(direct) binding:celery
--- ***** -----
[Tasks]
. GotPatch.tasks.test
[2012-12-12 11:58:37,118: INFO/Beat] Celerybeat: Starting...
[2012-12-12 11:58:37,163: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 11:58:37,249: WARNING/MainProcess] /scratch/software/python/lib/djcelery/loaders.py:132: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-12-12 11:58:37,348: WARNING/MainProcess] celery@myserver ready.
[2012-12-12 11:58:37,352: INFO/MainProcess] consumer: Connected to django://localhost//.
[2012-12-12 11:58:37,700: INFO/MainProcess] child process calling self.run()
[2012-12-12 11:58:37,857: INFO/MainProcess] child process calling self.run()
[2012-12-12 11:59:00,229: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:00:00,017: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:01:00,020: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
[2012-12-12 12:02:00,024: INFO/Beat] Scheduler: Sending due task GotPatch.tasks.test (GotPatch.tasks.test)
The tasks are indeed getting queued:
python manage.py shell
>>> from kombu.transport.django.models import Message
>>> Message.objects.count()
234
And the count increases over time:
>>> Message.objects.count()
477
There are no lines in the log file that seem to indicate the task is being executed. I'm expecting something like:
[... INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
Any suggestions how to diagnose / debug this?
I'm new to celery as well, but from the comments on the link you provided, it looks like there was an error in the tutorial. One of the comments points out:
At this command:
sudo ./manage.py celeryd -v 2 -B -s celery -E -l INFO
you must add "-I tasks" to load the tasks.py file ...
Did you try that?
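That is, the full command would look roughly like this (the -I/--include option is what the quoted comment refers to; everything else is copied from your command):
sudo ./manage.py celeryd -v 2 -B -s celery -E -l INFO -I tasks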
You should check that you specify the BROKER_URL parameter inside Django's settings.py.
BROKER_URL = 'django://'
And you should check that your timezones in Django, MySQL and Celery are equal.
It helped me.
P.s.:
[... INFO/MainProcess] Task myapp.tasks.test[39d57f82-fdd2-406a-ad5f-50b0e30a6492] succeeded in 0.00423407554626s: None
This line means that your task was scheduled (!not executed!)
Please check your config, and I hope that it helps you.
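For completeness, a sketch of what the Django-broker settings of that era (djcelery plus kombu's Django transport) typically look like; treat the exact entries as assumptions to check against your installed versions:
# settings.py
import djcelery
djcelery.setup_loader()

BROKER_URL = 'django://'

INSTALLED_APPS = (
    # ...
    'djcelery',
    'kombu.transport.django',  # database-backed transport that creates the message table
)
# afterwards, create the transport's tables:
#   python manage.py syncdb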
I hope someone could learn from my experience in hacking this.
After setting everything up according to the tutorial I noticed that when I call
add.delay(4,5)
nothing happens. The worker did not receive the task (nothing was printed on stderr).
The problem was with the rabbitmq installation. It turns out the default free disk space requirement is 1GB, which was way too much for my VM.
What put me on track was reading the rabbitmq log file.
To find it I had to stop and start the rabbitmq server:
sudo rabbitmqctl stop
sudo rabbitmq-server
rabbitmq dumps the log file location to the screen. In the file I noticed this:
=WARNING REPORT==== 14-Mar-2017::13:57:41 ===
disk resource limit alarm set on node rabbit@supporttip.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
I then followed the instructions here in order to reduce the free disk limit:
Rabbitmq ignores configuration on Ubuntu 12
As a baseline I used the config file from git
https://github.com/rabbitmq/rabbitmq-server/blob/stable/docs/rabbitmq.config.example
The change itself:
{disk_free_limit, "50MB"}
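For context, in the classic rabbitmq.config format (plain Erlang terms) that line sits inside the rabbit section; a minimal sketch, assuming the stock /etc/rabbitmq/rabbitmq.config location:
%% /etc/rabbitmq/rabbitmq.config
[
  {rabbit, [
    {disk_free_limit, "50MB"}
  ]}
].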

Django celery: Consumer Connection Error (111) when running python manage.py celeryd

I am trying to configure a Django project to use Celery (I am using Django 1.3 on Debian Squeeze)
I installed django-celery (2.3.3) and then followed these instructions.
My django celery settings are the following:
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"
When I try to launch the celery worker server with...
$ python manage.py celeryd -l info
I get the following output with a "Consumer: Connection Error: [Errno 111]" at the end:
/home/thomas/virtualenv/ULYSSE/lib/python2.6/site-packages/djcelery/loaders.py:84: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2011-09-20 12:14:00,645: WARNING/MainProcess]
-------------- celery@debian v2.3.3
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://guest@localhost:5672//
- ** ---------- . loader: djcelery.loaders.DjangoLoader
- ** ---------- . logfile: [stderr]@INFO
- ** ---------- . concurrency: 1
- ** ---------- . events: OFF
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
[Tasks]
. competitions.tasks.add
[2011-09-20 12:14:00,788: INFO/PoolWorker-1] child process calling self.run()
[2011-09-20 12:14:00,795: WARNING/MainProcess] celery@debian has started.
[2011-09-20 12:14:00,809: ERROR/MainProcess] **Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 2 seconds**...
Apparently, my settings are correctly read (cf. the Configuration section in the output) and the worker process is correctly started ("celery@debian has started").
I cannot figure out why this "Consumer: Connection Error: [Errno 111]" error happens...
Does this have to do with the BROKER_USER and BROKER_PASSWORD settings?
I tried different settings for user/password (my account, root account...) but I always get the same error. Do 'BROKER_USER' and 'BROKER_PASSWORD' refer to an OS user, a database user, or a "broker" user?
How can I get rid of this Connection Error?
Looks like rabbitmq isn't installed or running. Can you check this?
apt-get install rabbitmq-server
on Ubuntu
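A couple of quick checks (assuming a Debian/Ubuntu-style install) to confirm whether the broker is actually up before digging further into the Celery settings:
sudo rabbitmqctl status               # prints node info if the broker is running
sudo service rabbitmq-server status
sudo netstat -plnt | grep 5672        # the broker should be listening on port 5672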