Flask-SQLAlchemy / uwsgi: DB connection problem when more than one process is used

I have a Flask app running on Heroku with a uwsgi server, in which each user connects to their own database. I have implemented the solution reported here for a very similar situation. In particular, I have implemented the connection registry as follows:
from flask import current_app
from sqlalchemy import create_engine
from sqlalchemy.exc import ArgumentError
from sqlalchemy.orm import scoped_session, sessionmaker

class DBSessionRegistry():
    _registry = {}

    def get(self, URI, **kwargs):
        if URI not in self._registry:
            current_app.logger.info(f'INFO - CREATING A NEW CONNECTION')
            try:
                engine = create_engine(URI,
                                       echo=False,
                                       pool_size=5,
                                       max_overflow=5)
                session_factory = sessionmaker(bind=engine)
                Session = scoped_session(session_factory)
                a_session = Session()
                self._registry[URI] = a_session
            except ArgumentError:
                raise Exception('Error')
        current_app.logger.info(f'SESSION ID: {id(self._registry[URI])}')
        current_app.logger.info(f'REGISTRY ID: {id(self._registry)}')
        current_app.logger.info(f'REGISTRY SIZE: {len(self._registry.keys())}')
        current_app.logger.info(f'APP ID: {id(current_app)}')
        return self._registry[URI]
In my create_app() I assign a registry to the app:
app.DBregistry = DBSessionRegistry()
and whenever I need to talk to the DB I call:
current_app.DBregistry.get(URI)
where the URI depends on the user. This works nicely when uwsgi runs a single process. With more processes,
[uwsgi]
processes = 4
threads = 1
sometimes it gets stuck on some requests, returning a 503 error code. I have found that the problem appears when the requests are handled by different processes in uwsgi. This is an excerpt of the log, which I commented to illustrate the issue:
# ... EVERYTHING OK UP TO HERE.
# ALL PREVIOUS REQUESTS HANDLED BY PROCESS pid = 12
INFO in utils: SESSION ID: 139860361716304
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
INFO in utils: APP ID: 139860526857584
# NOTE THE pid IN THE NEXT LINE...
[pid: 12|app: 0|req: 1/1] POST /manager/_save_task =>
generated 154 bytes in 3457 msecs (HTTP/1.1 200) 4 headers in 601
bytes (1 switches on core 0)
# PREVIOUS REQUEST WAS MANAGED BY PROCESS pid = 12
# THE NEXT REQUEST IS FROM THE SAME USER AND TO THE SAME URL.
# SO THERE IS NO NEED FOR CREATING A NEW CONNECTION, BUT INSTEAD...
INFO - CREATING A NEW CONNECTION
# TO THIS POINT, I DON'T UNDERSTAND WHY IT CREATED A NEW CONNECTION.
# THE SESSION ID CHANGES, AS IT IS A NEW SESSION
INFO in utils: SESSION ID: 139860363793168 # <<--- CHANGED
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
# THE APP AND THE REGISTRY ARE UNIQUE
INFO in utils: APP ID: 139860526857584
# uwsgi GIVES UP...
*** HARAKIRI ON WORKER 4 (pid: 11, try: 1) ***
# THE FAILED REQUEST WAS MANAGED BY PROCESS pid = 11
# I ASSUME THIS IS WHY IT CREATED A NEW CONNECTION
HARAKIRI: -- syscall> 7 0x7fff4290c6d8 0x1 0xffffffff 0x4000 0x0 0x0
0x7fff4290c6b8 0x7f33d6e3cbc4
HARAKIRI: -- wchan> poll_schedule_timeout
HARAKIRI !!! worker 4 status !!!
HARAKIRI [core 0] - POST /manager/_save_task since 1587660997
HARAKIRI !!! end of worker 4 status !!!
heroku[router]: at=error code=H13 desc="Connection closed without
response" method=POST path="/manager/_save_task"
DAMN ! worker 4 (pid: 11) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 4 (new pid: 14)
# FROM HERE ON, NOTHING WORKS ANYMORE
This behavior is consistent over several attempts: when the pid changes, the request fails. Even with pool_size = 1 in the create_engine call the issue persists. No issue arises if uwsgi is used with a single process.
I am pretty sure it is my fault; there is something I don't know or don't understand about how uwsgi and/or SQLAlchemy work. Could you please help me?
Thanks

What is happening is that you are trying to share memory between processes.
There are some explanations in these posts:
Is it possible to share memory between uwsgi processes running a flask app?
https://stackoverflow.com/a/45383617/11542053
You can use an extra layer to store your sessions outside of the app.
For that, you can use uWSGI's SharedArea (https://uwsgi-docs.readthedocs.io/en/latest/SharedArea.html), which is very low level, or you can use other approaches like uWSGI's caching (https://uwsgi-docs.readthedocs.io/en/latest/Caching.html).
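To make the per-process behaviour visible, here is a minimal sketch (the helper name and the /debug/registry route are hypothetical, not part of your app) that reports which worker served a request and how big that worker's own copy of the registry is; hitting it repeatedly shows a different pid, and an independently sized registry, for each uwsgi process:
import os
from flask import Flask, current_app, jsonify

def register_debug_route(app: Flask):
    # Hypothetical helper: call it from create_app(), next to app.DBregistry = DBSessionRegistry()
    @app.route('/debug/registry')
    def debug_registry():
        # Each uwsgi worker is a separate OS process with its own copy of the
        # registry dict, so pid and registry size differ per worker.
        return jsonify(
            pid=os.getpid(),
            registry_size=len(current_app.DBregistry._registry),
        )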
hope it helps.

Related

Call method once when Flask app started despite many Gunicorn workers

I have a simple Flask app that is started by Gunicorn with 4 workers.
I want to clear and warm up the cache when the server restarts. But when I do this inside the create_app() method it executes 4 times.
import threading
from flask import Flask

def create_app(test_config=None):
    app = Flask(__name__)
    # ... different configuration here
    t = threading.Thread(target=reset_cache, args=(app,))
    t.start()
    return app
[2022-10-28 09:33:33 +0000] [7] [INFO] Booting worker with pid: 7
[2022-10-28 09:33:33 +0000] [8] [INFO] Booting worker with pid: 8
[2022-10-28 09:33:33 +0000] [9] [INFO] Booting worker with pid: 9
[2022-10-28 09:33:33 +0000] [10] [INFO] Booting worker with pid: 10
2022-10-28 09:33:36,908 INFO webapp reset_cache:38 Clearing cache
2022-10-28 09:33:36,908 INFO webapp reset_cache:38 Clearing cache
2022-10-28 09:33:36,908 INFO webapp reset_cache:38 Clearing cache
2022-10-28 09:33:36,909 INFO webapp reset_cache:38 Clearing cache
How can I make this run only once, without using any queues, rq workers or Celery?
Signals, a mutex, some special check of the worker id (but it is always dynamic)?
I haven't found any solution so far.
I used Redis locks for that.
Here is an example using flask-caching, which I already had in the project, but you can get the redis client from wherever you have one:
import time
from webapp.models import cache  # cache = flask_caching.Cache()

def reset_cache(app):
    with app.app_context():
        client = app.extensions["cache"][cache]._write_client  # redis client
        lock = client.lock("warmup-cache-key")
        locked = lock.acquire(blocking=False, blocking_timeout=1)
        if locked:
            app.logger.info("Clearing cache")
            cache.clear()
            app.logger.info("Warming up cache")
            # function call here with `cache.set(...)`
            app.logger.info("Completed warmup cache")
            # time.sleep(5)  # add some delay if procedure is really fast
            lock.release()
It can be easily extended with threads, loops or whatever you need to set value to cache.
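The example above reaches the redis client through flask-caching's internals; as noted, any redis client works. A minimal sketch with a bare redis-py client, where the host, the lock key and the warmup() function are assumptions:
import redis

def warmup_once():
    client = redis.Redis(host="localhost", port=6379)
    # timeout lets the lock auto-expire, so a crashed worker cannot hold it forever
    lock = client.lock("warmup-cache-key", timeout=60)
    if lock.acquire(blocking=False):
        try:
            warmup()  # hypothetical one-time initialisation (cache clear + warmup)
        finally:
            lock.release()
Calling warmup_once() from create_app() means only the first worker that boots acquires the lock; the other workers see the lock taken and skip the block.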

Celery unexpectedly closes TCP connection

I'm using RabbitMQ 3.8.2 with Erlang 22.2.7 and having a problem while consuming tasks. My setup is Django + Celery + RabbitMQ. While publishing messages to a queue everything goes fine until the queue length reaches 1200 messages. After this point RabbitMQ starts to close the AMQP connection with the following errors:
...
2022-11-01 09:35:25.327 [info] <0.20608.9> accepting AMQP connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672)
2022-11-01 09:35:25.483 [info] <0.20608.9> connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672): user 'rabbit_admin' authenticated and granted access to vhost '/'
...
2022-11-01 09:36:59.129 [warning] <0.19994.9> closing AMQP connection <0.19994.9> (185.121.83.108:36149 -> 185.121.83.116:5672, vhost: '/', user: 'rabbit_admin'):
client unexpectedly closed TCP connection
...
[error] <0.11162.9> closing AMQP connection <0.11162.9> (185.121.83.108:57631 -> 185.121.83.116:5672):
{writer,send_failed,{error,enotconn}}
...
2022-11-01 09:35:48.256 [error] <0.20201.9> closing AMQP connection <0.20201.9> (185.121.83.108:50058 -> 185.121.83.116:5672):
{inet_error,enotconn}
...
Then the django-celery consumer disappears from the queue list, messages become "ready", and the Celery pods are unable to ack a message after the job is finished, with the following error:
ERROR: [2022-11-01 09:20:23] /usr/src/app/project/celery.py:114 handle_message Error while handling Rabbit task: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/amqp/connection.py", line 514, in channel
return self.channels[channel_id]
KeyError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/project/celery.py", line 76, in handle_message
message.ack()
File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.10/site-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.10/site-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.10/site-packages/amqp/transport.py", line 347, in write
self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
I have noticed that the message size also affects this behavior. In the above case there are about 1000-1500 characters in each message. If I decrease it to 50 characters, the threshold at which RabbitMQ starts to close the AMQP connection shifts to 4000-5000 messages.
I suspect that the problem is a lack of resources for RabbitMQ, but I don't know how to find out what exactly is going wrong. If I run htop on the server, I see that the 2 available CPUs are never at high load (less than 20% each) and RAM usage is 400mb / 3840mb. So nothing seems to be wrong. Is there any resource checking command for RabbitMQ? The tasks do not take long to complete, about 10 seconds each.
Also, maybe there are some missing heartbeats from the client (I had that problem earlier, but not now; there are currently no error messages about it).
Also if I run sudo journalctl --system | grep rabbitmq, I get the following output:
......
May 24 05:15:49 oms-git.omsystem sshd[809111]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=43.154.63.169 user=rabbitmq
May 24 05:15:51 oms-git.omsystem sshd[809111]: Failed password for rabbitmq from 43.154.63.169 port 37010 ssh2
May 24 05:15:51 oms-git.omsystem sshd[809111]: Disconnected from authenticating user rabbitmq 43.154.63.169 port 37010 [preauth]
May 24 16:12:32 oms-git.omsystem sudo[842182]: ad : TTY=pts/3 ; PWD=/var/log/rabbitmq ; USER=root ; COMMAND=/usr/bin/tail -f -n 1000 rabbit@XXX-git.log
......
Maybe there is another issue with the firewall, but I don't see any error messages about that in /var/log/rabbitmq/rabbit@XXX.log.
My Celery configuration on client is like:
CELERY_TASK_IGNORE_RESULT = True
CELERY_RESULT_BACKEND = 'django-db'
CELERY_CACHE_BACKEND = 'django-cache'
CELERY_SEND_EVENTS = False
CELERY_BROKER_POOL_LIMIT = 30
CELERY_BROKER_HEARTBEAT = 30
CELERY_BROKER_CONNECTION_TIMEOUT = 600
CELERY_PREFETCH_MULTIPLIER = 1
CELERY_SEND_EVENTS = False
CELERY_WORKER_CONCURRENCY = 1
CELERY_TASK_ACKS_LATE = True
Currently I'm running the pod using the following command:
celery -A project.celery worker -l info -f /var/log/celery/celery.log -Ofair
Also I have tried to use various arguments to limit prefetch or turn off the heartbeat, but it didn't work:
celery -A project.celery worker -l info -f /var/log/celery/celery.log --without-heartbeat --without-gossip --without-mingle
celery -A project.celery worker -l info -f /var/log/celery/celery.log --prefetch-multiplier=1 --pool=solo --
I expect that there are no limitations on queue length and every celery pod in my kubernetes cluster consumes and acks messages without errors.

Django Login suddenly stopped working - timing out

My Django project used to work perfectly fine for the last 90 days.
There has been no new code deployment during this time.
I'm running supervisor -> gunicorn to serve the application, with nginx in front.
Unfortunately it just stopped serving the login page (standard framework login).
I wrote a small view that checks if the DB connection is working and it comes up within seconds.
def updown(request):
    from django.shortcuts import HttpResponse
    from django.db import connections
    from django.db.utils import OperationalError
    status = True
    # Check database connection
    if status is True:
        db_conn = connections['default']
        try:
            c = db_conn.cursor()
        except OperationalError:
            status = False
            error = 'No connection to database'
        else:
            status = True
    if status is True:
        message = 'OK'
    elif status is False:
        message = 'NOK' + ' \n' + error
    return HttpResponse(message)
This delivers back an OK.
But the second I am trying to reach /admin or anything else requiring the login, it times out.
wget http://127.0.0.1:8000
--2022-07-20 22:54:58-- http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response... 302 Found
Location: /business/dashboard/ [following]
--2022-07-20 22:54:58-- http://127.0.0.1:8000/business/dashboard/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response... 302 Found
Location: /account/login/?next=/business/dashboard/ [following]
--2022-07-20 22:54:58-- http://127.0.0.1:8000/account/login/?next=/business/dashboard/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
--2022-07-20 22:55:30-- (try: 2) http://127.0.0.1:8000/account/login/?next=/business/dashboard/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...
Supervisor / Gunicorn Log is not helpful at all:
[2022-07-20 23:06:34 +0200] [980] [INFO] Starting gunicorn 20.1.0
[2022-07-20 23:06:34 +0200] [980] [INFO] Listening at: http://127.0.0.1:8000 (980)
[2022-07-20 23:06:34 +0200] [980] [INFO] Using worker: sync
[2022-07-20 23:06:34 +0200] [986] [INFO] Booting worker with pid: 986
[2022-07-20 23:08:01 +0200] [980] [CRITICAL] WORKER TIMEOUT (pid:986)
[2022-07-20 23:08:02 +0200] [980] [WARNING] Worker with pid 986 was terminated due to signal 9
[2022-07-20 23:08:02 +0200] [1249] [INFO] Booting worker with pid: 1249
[2022-07-20 23:12:26 +0200] [980] [CRITICAL] WORKER TIMEOUT (pid:1249)
[2022-07-20 23:12:27 +0200] [980] [WARNING] Worker with pid 1249 was terminated due to signal 9
[2022-07-20 23:12:27 +0200] [1515] [INFO] Booting worker with pid: 1515
Nginx is just giving:
502 Bad Gateway
I don't see anything in the logs, no errors when running the Django dev server, and Sentry shows nothing either. Totally lost.
I am running Django 4.0.x and all libraries are updated.
The check-up script for the database only checks the connection. Due to a misconfiguration of the database replication, the DB was connecting and reading fine, but hung on writes.
The login page tries to write a session to the tables, which failed in this case.
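To catch this class of failure earlier, the health check could also exercise a write. A hedged sketch (the view name and response texts are made up), using the session table because that is exactly what the login writes to:
from django.contrib.sessions.backends.db import SessionStore
from django.http import HttpResponse

def updown_write(request):
    try:
        s = SessionStore()
        s['healthcheck'] = 'ok'  # forces an INSERT into django_session, like the login does
        s.save()
        s.delete()
        return HttpResponse('OK (read + write)')
    except Exception as exc:
        return HttpResponse('NOK: ' + str(exc))
If the write hangs rather than erroring, this view will hang too, but then the timeout shows up on the health check instead of on the login page.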

Using Ray from Flask - init() fails (with core dump)

I'm trying to use Ray from a Flask web application.
The whole thing runs in Docker container.
Ray Version is 0.8.6, Flask 1.1.2
When I start the web application, Ray seems to try to init twice and then the process crashes. I added the memory limits later on because there were some warnings about not enough shared memory size (the docker compose setting is "shm_size: '4gb'").
If I start Ray in the same container without using Flask it runs well.
import os
import flask
import ray
from flask import Flask

def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    app.config.from_mapping(
        SECRET_KEY='dev',
        DEBUG = True
    )
    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass
    if ray.is_initialized() == False:
        ray.init(ignore_reinit_error=True,
                 include_webui=False,
                 object_store_memory=1*1024*1014*1024,
                 redis_max_memory=2*1024*1014*1024)
    ray.worker.global_worker.run_function_on_all_workers(setup_ray_logger)

    @app.route('/api/GetAccountRatings', methods=['GET'])
    def GetAccountRatings():
        return ...

    return app
When I start the flask web app with:
export FLASK_APP="mifad.api:create_app()"
export FLASK_ENV=development
flask run --host=0.0.0.0 --port=8084
I get the following error messages:
* Serving Flask app "mifad.api:create_app()" (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://0.0.0.0:8084/ (Press CTRL+C to quit)
* Restarting with stat
Failed to set SIGTERM handler, processes mightnot be cleaned up properly on exit.
* Debugger is active!
* Debugger PIN: 331-620-174
Failed to set SIGTERM handler, processes mightnot be cleaned up properly on exit.
2020-07-06 07:38:10,382 INFO resource_spec.py:212 -- Starting Ray with 59.18 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,610 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:10,675 INFO resource_spec.py:212 -- Starting Ray with 59.13 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,781 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,043 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,479 ERROR import_thread.py:93 -- ImportThread: Error 111 connecting to 172.29.0.2:44946. Connection refused.
2020-07-06 07:38:11,481 ERROR worker.py:949 -- print_logs: Connection closed by server.
2020-07-06 07:38:11,488 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,899 ERROR import_thread.py:93 -- ImportThread: Error while reading from socket: (104, 'Connection reset by peer')
2020-07-06 07:38:11,901 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,908 ERROR worker.py:949 -- print_logs: Connection closed by server.
F0706 07:38:17.390182 4555 4659 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7ff84ae8061d google::LogMessage::Fail()
# 0x7ff84ae81a8c google::LogMessage::SendToLog()
# 0x7ff84ae802f9 google::LogMessage::Flush()
# 0x7ff84ae80511 google::LogMessage::~LogMessage()
# 0x7ff84ae5dde9 ray::RayLog::~RayLog()
# 0x7ff84ac39cea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7ff84ac39f37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7ff84ac6ffb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7ff84ac71da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7ff84ac4251d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7ff84ab96870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7ff84b0b80df boost::asio::detail::scheduler::do_run_one()
# 0x7ff84b0b8cf1 boost::asio::detail::scheduler::run()
# 0x7ff84b0b9c42 boost::asio::io_context::run()
# 0x7ff84ab7db10 ray::CoreWorker::RunIOService()
# 0x7ff84a7763e7 execute_native_thread_routine_compat
# 0x7ff84deed6db start_thread
# 0x7ff84dc1688f clone
F0706 07:38:17.804720 4553 4703 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7fedd65e261d google::LogMessage::Fail()
# 0x7fedd65e3a8c google::LogMessage::SendToLog()
# 0x7fedd65e22f9 google::LogMessage::Flush()
# 0x7fedd65e2511 google::LogMessage::~LogMessage()
# 0x7fedd65bfde9 ray::RayLog::~RayLog()
# 0x7fedd639bcea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7fedd639bf37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7fedd63d1fb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7fedd63d3da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7fedd63a451d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7fedd62f8870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7fedd681a0df boost::asio::detail::scheduler::do_run_one()
# 0x7fedd681acf1 boost::asio::detail::scheduler::run()
# 0x7fedd681bc42 boost::asio::io_context::run()
# 0x7fedd62dfb10 ray::CoreWorker::RunIOService()
# 0x7fedd5ed83e7 execute_native_thread_routine_compat
# 0x7fedd968f6db start_thread
# 0x7fedd93b888f clone
Aborted (core dumped)
What am I doing wrong?
Best regards,
Bernd

RabbitMQ message queues pile up until system crash (all queues are "ready")

I have a simple Raspberry pi + Django + Celery + Rabbitmq setup that I use to send and receive data from Xbee radios while users interact with the web app.
For the life of me I can't get RabbitMQ (or Celery?) under control: after only a single day (sometimes a little longer) the whole system crashes due to some kind of memory leak.
What I am suspecting is that the queues are piling up and never being removed.
Here's a picture of what I see after only a few minutes of run time:
Seems that all of the queues are in the "ready" state.
What's strange is that it would appear that the workers do in fact receive the message and run the task.
The task is very small and shouldn't take longer than 1 second.
I have verified the tasks do execute to the last line and should be returning ok.
I'm no expert and have no clue what I'm actually looking at, so I'm unsure whether this is normal behavior and my issue lies elsewhere.
I have everything set to run daemonized; however, even when running in development mode I get the same results.
I have spent the last four hours debugging with Google searches, but it kept taking me in circles and never brought clarity.
[CONFIGS AND CODE]
In /etc/default/celeryd I have set the following:
CELERY_APP="MyApp"
CELERYD_NODES="w1"
# Python interpreter from environment.
ENV_PYTHON="/home/pi/.virtualenvs/myappenv/bin/python"
# Where to chdir at start.
CELERYD_CHDIR="/home/pi/django_projects/MyApp"
# Virtual Environment Setup
ENV_MY="/home/pi/.virtualenvs/myappenv"
CELERYD="$ENV_MY/bin/celeryd"
CELERYD_MULTI="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryd_multi"
CELERYCTL="$ENV_MY/bin/celeryctl"
CELERYD_OPTS="--app=MyApp --concurrency=1 --loglevel=FATAL"
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_USER="celery"
CELERYD_GROUP="celery"
tasks.py
@celery.task
def sendStatus(modelContext, ignore_result=True, *args, **kwargs):
    node = modelContext  # EndNodes.objects.get(node_addr_lg=node_addr_lg)
    # check age of message and proceed to send status update if it is fresh, otherwise we'll skip it
    if not current_task.request.eta == None:
        now_date = datetime.now().replace(tzinfo=None)  # the time now
        eta_date = dateutil.parser.parse(current_task.request.eta).replace(tzinfo=None)  # the time this was supposed to run, remove timezone from message eta datetime
        delta_seconds = (now_date - eta_date).total_seconds()  # seconds from when this task was supposed to run
        if delta_seconds >= node.status_timeout:  # if the message was queued more than delta_seconds ago this message is too old to process
            return
    # now that we know the message is fresh we can proceed to process the contents and send status to xbee
    hostname = current_task.request.hostname  # the name/key in the schedule that might have related xbee sessions
    app = Celery('app')  # create a new instance of app (because documented methods didnt work)
    i = app.control.inspect()
    scheduled_tasks = i.scheduled()  # the schedule of tasks in the queue
    for task in scheduled_tasks[hostname]:  # iterate through each task
        xbee_session = ast.literal_eval(task['request']['kwargs'])  # the request data in the message (converts unicode to dict)
        if xbee_session['xbee_addr'] == node.node_addr_lg:  # get any session data for this device that we may have set from model's save override
            if xbee_session['type'] == 'STAT':  # because we are responding with status update we look for status sessions
                app.control.revoke(task['request']['id'], terminate=True)  # revoke this task because it is redundant and we are sending update now
    page_mode = chr(node.page_mode)  # the paging mode to set on the remote device
    xbee_global.tx(dest_addr_long=bytearray.fromhex(node.node_addr_lg),
                   frame_id='A',
                   dest_addr='\xFF\xFE',
                   data=page_mode)
celery splash:
-------------- celery#raspberrypi v3.1.23 (Cipater)
---- **** -----
--- * *** * -- Linux-4.4.11-v7+-armv7l-with-debian-8.0
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: MyApp:0x762efe10
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: amqp://
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. MyApp.celery.debug_task
. clone_app.tasks.nodeInterval
. clone_app.tasks.nodePoll
. clone_app.tasks.nodeState
. clone_app.tasks.resetNetwork
. clone_app.tasks.sendData
. clone_app.tasks.sendStatus
[2016-10-11 03:41:12,863: WARNING/Worker-1] Got signal worker_process_init for task id None
[2016-10-11 03:41:12,913: WARNING/Worker-1] JUST OPENED
[2016-10-11 03:41:12,915: WARNING/Worker-1] /dev/ttyUSB0
[2016-10-11 03:41:12,948: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2016-10-11 03:41:13,101: INFO/MainProcess] mingle: searching for neighbors
[2016-10-11 03:41:14,206: INFO/MainProcess] mingle: all alone
[2016-10-11 03:41:14,341: WARNING/MainProcess] celery#raspberrypi ready.
[2016-10-11 03:41:16,223: WARNING/Worker-1] RAW DATA
[2016-10-11 03:41:16,225: WARNING/Worker-1] {'source_addr_long': '\x00\x13\xa2\x00#\x89\xe9\xd7', 'rf_data': '...^%:STAT:`', 'source_addr': '[*', 'id': 'rx', 'options': '\x01'}
[2016-10-11 03:41:16,458: INFO/MainProcess] Received task: clone_app.tasks.sendStatus[6e1a74ec-dca5-495f-a4fa-906a5c657b26] eta:[2016-10-11 03:41:17.307421+00:00]
I can provide additional details if required!!
And thank you for any help resolving this.
Wow, almost immediately after posting my question I found this post and it has completely resolved my issue.
As I expected ignore_result=True was required, I just was not sure where it belonged.
Now I see no queues except maybe for the instant a worker is running a task. :)
Here's the change in tasks.py:
# From
@celery.task
def sendStatus(modelContext, ignore_result=True, *args, **kwargs):
    # Some code here

# To
@celery.task(ignore_result=True)
def sendStatus(modelContext, *args, **kwargs):
    # Some code here
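Because the splash above shows the result backend as amqp://, every stored result gets its own queue, which is why queues pile up when results are not ignored. If no task needs its return value, the per-task ignore_result can also be set globally; a sketch of the old-style (Celery 3.x) setting names, assuming a Django settings module:
# settings.py (Celery 3.x old-style setting names)
CELERY_IGNORE_RESULT = True           # never store task results
CELERY_TASK_RESULT_EXPIRES = 3600     # let any results that do get stored expire after an hour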