Amazon SQS and celery events (not JSON serializable) - django

I was looking today into Amazon SQS as an alternative to installing my own RabbitMQ on an EC2 instance.
I have followed the documentation as described here.
Within a paragraph it says:
SQS does not yet support events, and so cannot be used with celery
events, celerymon or the Django Admin monitor.
I am a bit confused about what "events" means here. For example, in the scenario below I have a periodic task every minute in which I call sendEmail.delay(event) asynchronously:
@celery.task(name='tasks.check_for_events')
@periodic_task(run_every=datetime.timedelta(minutes=1))
def check_for_events():
    now = datetime.datetime.utcnow().replace(tzinfo=utc, second=0, microsecond=0)
    events = Event.objects.filter(reminder_date_time__range=(now - datetime.timedelta(minutes=5), now))
    for event in events:
        sendEmail.delay(event)

@celery.task(name='tasks.sendEmail')
def sendEmail(event):
    event.sendMail()
When running it with Amazon SQS I get this error message:
tasks.check_for_events[7623fb2e-725d-4bb1-b09e-4eee24280dc6] raised
exception: TypeError(' is not JSON serializable',)
So is this the SQS limitation pointed out in the documentation, or am I doing something fundamentally wrong?
Many thanks for any advice.

I might have found the solution. Simply refactor the sendMail() logic from the event into the main task, so there is no need to serialize the object to JSON:
@celery.task(name='tasks.check_for_events')
@periodic_task(run_every=datetime.timedelta(minutes=1))
def check_for_events():
    now = datetime.datetime.utcnow().replace(tzinfo=utc, second=0, microsecond=0)
    events = list(Event.objects.filter(reminder_date_time__range=(now - datetime.timedelta(minutes=5), now)))
    for event in events:
        subject = 'Event Reminder'
        link = None
        message = ...
        sendEmail.delay(subject, message, event.user.email)

@celery.task(name='tasks.sendEmail')
def sendEmail(subject, message, email):
    send_mail(subject, message, settings.DEFAULT_FROM_EMAIL, [email])
This works with both RabbitMQ and Amazon SQS.
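An alternative worth noting, as a sketch only (reusing the names from the question): pass just the event's primary key and re-fetch the object inside the worker, which keeps the task arguments JSON serializable on any broker, SQS included.

@celery.task(name='tasks.sendEmail')
def sendEmail(event_id):
    # Re-fetch the model inside the worker instead of serializing it.
    event = Event.objects.get(pk=event_id)
    event.sendMail()

# Caller side: pass only the primary key.
sendEmail.delay(event.pk)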

For anyone returning to this post:
This happens when the serializer defined in your Celery runtime config cannot process the objects passed to the Celery task.
For example, if the config requires JSON and a Model object is supplied, the exception mentioned above may be raised.
(Q): Is it explicitly necessary to define these parameters?
# CELERY_ACCEPT_CONTENT=['json', ],
# CELERY_TASK_SERIALIZER='json',
# CELERY_RESULT_SERIALIZER='json',
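For illustration, here is a minimal sketch of those settings uncommented in a Django settings module. The uppercase CELERY_-prefixed names assume the common Django integration; newer Celery versions also accept lowercase names such as task_serializer.

# settings.py -- restrict Celery to JSON serialization (sketch)
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'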

Related

AWS SES python sqs.receive_message returning only one message output

I am using the Amazon Python SDK (boto3) to see how many messages there are in the queue for a given queue URL. In the Amazon web console I can see there are 3 messages in the queue for that queue URL. However, I do not get more than 1 message as output every time I run the command. Below is my code:
import boto3
import json
from botocore.exceptions import ClientError

def GetSecretKeyAndAccesskey():
    # code to pull secret key and access key
    return (aws_access_key, aws_secret_key)

# Create SQS client
aws_access_key_id, aws_secret_access_key = GetSecretKeyAndAccesskey()
sqs = boto3.client(
    'sqs',
    aws_access_key_id=str(aws_access_key_id),
    aws_secret_access_key=str(aws_secret_access_key),
    region_name='eu-west-1',
)

response = sqs.receive_message(
    QueueUrl='my_queue_url',
    AttributeNames=[
        'All',
    ],
    MaxNumberOfMessages=10,
)

print(response["Messages"][0])
Every time I run the code I get a different message ID, and if I change the print statement to check the next list index I get "list index out of range", implying that only one message was returned:
print(response["Messages"][1])
C:\>python testing.py
d4e57e1d-db62-4fc5-8233-c5576cb2603d
C:\>python testing.py
857858e9-55dc-4d23-aead-3c6622feccc5
First, you need to add "WaitTimeSeconds" to turn on long polling and collect more messages during a single connection.
The other issue is that if you only put 3 messages on the queue, they get spread across the backend systems as part of the redundancy of the AWS SQS service. So when you call SQS, it connects you to one of those systems and delivers the single message that is available there. If you increase the total number of messages, you will get more messages per request.
I wrote this code to demonstrate the functionality of SQS and allow you to play around with the concept and test.
import json
import boto3

session = boto3.Session(region_name="us-east-2", profile_name="dev")
sqs = session.client('sqs')

def get_message():
    response = sqs.receive_message(QueueUrl='test-queue', MaxNumberOfMessages=10, WaitTimeSeconds=10)
    return len(response["Messages"])

def put_messages(seed):
    for message_number in range(seed):
        body = {"test": "message {}".format(message_number)}
        sqs.send_message(QueueUrl='test-queue', MessageBody=json.dumps(body))

if __name__ == '__main__':
    put_messages(2)
    print(get_message())
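Building on the same client, here is a hedged sketch of draining a queue completely: keep long-polling until receive_message comes back empty, and delete each message after handling it so it is not redelivered (queue_url is a placeholder).

def drain_queue(sqs, queue_url):
    # Sketch only: loop until a long poll returns no messages.
    while True:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=10,  # long polling
        )
        messages = response.get("Messages", [])
        if not messages:
            break
        for message in messages:
            print(message["MessageId"], message["Body"])
            # Delete after processing so the message is not delivered again.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])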

APscheduler job not firing when run in Flask on AWS Lambda

A while back, I wrote a small Flask app (deployed as an AWS lambda via Serverless) to do some on-the-fly DynamoDB updates via Slack slash commands. A coworker suggested adding a component so that updates could be scheduled in advance.
I looked into using APScheduler and added a new component to the app. In the abbreviated example below, a Slack slash command sends a POST request to the app's "/scheduler" endpoint:
from flask import Flask, request
from apscheduler.schedulers.background import BackgroundScheduler
from pytz import timezone
[etc...]

app = Flask(__name__)
city = timezone([my timezone])
sched = BackgroundScheduler(timezone=city)
sched.start()

def success_webhook(markdown):
    webhook_url = os.environ["webhook_url"]
    data = json.dumps({"text": {"type": "mrkdwn", "text": markdown}})
    headers = {"Content-Type": "application/json"}
    r.post(webhook_url, data=data, headers=headers)

def pass_through(package):
    db = boto3.resource(
        "dynamodb",
        region_name=os.environ["region_name"],
        aws_access_key_id=os.environ["aws_access_key_id"],
        aws_secret_access_key=os.environ["aws_secret_access_key"],
    )
    table = db.Table(table_name)
    update_action = table.update_item(
        Key={"id": "[key]"},
        UpdateExpression="SET someValue = :val1",
        ExpressionAttributeValues={":val1": package["text"]},
    )
    if update_action["ResponseMetadata"]["HTTPStatusCode"] == 200:
        success_webhook("success")

@app.route("/scheduler", methods=["POST"])
def scheduler():
    incoming = (request.values).to_dict()
    sched.add_job(pass_through, "date", run_date=incoming["run_date"],
                  id=incoming["id_0"], args=[incoming])
    return "success", 200

if __name__ == "__main__":
    app.run()
I tested locally and everything worked fine -- I could schedule jobs and they would run on time; the app's other endpoints for checking and removing scheduled jobs [not shown above] also worked as expected.
But once I spun up the AWS Lambda running the Flask app, the scheduler never actually runs the pass_through() function for the jobs. Sure, the job gets added -- I can see it in the list of jobs and remove it from the schedule -- but when the time comes for the Lambda to actually run pass_through(), it doesn't. Does anyone know anything about this situation?
Lambda execution will stop right after you return a value, so even when you schedule the job here:
sched.add_job(pass_through, "date", run_date=incoming["run_date"],
              id=incoming["id_0"], args=[incoming])
return "success", 200
the Lambda execution stops and the job will not run later.
If you need to schedule jobs, you probably need a solution other than Lambda; however, you can use CloudWatch Events to trigger your Lambdas on a schedule: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
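As a rough sketch of that CloudWatch route (all names, the ARN, and the schedule expression below are placeholders, not taken from the question):

import boto3

events = boto3.client("events")

# Create (or update) a scheduled rule.
events.put_rule(
    Name="pass-through-reminder",
    ScheduleExpression="cron(0 12 * * ? *)",  # every day at 12:00 UTC
    State="ENABLED",
)

# Point the rule at the Lambda and pass a static payload.
events.put_targets(
    Rule="pass-through-reminder",
    Targets=[{
        "Id": "pass-through-lambda",
        "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:pass-through",
        "Input": '{"text": "scheduled update"}',
    }],
)
# Note: the Lambda also needs a resource-based permission allowing
# events.amazonaws.com to invoke it.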

Perform celery task after successful commit in Flask?

A relatively long-running task is delegated to Celery workers, which run separately, on another server.
However, the results are written back to the relational database (a table is updated with task_descr.id as the key, see below); the worker uses ignore_result.
The task is requested from the Flask application:
task = app.celery.send_task('tasks.mytask', [task_descr.id, attachments])
The problem is that the task is sent while the transaction is not yet committed on the Flask side. This causes a race condition, because sometimes the Celery worker completes the task before the transaction in the Flask app has ended.
What is the proper way to send tasks only after a successful transaction?
Or should the worker check for the availability of task_descr.id before attempting the conditional UPDATE, and retry the task otherwise (this feels like too complex an arrangement)?
The answer to Run function after a certain type of model is committed discusses a similar situation, but here the task sending is explicit, so there is no need to listen for updates/inserts on some model.
One of the ways is Per-Request After-Request Callbacks, thanks to Armin Ronacher:
from flask import g

def after_this_request(func):
    if not hasattr(g, 'call_after_request'):
        g.call_after_request = []
    g.call_after_request.append(func)
    return func

@app.after_request
def per_request_callbacks(response):
    for func in getattr(g, 'call_after_request', ()):
        response = func(response)
    return response
In my case the usage is in the form of a nested function:
task_descr = ...
attachments = ...
# ...

@after_this_request
def send_mytask(response):
    if response.status_code in {200, 302}:
        task = app.celery.send_task('tasks.mytask', [task_descr.id, attachments])
    return response
Not ideal, but it works. My tasks are only for successfully served requests, so I do not care about 500s or other error conditions.
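Worth noting: Flask 0.9 and later ship this helper as flask.after_this_request, so under that assumption the hand-rolled decorator above can be replaced with the built-in; a minimal sketch using the same names as above:

from flask import after_this_request

@after_this_request
def send_mytask(response):
    # Only enqueue the task for successfully served requests.
    if response.status_code in {200, 302}:
        app.celery.send_task('tasks.mytask', [task_descr.id, attachments])
    return response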

Receiving events from celery task

I have a long-running Celery task which iterates over an array of items and performs some actions.
The task should somehow report back which item it is currently processing, so the end user is aware of the task's progress.
At the moment my Django app and Celery sit together on one server, so I am able to use Django's models to report the status, but I am planning to add more workers which are away from Django, so they cannot reach the DB.
Right now I see a few solutions:
Store intermediate results manually in some storage, like Redis or MongoDB, making them available over the network. This worries me a little, because if, for example, I use Redis, then I have to keep the code on the Django side (reading the status) and in the Celery task (writing the status) in sync so they use the same keys.
Report the status back to Django from Celery using REST calls, like PUT http://django.com/api/task/123/items_processed
Maybe use the Celery event system and create events like "Item processed", on which Django updates a counter.
Create a separate worker which runs on the server with Django and holds a task that only increases the items-processed count, so that when the main task is done with an item it issues increase_messages_proceeded_count.delay(task_id).
Are there any hidden problems with the solutions I mentioned?
There are probably many ways to achieve your goal, but here is how I would do it.
Inside your long-running Celery task, set the progress using Django's caching framework:
from django.core.cache import cache

@app.task(bind=True)  # bind=True so the task instance is available as `self`
def long_running_task(self, *args, **kwargs):
    key = "my_task: %s" % self.request.id
    ...
    # do whatever you need to do and set the progress
    # using cache:
    cache.set(key, progress, timeout="whatever works for you")
    ...
Then all you have to do is make a recurring AJAX GET request with that key and retrieve the progress from the cache. Something along these lines:
def task_progress_view(request, *args, **kwargs):
    key = request.GET.get('task_key')
    progress = cache.get(key)
    return HttpResponse(content=json.dumps({'progress': progress}),
                        content_type="application/json; charset=utf-8")
One caveat, though: if you are running your server as multiple processes, make sure you are using something like memcached, because Django's local-memory caching will be inconsistent across processes. Also, I probably wouldn't use Celery's task_id as a key, but it is sufficient for demonstration purposes.
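To illustrate that caveat, a sketch of pointing Django's cache at memcached so every process sees the same progress keys (the backend path assumes Django 3.2+ with pymemcache; adjust for your versions):

# settings.py -- shared cache backend (sketch)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}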
Take a look at flower - a real-time monitor and web admin for Celery distributed task queue:
https://github.com/mher/flower#api
http://flower.readthedocs.org/en/latest/api.html#get--api-tasks
You need it for presentation, right? Flower works with websockets.
For instance - receive task completion events in real-time (taken from official docs):
var ws = new WebSocket('ws://localhost:5555/api/task/events/task-succeeded/');
ws.onmessage = function (event) {
    console.log(event.data);
}
You would likely need to work with tasks ('ws://localhost:5555/api/tasks/').
I hope this helps.
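If polling is acceptable instead of websockets, Flower's REST API can also be queried directly; a sketch assuming Flower runs on localhost:5555 (the exact response fields may vary by Flower version):

import requests

# List known tasks and print their state.
tasks = requests.get("http://localhost:5555/api/tasks").json()
for task_id, info in tasks.items():
    print(task_id, info.get("state"))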
Simplest:
Your tasks and your Django app already share access to one or two data stores - the broker and the results backend (if you are using one that is different from the broker).
You can simply put some data into one of these data stores indicating which item the task is currently processing.
E.g., if using Redis, simply have a key 'task-currently-processing' and store the data relevant to the item currently being processed there.
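A sketch of that shared-key idea with redis-py (the host, key name, and item_id are placeholders):

import redis

r = redis.Redis(host="localhost", port=6379)

# Inside the Celery task, as each item is handled:
r.set("task-currently-processing", item_id)

# On the Django side, when rendering progress:
current = r.get("task-currently-processing")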
You can use something like SwampDragon to reach the user from the Celery instance (you have to be able to reach it from the client, though; take care not to run afoul of CORS). It can be latched onto the counter rather than the model itself.
lehins' solution looks good if you don't mind your clients repeatedly polling your backend. That may be fine but it gets expensive as the number of clients grows.
Artur Barseghyan's solution is suitable if you only need the task lifecycle events generated by Celery's internal machinery.
Alternatively, you can use Django Channels and WebSockets to push updates to clients in real-time. Setup is pretty straightforward.
Add channels to your INSTALLED_APPS and set up a channel layer. E.g., using a Redis backend:
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("redis", 6379)]
        }
    }
}
Create an event consumer. This will receive events from Channels and push them via Websockets to the client. For instance:
import json

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer

class TaskConsumer(WebsocketConsumer):
    def connect(self):
        self.task_id = self.scope['url_route']['kwargs']['task_id']  # your task's identifier
        async_to_sync(self.channel_layer.group_add)(f"tasks-{self.task_id}", self.channel_name)
        self.accept()

    def disconnect(self, code):
        async_to_sync(self.channel_layer.group_discard)(f"tasks-{self.task_id}", self.channel_name)

    def item_processed(self, event):
        item = event['item']
        self.send(text_data=json.dumps(item))
Push events from your Celery tasks like this:
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

...

async_to_sync(get_channel_layer().group_send)(f"tasks-{task.task_id}", {
    'type': 'item_processed',
    'item': item,
})
You can also write an async consumer and/or invoke group_send asynchronously. In either case you no longer need the async_to_sync wrapper.
Add websocket_urlpatterns to your urls.py:
websocket_urlpatterns = [
    path(r'ws/tasks/<task_id>/', TaskConsumer.as_asgi()),
]
Finally, to consume events from JavaScript in your client, you can do something like this:
let task_id = 123;
let protocol = location.protocol === 'https:' ? 'wss://' : 'ws://';
let socket = new WebSocket(`${protocol}${window.location.host}/ws/tasks/${task_id}/`);

socket.onmessage = function(event) {
    let data = JSON.parse(event.data);
    let item = data.item;
    // do something with the item (e.g., push it into your state container)
}

How to implement proxy/broker for (X)PUB/(X)SUB messaging in ZMQ?

So I was reading this article on how to create a proxy/broker for (X)PUB/(X)SUB messaging in ZMQ. There is a nice picture of what the architecture should look like:
But when I look at the XSUB socket description, I do not get how to forward all subscriptions through it, given that its outgoing routing strategy is N/A.
So how should one implement (un)subscription forwarding in ZeroMQ, and what is the minimal user code for such a forwarding application (one that can be inserted between the simple Publisher and Subscriber samples)?
XPUB does receive messages - the only messages it receives are subscriptions from connected subscribers, and these messages should be forwarded upstream as-is via XSUB.
The very simplest way to relay messages is with zmq_proxy:
xpub = ctx.socket(zmq.XPUB)
xpub.bind(xpub_url)
xsub = ctx.socket(zmq.XSUB)
xsub.bind(xsub_url)
pub = ctx.socket(zmq.PUB)
pub.bind(pub_url)
zmq.proxy(xpub, xsub, pub)
which will relay messages to/from xpub and xsub. Optionally, you can add a PUB socket to monitor the traffic that passes through in either direction.
If you want user code in the middle to implement extra routing logic, you would do something like the following, which re-implements the inner loop of zmq_proxy:
def broker(ctx):
    xpub = ctx.socket(zmq.XPUB)
    xpub.bind(xpub_url)
    xsub = ctx.socket(zmq.XSUB)
    xsub.bind(xsub_url)

    poller = zmq.Poller()
    poller.register(xpub, zmq.POLLIN)
    poller.register(xsub, zmq.POLLIN)

    while True:
        events = dict(poller.poll(1000))
        if xpub in events:
            message = xpub.recv_multipart()
            print("[BROKER] subscription message: %r" % message[0])
            xsub.send_multipart(message)
        if xsub in events:
            message = xsub.recv_multipart()
            # print("publishing message: %r" % message)
            xpub.send_multipart(message)
            # insert user code here
full working (Python) example
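To exercise the broker, here is a minimal publisher/subscriber sketch (sketch only; xsub_url and xpub_url are whatever addresses the broker bound above):

# Publisher: connect to the broker's XSUB side and publish on a topic.
pub = ctx.socket(zmq.PUB)
pub.connect(xsub_url)
pub.send_multipart([b"topic", b"hello"])

# Subscriber: connect to the broker's XPUB side and subscribe to the topic.
sub = ctx.socket(zmq.SUB)
sub.connect(xpub_url)
sub.setsockopt(zmq.SUBSCRIBE, b"topic")
print(sub.recv_multipart())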