Celery with Amazon SQS

I want to use Amazon SQS as the broker backend for Celery. There is an SQS transport implementation for Kombu, which Celery depends on, but there isn't enough documentation for using it, so I cannot figure out how to configure SQS in Celery. Has anybody succeeded in configuring SQS with Celery?

I ran into this question several times but still wasn't entirely sure how to set up Celery to work with SQS. It turns out that it is quite easy with the latest versions of Kombu and Celery. As an alternative to the BROKER_URL syntax mentioned in another answer, you can simply set the transport, options, user, and password like so:
BROKER_TRANSPORT = 'sqs'
BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
}
BROKER_USER = AWS_ACCESS_KEY_ID
BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY
This gets around a purported issue with the URL parser that doesn't allow forward slashes in your API secret, which seems to be a fairly common occurrence with AWS. Since there didn't seem to be a wealth of information out there yet, I also wrote a short blog post on the topic:
http://www.caktusgroup.com/blog/2011/12/19/using-django-and-celery-amazon-sqs/
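For reference, a hedged sketch of what a fuller settings.py excerpt might look like with this approach; the queue_name_prefix, visibility_timeout, and polling_interval options are commonly used with the SQS transport, but they are assumptions here, so check your Kombu version's documentation before relying on them:

# settings.py sketch for Celery + SQS (values are placeholders)
BROKER_TRANSPORT = 'sqs'
BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',            # AWS region your queues live in
    'queue_name_prefix': 'myapp-',    # optional: namespace the SQS queues
    'visibility_timeout': 3600,       # optional: seconds before an unacked task is redelivered
    'polling_interval': 1,            # optional: seconds between SQS polls
}
BROKER_USER = AWS_ACCESS_KEY_ID
BROKER_PASSWORD = AWS_SECRET_ACCESS_KEY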

I'm using Celery 3.0 and was getting deprecation warnings when launching the worker with the BROKER_USER / BROKER_PASSWORD settings.
I took a look at the SQS URL parsing in kombu.utils.url._parse_url and it calls urllib.unquote on the username and password elements of the URL.
So, to work around the issue of secret keys with forward slashes, I was able to successfully use the following for the BROKER_URL:
import urllib
BROKER_URL = 'sqs://%s:%s@' % (urllib.quote(AWS_ACCESS_KEY_ID, safe=''),
                               urllib.quote(AWS_SECRET_ACCESS_KEY, safe=''))
I'm not sure whether access keys can ever contain forward slashes, but it doesn't hurt to quote them as well.

For anybody stumbling upon this question, I was able to get Celery working out-of-the-box with SQS (no patching required), but I did need to update to the latest versions of Celery and Kombu for this to work (1.4.5 and 1.5.1 as of now). Use the config lines above and it should work (although you'll probably want to change the default region).
Gotcha: in order to use the URL format above, you need to make sure your AWS secret doesn't contain slashes, as this confuses the URL parser. Just keep generating new secrets until you get one without a slash.

Nobody has answered this yet, so here is what I found. I tried to configure Celery with Amazon SQS, and it seems I achieved some small success.
Kombu needs to be patched for this, so I wrote some patches, and there is a pull request as well. With the patched Kombu you can configure Amazon SQS in Celery by setting a BROKER_URL with the sqs:// scheme. For example:
BROKER_URL = 'sqs://AWS_ACCESS:AWS_SECRET@:80//'
BROKER_TRANSPORT_OPTIONS = {
    'region': 'ap-northeast-1',
    'sdb_persistence': False
}

I regenerated the credentials in the IAM console until I got a key without a slash (/). The parsing issues only affect that character, so if your secret doesn't contain one you'll be fine.
Not the most elegant solution, but it definitely keeps the code free of hacks.

Update for Python 3, URL-encoding the AWS keys so that forward slashes don't break the URL parser:
from urllib.parse import quote_plus
BROKER_URL = 'sqs://{}:{}@'.format(
    quote_plus(AWS_ACCESS_KEY_ID),
    quote_plus(AWS_SECRET_ACCESS_KEY)
)

I was able to configure SQS on Celery 4.3 (Python 3.7) using Kombu's quote helper.
from kombu.utils.url import quote
CELERY_BROKER_URL = 'sqs://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@'.format(
    AWS_ACCESS_KEY_ID=quote(AWS_ACCESS_KEY_ID, safe=''),
    AWS_SECRET_ACCESS_KEY=quote(AWS_SECRET_ACCESS_KEY, safe='')
)
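If you load your Celery settings from Django with the CELERY_ namespace (a common Celery 4.x setup, assumed here rather than stated in the answer), the transport options for SQS follow the same naming convention:

# settings.py sketch, assuming app.config_from_object('django.conf:settings', namespace='CELERY')
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',  # adjust to the region your queues live in
}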

Related

Configuring Celery + AWS SQS to revoke tasks

I am running Celery+Kombu 4.4.6 on AWS SQS and want to revoke and terminate tasks.
Reading through the documentation and SO posts, the transport needs to allow broadcast messages. SQS does not support broadcast messages, so Celery+Kombu has to use SimpleDB for those. That option was turned off by default way back in version 1.x; to enable it, supports_fanout = True needs to be added to the transport options.
But adding just that option is not working for me, and I can't figure out what I am missing. Possible options are:
SimpleDB - it is not clear to me how I would even enable SimpleDB. I do see documentation in AWS, but I do not see it as a separate service.
Any additional config to be added?
Looking briefly at the SQS code, it seems like SimpleDB is the only option for this. Is that correct?
Any other option to enable task revocation on SQS?
In my app.celery I have:
app = Celery('app',
             broker='sqs://<access key>:<secret key>@',
             backend='cache+memcached://<host>:11211/',
)
And in my app.settings I have:
CELERY_BROKER_URL = 'sqs://<access key>:<secret key>@'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': '<region>',
    'supports_fanout': True,
}
CELERY_DEFAULT_QUEUE = 'app'
CELERY_DEFAULT_EXCHANGE = 'app'
CELERY_DEFAULT_ROUTING_KEY = 'app'
My final solution was to use Amazon MQ with a RabbitMQ instance. Amazon SimpleDB seems to be gone, making any support in Celery+Kombu obsolete and broken.
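For illustration, a hedged sketch of that final setup, assuming an Amazon MQ RabbitMQ broker reachable over AMQPS; the hostname, credentials, and task name below are placeholders, not values from the original post:

from celery import Celery

# Point Celery at the Amazon MQ (RabbitMQ) broker; RabbitMQ supports the
# broadcast messages that revoke/terminate relies on.
app = Celery(
    'app',
    broker='amqps://myuser:mypassword@b-1234-example.mq.us-east-1.amazonaws.com:5671//',
)

# Revocation now works because control commands can be broadcast to workers.
result = app.send_task('app.tasks.long_running_job')  # placeholder task name
app.control.revoke(result.id, terminate=True)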

Django post-office setup

Perhaps it is just because I've never set up an e-mail system in Django before, or maybe I'm missing it... but does anyone have any insight on how to properly configure django-post_office for sending queued e-mails?
I've got a mailing list of 1500+ people and am hosting my app on Heroku - using the standard email system doesn't work because I need to send a customized email to each user, and connecting to the server one by one leads to a timeout.
I've installed django-post_office via pip, added the app to settings.py, and I've even been able to get an email to send by running:
mail.send(['recipient'], 'sender', subject='test', message='hi there', priority='now')
However, if I try to schedule it for, say, 30 seconds from now:
nowtime = datetime.datetime.now()
sendtime = nowtime + datetime.timedelta(seconds=30)
and then
mail.send(['recipient'], 'sender', subject='test', message='hi there', scheduled_time=sendtime)
Nothing happens... time passes, and the e-mail is still listed as queued, and I don't receive any emails.
I have a feeling it's because I also need to have Celery / RQ / cron set up. But the documentation seems to suggest that it should work out of the box. What am I missing?
Thanks folks
Actually, you can find this in the documentation (at the time I'm writing this comment):
Usage
If you use post_office's EmailBackend, it will automatically queue emails sent using Django's send_mail in the database.
To actually send them out, run python manage.py send_queued_mail. You can schedule this to run regularly via cron:
* * * * * (/usr/bin/python manage.py send_queued_mail >> send_mail.log 2>&1)
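If you are already running Celery (as many Heroku apps are), a periodic task can stand in for cron. This is only a sketch: the task and module names below are made up for illustration and are not part of django-post_office.

# tasks.py (illustrative module)
from celery import shared_task
from django.core.management import call_command

@shared_task
def flush_queued_mail():
    # Equivalent to running `python manage.py send_queued_mail` from cron.
    call_command('send_queued_mail')

# settings.py: run it every 60 seconds via Celery beat (Celery 3.x-style setting name)
CELERYBEAT_SCHEDULE = {
    'send-queued-mail-every-minute': {
        'task': 'myapp.tasks.flush_queued_mail',
        'schedule': 60.0,
    },
}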

Triggering Celery Tasks from C++ with Redis

We have a setup where we have a web frontend programmed in Django and a backend written in C++ that parses data for us.
The frontend uses Celery in combination with Redis for asynchronous tasks.
Since it would be convenient in some situations, I was wondering today if it is possible to trigger a Celery task from within C++.
Since there is a Redis client available for C++, I am pretty sure this is possible if the correct messages are sent to Redis; however, I was not able to find any information on this anywhere.
My next step would be to try to dig the needed information out of the Celery source code, but before I do that:
Does anybody have any information on this subject that could help me or get me started or is there even someone who has done this before?
Any help is appreciated. (Also if you got a reason why this will not work.)
Thank you.
I had a similar need to trigger a Celery task from Logstash. Basically, I had to create a message that looked something like this:
{
    "body": "base_64_encoded_string (see below)",
    "content-type": "application/json",
    "properties": {
        "body_encoding": "base64",
        "correlation_id": "f009c9e0-0ca6-42a6-a046-3d0e53e06060",
        "reply_to": "e1eb91f0-6780-4c34-b633-7ef9a46baf5e",
        "delivery_mode": 2,
        "delivery_tag": "7788b924-a7fe-4c9a-839e-1c7ca602dbba",
        "delivery_info": {
            "priority": 0,
            "routing_key": "default",
            "exchange": "default"
        }
    }
}
In this case, the decoded body translates to:
{
    "args": ["meta_val", "doc_value"],
    "task": "goldstone.compliance.tasks.process_fim_event",
    "id": "23deb69e-49c1-4a61-8639-d4627d0fc591"
}
If you have kwargs on your task, you can add "kwargs": {"key": "value", ...} to the body.
The body above triggers a task called process_fim_event. The task def looks like:
@task()
def process_fim_event(meta, doc):
    ...
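To make the wire format concrete, here is a hedged Python sketch of what a C++ Redis client would have to reproduce. It assumes the redis-py package, a local Redis instance, and the queue name "default" taken from the routing_key above; all of those are assumptions, not part of the original answer.

import base64
import json
import uuid

import redis

# Inner Celery task message (this is what gets base64-encoded into "body").
task_body = {
    "args": ["meta_val", "doc_value"],
    "kwargs": {},
    "task": "goldstone.compliance.tasks.process_fim_event",
    "id": str(uuid.uuid4()),
}

# Outer Kombu envelope, mirroring the JSON shown above.
envelope = {
    "body": base64.b64encode(json.dumps(task_body).encode()).decode(),
    "content-type": "application/json",
    "content-encoding": "utf-8",
    "properties": {
        "body_encoding": "base64",
        "correlation_id": task_body["id"],
        "delivery_mode": 2,
        "delivery_tag": str(uuid.uuid4()),
        "delivery_info": {"priority": 0, "routing_key": "default", "exchange": "default"},
    },
}

# Celery's Redis transport reads task messages from a list named after the queue.
redis.Redis(host="localhost", port=6379, db=0).lpush("default", json.dumps(envelope))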
The easiest way of doing this that I know of is to use Flower, an HTTP API for Celery. With Flower you can create a task from anything that can make an HTTP request. One example from the GitHub README:
$ curl -X POST -d '{"args":[1,2]}' http://localhost:5555/api/task/async-apply/tasks.add
So, the idea is that your C++ app would make an HTTP request against the Flower API, which would then insert the task into your Redis queue.

Celery with Django and MongoDB (mongoengine)

1) I am trying to build an application using Celery (with RabbitMQ as the broker) and Django, using MongoDB (mongoengine) as the database for the model. The requests received by the web server will be transformed into tasks and queued with the help of Celery to be executed by the workers.
I followed the following tutorials:
http://docs.celeryproject.org/en/master/django/first-steps-with-django.html#configuring-your-django-project-to-use-celery
and
https://mongoengine-odm.readthedocs.org/en/latest/django.html
but I still get the following error:
ImproperlyConfigured: settings.DATABASES is improperly configured. Please supply the ENGINE value.
As mentioned in both tutorials, settings.DATABASES should be commented out and replaced with only
mongoengine.connect('myDB')
and yet the error is exactly about not having the DATABASES configured.
(Apart from that, I have not configured any result backend for Celery.)
Can anybody help me with an advice as to what I have to set and where?
2) And another question is: in the projects involving only Celery there is always a Celery instance. But in the tutorials about building web applications with Django and Celery I haven't seen any mention of this. Do I have to explicitly instantiate Celery or is this done somewhere else by default?
1) In case anybody is interested in the answer, I finally managed to get it working, but I am not sure I understood correctly what happened.
Apparently the problem was that I hadn't set the result backend for Celery. I got rid of the error as soon as I put the following line in settings.py:
CELERY_RESULT_BACKEND = "amqp"
2) My project (I am using djcelery) is working without me explicitly instantiating Celery. I assume this is done somewhere in the back by the framework.
What happens is that Celery attempts to use Django's default database (the one defined in settings.DATABASES) as the Celery Result database, but to use mongoengine as your primary Django database you have to bypass settings.DATABASES.
So just make sure you define both BROKER_URL and CELERY_RESULT_BACKEND properly so that Celery doesn't try to consult settings.DATABASES. I guess you want them to be the same, but you could choose to have them separate.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_RESULT_BACKEND = "amqp"
For other backends, consult this.
Part 2 of your question.
Do you have CELERY_ALWAYS_EAGER = True in your settings.py? With that setting, tasks are executed locally, which is typically why the Celery process doesn't need to be launched separately. But do not use this in production. See this question.
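If that is what you are relying on, a development-only settings sketch would look like this (the second setting is a related option I'm adding for convenience, not something from the original answer):

# settings.py, development only: run tasks synchronously in the web process.
CELERY_ALWAYS_EAGER = True
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # surface task errors immediately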

Why doesn't CeleryCAM work with Amazon SQS?

I'm using Celery 2.4.6 and django-celery 2.4.2.
When I configure Celery to use Amazon SQS per the resolution on this question: Celery with Amazon SQS
I don't see anything in the celerycam table in the Django admin. If I switch back to RabbitMQ, the tasks start showing up again.
I have a lot of queues in SQS (40+ now) named something like this: "celeryev-92e068c4-9390-4c97-bc1d-13fd6e309e19", which look like they might be related (some of the older ones even have an event in them), but nothing's showing up in the database and I see no errors in the celerycam log.
Any suggestions on what the issue might be or how to debug this further would be much appreciated.
SQS is a limited implementation of an AMQP-style bus. As I understand it, it doesn't support PUB/SUB broadcasting the way, say, RabbitMQ does, which is necessary for events to work properly. SNS was put in place to support broadcasting, but it's a separate system.
Some libraries/packages out there use SimpleDB as a message model store, as a hack on top of SQS to emulate proper AMQP behavior, but apparently Celery does not have a full hack in place yet.