Why doesn't CeleryCAM work with Amazon SQS? - django

I'm using Celery 2.4.6 and django-celery 2.4.2.
When I configure Celery to use Amazon SQS per the resolution on this question: Celery with Amazon SQS
I don't see anything in the celerycam table in the Django admin. If I switch back to RabbitMQ, the tasks start showing up again.
I have a lot of queues in SQS (now 40+) named something like "celeryev-92e068c4-9390-4c97-bc1d-13fd6e309e19", which look like they might be related (some of the older ones even have an event in them), but nothing's showing up in the database and I see no errors in the celerycam log.
Any suggestions on what the issue might be or how to debug this further would be much appreciated.

SQS is a limited implementation of an AMQP-style bus. As I understand it, it doesn't support PUB/SUB broadcasting the way, say, RabbitMQ does, which is necessary for events to work properly. SNS was put in place to support broadcasting, but it's a separate system.
Some libraries/packages out there use SimpleDB as a message-model store, as a hack on top of SQS, to emulate proper AMQP behavior, but apparently Celery does not have a complete workaround in place yet.
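For reference, a rough sketch of the configuration involved (setting names are the Celery 2.x-era ones; the credentials and region are placeholders). Even with events switched on, they never make it to celerycam over SQS, which matches the pile of celeryev-* queues you're seeing:

```python
# settings.py -- rough sketch, Celery 2.x-era setting names, placeholder values.
BROKER_URL = "sqs://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@"  # kombu SQS transport
BROKER_TRANSPORT_OPTIONS = {"region": "us-east-1"}

# Events are what celerycam consumes; they must be enabled on the workers
# (equivalently, start the workers with -E).
CELERY_SEND_EVENTS = True

# On RabbitMQ the events flow through a broadcast ("celeryev") exchange that
# celerycam subscribes to. SQS has no such broadcast; the "celeryev-<uuid>"
# queues you see are the transport's attempt to emulate it, and the events
# apparently never reach the django-celery monitor tables.
```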

Related

How can I receive push notifications from Amazon SNS using an event on CentOS, running Wildfly/JBoss?

I intend to write software that posts daily feeds using SubmitFeed, and while planning it I have seen in the documentation that I get a response from Amazon, possibly well before the actual parsing is complete. When I know that the operation has completed, I need to call GetFeedSubmissionResult; the problem, however, is finding out when the submission has finished. I could poll GetFeedSubmissionList until the status is complete, but that would waste resources and is hacky. The way I would like to go is to use Amazon SNS and get notified via FeedProcessingFinishedNotification.
However, I don't know how to use Amazon SNS. Even though I've read the docs, I still don't really see how to use it. I suppose something would need to run on my CentOS box or inside Wildfly/JBoss that "sees" that a message has arrived and, as a result, triggers the code I want to execute when such a push notification comes in. However, I don't know how to set this up. How can I properly receive Amazon SNS push notifications on my CentOS/Wildfly setup so that custom Java code I write gets executed?
P.S.
This is a link which deals with RedHat and Maven: https://access.redhat.com/documentation/en-us/red_hat_jboss_fuse/7.0-tp/html/apache_camel_component_reference/aws-sns-component
However, after reading it, it's still not clear to me how I can receive messages from Amazon, e.g. that an order has been placed for a product.
This article about the CLI: https://docs.aws.amazon.com/cli/latest/userguide/cli-services-sns.html
describes how to subscribe using the email protocol. Reading about subscription protocols, I found this article: https://docs.aws.amazon.com/sns/latest/api/API_Subscribe.html
It seems that if I choose an HTTPS endpoint, the messages would arrive as requests to that address. I'm really confused about this.
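Not a Java answer, but to make the HTTPS-subscription flow concrete, here is a minimal sketch in Python (Flask; the path and port are arbitrary): SNS first POSTs a SubscriptionConfirmation containing a SubscribeURL that you must fetch to confirm the subscription, and afterwards every notification arrives as a POST with the payload in the Message field. A servlet on Wildfly would do the same two things.

```python
# Minimal sketch of an SNS HTTPS endpoint (Flask; path/port are arbitrary).
# SNS indicates the message kind via the x-amz-sns-message-type header.
import json
import urllib.request

from flask import Flask, request

app = Flask(__name__)

@app.route("/sns", methods=["POST"])
def sns_endpoint():
    msg = json.loads(request.get_data())
    msg_type = request.headers.get("x-amz-sns-message-type")

    if msg_type == "SubscriptionConfirmation":
        # Fetching SubscribeURL confirms the subscription with SNS.
        urllib.request.urlopen(msg["SubscribeURL"])
    elif msg_type == "Notification":
        # The actual payload (e.g. the FeedProcessingFinished notification)
        # is in the Message field -- hand it to your own code here.
        handle_feed_finished(msg["Message"])

    return "", 200

def handle_feed_finished(payload):
    # Placeholder: call GetFeedSubmissionResult, update your database, etc.
    print("Feed processing finished:", payload)

if __name__ == "__main__":
    app.run(port=8080)
```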

Custom metrics from celery workers into prometheus

I have a few celery workers running in containers under Kubernetes. They are not auto-scaled by Celery and each runs a single process (i.e. no multiprocessing). I would like to get a bunch of different metrics from them into Prometheus. I've looked at celery-prometheus-exporter (unmaintained) and celery-exporter, but they are focused on metrics at the Celery level rather than app-level metrics inside the Celery workers.
It looks like my two options are either to find some hacky way to feed app-level metrics to celery-prometheus-exporter, which would then make them available to Prometheus, or to use the Pushgateway.
Which is better, or maybe there's another option I missed?
Just use the default Prometheus Python client and let it run its HTTP server in a thread.
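Something along these lines (a sketch; the metric names, port and broker URL are placeholders, and it assumes one worker process per container as described, so the client's default registry is enough):

```python
# tasks.py -- sketch: expose app-level metrics straight from the worker process.
from celery import Celery
from celery.signals import worker_ready
from prometheus_client import Counter, Histogram, start_http_server

app = Celery("myapp", broker="redis://localhost:6379/0")  # placeholder broker

ITEMS_PROCESSED = Counter("myapp_items_processed_total",
                          "Items processed by background tasks")
PROCESS_SECONDS = Histogram("myapp_process_seconds",
                            "Time spent in process_item")

@worker_ready.connect
def start_metrics_server(**kwargs):
    # Runs the Prometheus HTTP endpoint in a daemon thread inside the worker;
    # each container becomes its own scrape target (e.g. via a pod annotation).
    start_http_server(9100)

@app.task
def process_item(item_id):
    with PROCESS_SECONDS.time():
        # ... real work would go here ...
        ITEMS_PROCESSED.inc()
```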

How to record all tasks information with Django and Celery?

In my Django project I'm using Celery with a RabbitMQ broker for asynchronous tasks. How can I record information about all of my tasks (e.g. creation time (when the task appears in the queue), the time a worker picks the task up, execution time, status, ...) to monitor how Celery is doing?
I know there are solutions like Flower, but that seems like too much for what I need. django-celery-results looks like what I want, but it's missing some information I need, such as the task creation time.
Thanks!
It seems you often find the answer yourself right after asking on SO. I settled on using Celery signals to do all the recording I want and store the results in a database table.
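For the record, the rough shape of that (a sketch; TaskRecord is a hypothetical model of mine with task_id, name, created_at, started_at, finished_at and status fields, and it assumes task protocol 2, where the task id is available in the message headers):

```python
# signals.py -- sketch of recording task lifecycle times into a Django model.
from celery.signals import before_task_publish, task_prerun, task_postrun
from django.utils import timezone

from myapp.models import TaskRecord  # hypothetical model

@before_task_publish.connect
def record_created(sender=None, headers=None, **kwargs):
    # Fires in the process that sends the task, i.e. when it enters the queue.
    TaskRecord.objects.create(task_id=headers["id"], name=sender,
                              created_at=timezone.now(), status="QUEUED")

@task_prerun.connect
def record_started(task_id=None, **kwargs):
    # Fires in the worker just before execution begins.
    TaskRecord.objects.filter(task_id=task_id).update(
        started_at=timezone.now(), status="STARTED")

@task_postrun.connect
def record_finished(task_id=None, state=None, **kwargs):
    # Fires in the worker after the task returns (or fails).
    TaskRecord.objects.filter(task_id=task_id).update(
        finished_at=timezone.now(), status=state)
```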

Django Job queue for interfacing with celery

My Django web app's logic is heavily geared towards background task execution (both periodic and standalone, synchronous and asynchronous). All my research points to Celery being the most recommended approach. I plan to eventually deploy on Heroku, and the fact that it supports Celery + Redis (what I'm using for local development) is a big plus for me.
However, I need more extensive scheduling capabilities than Celery provides. I need some of my periodic tasks to run on schedules like "run on the last Sunday of the month", etc. So I've implemented my own models in Django to store a recurrence rule and the other needed parameters.
Now I'm stumped on how to interface my tables with Celery. Ideally, I'd like to have my own Job model which holds the schedule, the task that should run when it becomes due, and the parameters for the task, sort of like a function pointer in C++. Then I would run a daemon that keeps checking the job queue for jobs that have become due; if a job is periodic, it creates the next job instance and pushes it into the queue, then runs the associated task with its parameters using Celery's delay method or similar.
Questions:
Does this approach even make sense?
If not, what alternative approach(es) could I use?
If yes, how do I go about designing that Job/Event queue...
I'd love to hear a better approach to doing this, or whether there's an existing job-queue implementation that might be suitable, or a way to use Celery's job queue itself...
Thanks heaps..
Periodic tasks in Celery work pretty much like this. There's a dedicated scheduler process (celery beat) which simply sends off tasks when they are due.
You can also create new schedulers to use with beat by subclassing the celery.beat.Scheduler class, and you can create custom schedules too (like the crontab schedule that is already built-in) by subclassing celery.schedules.schedule.
There's a database-backed scheduler implementation in the django-celery extension (djcelery.schedulers.DatabaseScheduler), which uses a number of tricks to avoid polling the database too frequently, and so on (sadly it's not well commented).
Scheduler: https://github.com/celery/celery/tree/master/celery/beat.py
schedules: https://github.com/celery/celery/tree/master/celery/schedules.py
DatabaseScheduler: https://github.com/celery/django-celery/tree/master/djcelery/schedulers.py
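To illustrate the custom-schedule point, here's a rough sketch of a "run at 03:00 on the last Sunday of the month" schedule built by subclassing celery.schedules.schedule; the date math is mine and naive datetimes are assumed for brevity:

```python
# A sketch of a custom schedule: run at 03:00 on the last Sunday of each month.
# beat calls is_due() on every tick; it returns (is_due, seconds_until_next_check).
import calendar
from datetime import datetime, timedelta

from celery.schedules import schedule

class LastSundaySchedule(schedule):
    def _last_sunday(self, year, month):
        last_day = calendar.monthrange(year, month)[1]
        d = datetime(year, month, last_day, 3, 0)
        return d - timedelta(days=(d.weekday() - 6) % 7)  # step back to Sunday

    def is_due(self, last_run_at):
        now = self.now()
        last_run_at = last_run_at or datetime.min
        target = self._last_sunday(now.year, now.month)
        if now >= target and last_run_at < target:
            # Due now; ask beat to check back in an hour.
            return (True, 3600)
        if now < target:
            nxt = target
        else:
            year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
            nxt = self._last_sunday(year, month)
        return (False, (nxt - now).total_seconds())
```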

Simulating Google Appengine's Task Queue with Gearman

One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks always execute the most current version of the code. In contrast, my Gearman workers all run code within my Django project -- so when I push a new version live, I have to kill off the old workers and start new ones so they use the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by url just like the google app engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a gearman task to POST to the specified URL.
The Gearman server finds a worker and passes the URL and POST data to it.
The worker simply POSTs the data to the URL, thus executing the task.
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (there would be no long-running tasks that could time out).
What are the potential pitfalls of such an approach? Here's one that worries me:
The server could get hammered with many requests all at once, triggered by a single earlier request. So one user request might entail 10 concurrent HTTP requests. I suppose I could have a single worker with a sleep before every request to rate-limit things.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers ultimately can be Python functions. You can trigger workers using either queue messages or, perhaps more pertinent to what you're describing, webhooks.
Check out the Celery webhook documentation. Using all these tools you can build a production-ready distributed task queue that implements your requirements above.
I should also mention that, with regard to your first pitfall, Celery implements rate limiting of tasks using a token bucket algorithm.
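To make that concrete, a minimal sketch of webhook-style dispatch with Celery's rate limiting (broker URL, task name and retry policy are placeholders, and the request signing from your assumptions is left out):

```python
# tasks.py -- sketch: a Celery task that just POSTs to a URL, GAE-task-queue style.
# rate_limit uses Celery's built-in token bucket, which addresses the
# "hammered with concurrent requests" pitfall.
import requests
from celery import Celery

app = Celery("webhooks", broker="redis://localhost:6379/0")  # placeholder broker

@app.task(bind=True, rate_limit="10/s", max_retries=3)
def post_to_url(self, url, data):
    try:
        # The 10-second timeout mirrors the "tasks finish in under 10s" assumption.
        resp = requests.post(url, data=data, timeout=10)
        resp.raise_for_status()
    except requests.RequestException as exc:
        # Retry transient failures after a short delay.
        raise self.retry(exc=exc, countdown=5)

# From the web request handler, fire-and-forget:
# post_to_url.delay("https://example.com/tasks/resize-image", {"image_id": 42})
```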