What technology allows a web application to process a task in the background without making the user wait for the task to finish?
For example, as a user:
1. I want to submit a form that requires heavy processing. (Assume it involves checks or other actions, document uploads, etc.)
2. After I submit the form, the task runs in the background, and I can go to another page and do something else.
2.1 At the same time, I might submit another form to the server.
The requests can be processed concurrently or placed in a queue.
3. I receive a notification from the system whenever the server returns a response (regardless of whether it is a success or a failure).
This feature is similar to what Google Cloud Platform does.
Try Kue or any other similar library. The term to "google" is "[language] task queue".
You can of course roll your own, though it will be much easier if you make use of an existing server such as Redis or RabbitMQ. That way the queuing part is handled for you by the server and you can concentrate on your business logic.
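As a rough illustration of the queue-plus-worker idea, here is a minimal in-process sketch using only Python's standard library. It is not how Kue, Redis, or RabbitMQ work internally; the names `submit`, `worker`, and `results` are made up for this example, and a real deployment would keep the queue in a separate server so that jobs survive a restart.

```python
import queue
import threading

task_queue = queue.Queue()   # pending jobs (in-memory stand-in for a broker)
results = {}                 # job_id -> ("success" | "failure", payload)
results_lock = threading.Lock()


def submit(job_id, func, *args):
    """Enqueue a job and return immediately so the caller is not blocked."""
    task_queue.put((job_id, func, args))


def worker():
    """Run jobs one at a time and record success or failure for later notification."""
    while True:
        job_id, func, args = task_queue.get()
        try:
            outcome = ("success", func(*args))
        except Exception as exc:  # report failures instead of killing the worker
            outcome = ("failure", str(exc))
        with results_lock:
            results[job_id] = outcome
        task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

submit("form-1", lambda n: sum(range(n)), 1000)   # placeholder for heavy work
submit("form-2", lambda: 1 / 0)                   # a job that fails
task_queue.join()                                 # only for the demo; a web app would not wait
```

The `results` dictionary is what a notification step would read from: each entry records whether the job succeeded or failed, which matches point 3 of the question.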
We've got a little Java scheduler running on AWS ECS. It does what cron used to do on our old monolith: it fires up (Fargate) tasks in Docker containers. We've got a task that runs every hour, and it's quite important to us. I want to know if it crashes or fails to run for any reason (e.g. the Java scheduler fails, or someone turns the task off).
I'm looking for a service that will alert me if it's not notified. I want to call the notification system every time the script runs successfully. Then if the alert system doesn't get the "OK" notification as expected, it shoots off an alert.
I figure this kind of service must exist, and I don't want to reinvent the wheel by building it myself. I guess my question is, what's it called? And where can I go to get that kind of thing? (We're using AWS, obviously, and we've got a PagerDuty account.)
We use this approach for these types of problems. First, the task has to write a timestamp to a file in S3 or EFS. This file is the external evidence that the task ran to completion. Then you need an HTTP-based service that reads that file and checks whether the timestamp is valid, i.e. has been updated in the last hour. This could be a simple PHP or Node.js script. The script is exposed to the public web, e.g. https://example.com/heartbeat.php. It returns an HTTP response code of 200 if the timestamp file is present and valid, or 500 if not. Then we use StatusCake to monitor the URL and notify us via its PagerDuty integration if there is an incident. We usually include a message in the response so a human can see the nature of the error.
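The check that such a heartbeat endpoint performs can be sketched as below. This is only an illustration of the logic, written in Python rather than PHP or Node.js; the function name `heartbeat_status` and the one-hour `MAX_AGE` are assumptions, and reading the timestamp from S3 or EFS is left out.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=1)  # assumed: the task runs hourly


def heartbeat_status(last_run, now, max_age=MAX_AGE):
    """Return (http_status, message) for a heartbeat endpoint.

    last_run is the timestamp the task wrote on completion, or None if
    the timestamp file is missing.
    """
    if last_run is None:
        return 500, "timestamp file missing"
    age = now - last_run
    if age > max_age:
        return 500, f"last successful run was {age} ago"
    return 200, "OK"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_status(now - timedelta(minutes=20), now))
print(heartbeat_status(now - timedelta(hours=3), now))
print(heartbeat_status(None, now))
```

In practice it is probably worth allowing some slack on top of the task interval, so that a run that finishes a few minutes late does not trigger an alert.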
This may seem tedious, but it is foolproof. Any failure anywhere along the line is reported immediately. StatusCake has a great free service level. This approach can be used to monitor any critical task in the same way. We've learned the hard way that critical cron-type tasks and processes can fail for any number of reasons, and you want to know before it becomes customer critical. 24x7x365 monitoring of these types of tasks is necessary, and it helps us sleep better at night.
Note: We always have a daily system test event that triggers a PagerDuty notification at 9am each day. For the truly paranoid, this ensures that PagerDuty itself has not failed in some way, e.g. through misconfiguration. Our support team knows that if they don't get a test alert each day, there is a problem in the notification system itself. The tech on duty has to acknowledge the incident as per SOP. If they do not acknowledge it, it escalates to the next tier, and we know we have to have a talk about response times. It keeps people on their toes. This is the final piece to ensure you have a robust monitoring infrastructure.
OpsGenie has a heartbeat service, which is basically a watchdog timer. You can configure it to call you if you don't ping them within x minutes.
Unfortunately I would not recommend them. I have been using them for 4 years and they have changed their account system twice and left my paid account orphaned silently. I have to find a new vendor as soon as I have some free time.
I have now successfully set up Django-celery to check my existing tasks and remind the user by email when a task is due:
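import datetime
from celery.task import periodic_task  # import path used by django-celery era Celery
from django.utils.timezone import utc

@periodic_task(run_every=datetime.timedelta(minutes=1))
def check_for_tasks():
    tasks = mdls.Task.objects.all()
    # Truncate to the minute so the value can match the stored reminder time.
    now = datetime.datetime.utcnow().replace(tzinfo=utc, second=0, microsecond=0)
    for task in tasks:
        if task.reminder_date_time == now:
            sendmail(...)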
So far so good, however what if I wanted to also display a popup to the user as a reminder?
Twitter Bootstrap allows creating popups and displaying them from JavaScript:
$(this).modal('show');
The problem, though, is how a Celery worker daemon can run this JavaScript in the user's browser. Maybe I am going about this completely the wrong way and it is not possible at all. So the question remains: can a periodic Celery task ever be used to trigger a UI notification in the browser?
Well, you can't use the Django messages framework, because the task has no way to access the user's request, and you can't pass request objects to the workers either, because they're unpicklable.
But you could definitely use something like django-notifications. You could create notifications in your task and attach them to the user in question. Then, you could retrieve those messages from your view and handle them in your templates to your liking. The user would see the notification on their next request (or you could use AJAX polling for real-time-ish notifications or HTML5 websockets for real real-time [see django-websocket]).
Yes it is possible but it is not easy. Ways to do/emulate server to client communication:
polling
The most trivial approach would be polling the server from javascript. Your celery task could create rows in your database that can be fetched by a url like /updates which checks for new updates, marks the rows as read and returns them.
long polling
Often referred to as comet. The client does a request to the server which pends until the server decides to return something. See django-comet for example.
websocket
To enable true server to client communication you need an open connection from the client to the server. django-socketio and django-websocket are examples of reusable apps that make this possible.
My advice judging by your question's context: either do some basic polling or stick with the emails.
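The polling approach described above can be sketched as follows. This is a framework-free illustration, not Django code: the list `_notifications` stands in for a database table, and `create_notification` and `fetch_updates` are hypothetical names for what the Celery task and the view behind /updates would do.

```python
import itertools

_ids = itertools.count(1)
_notifications = []  # stand-in for a database table


def create_notification(user_id, text):
    """What the background task would do: store a row for the user."""
    row = {"id": next(_ids), "user_id": user_id, "text": text, "read": False}
    _notifications.append(row)
    return row["id"]


def fetch_updates(user_id):
    """What a view behind /updates would do: return unread rows and mark them read."""
    unread = [n for n in _notifications if n["user_id"] == user_id and not n["read"]]
    for n in unread:
        n["read"] = True
    return [n["text"] for n in unread]


create_notification(1, "Task 'report' is due")
create_notification(2, "Task 'backup' is due")
first_poll = fetch_updates(1)    # the user's browser polls and gets the reminder
second_poll = fetch_updates(1)   # nothing new on the next poll
```

The browser would call the URL every few seconds and show a popup for each returned item; because rows are marked as read, the same reminder is not shown twice.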
I plan to use celery to process incoming web service requests. I understand that celery is used mostly to process asynchronous tasks. However celery has lot of features that I like and could benefit from in my project - priorities, rate limits, distributed architecture etc.
I am just struggling with the design. I would like to have a web service that creates and starts a task that will call subtasks. The original task needs the results from the subtasks, and when the original task is finished I return the result to the client through the web service. I know I could call the tasks synchronously, but that is not good practice.
The scatter/gather thing looks like it could be a map/reduce job. If the mapreduce part is important to you, go with a specialised framework like Disco or Hadoop. Otherwise, you need some kind of completion signal, so that you can fire a reply to the user once all subtasks are done or cancelled. For example, a counter of how many subtasks are yet to terminate. The subtask that brings the counter to zero can push a new reply task that pushes a reply to the user and closes the circle.
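The counter idea can be sketched like this. It is a single-process illustration using threads, with the hypothetical class name `CompletionCounter`; in a distributed setup the counter would have to live in a shared store with an atomic decrement, which is an assumption beyond what the answer specifies.

```python
import threading


class CompletionCounter:
    """Counts outstanding subtasks; the last one to finish triggers the reply."""

    def __init__(self, total, on_complete):
        self._remaining = total
        self._lock = threading.Lock()
        self._on_complete = on_complete
        self.results = []

    def subtask_done(self, result):
        # Decrement under a lock so exactly one subtask observes zero.
        with self._lock:
            self.results.append(result)
            self._remaining -= 1
            last = self._remaining == 0
        if last:
            self._on_complete(sorted(self.results))


replies = []
counter = CompletionCounter(total=3, on_complete=replies.append)

threads = [threading.Thread(target=counter.subtask_done, args=(n * n,)) for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever subtask happens to finish last calls `on_complete`, so no task has to block waiting for the others.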
Look at Mongrel2, an asynchronous web framework, for an example of this kind of circular request path.
One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks are always executing the most current version of the code. Conversely, my gearman workers all run code within my django project -- so when I push a new version live, I have to kill off the old worker and run a new one so that it uses the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by url just like the google app engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a gearman task to POST to the specified URL.
The gearman server finds a worker and passes the URL and POST data to it.
The worker simply posts to the url with the data, thus executing the task.
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (There would be no long tasks that could timeout)
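One common way to satisfy the "signed somehow" assumption is an HMAC over the request body with a secret shared between the worker and the web application. The sketch below uses Python's standard `hmac` module; `SECRET_KEY`, `sign`, and `verify` are illustrative names, and how the signature travels (for example in a request header) is left open.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-shared-secret"  # hypothetical; known to worker and web app


def sign(body: bytes, key: bytes = SECRET_KEY) -> str:
    """Worker side: compute a hex signature to send alongside the POST body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Web app side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(body, key), signature)


body = b"task=resize_image&id=42"
signature = sign(body)
```

A signature alone does not stop a captured request from being replayed; including a timestamp or a one-time task ID in the signed body is one way to address that.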
What are the potential pitfalls of such an approach? Here's one that worries me:
The server can potentially get hammered with many requests all at once that are triggered by a previous request. So one user request might entail 10 concurrent http requests. I suppose I could have a single worker with a sleep before every request to rate-limit.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened, and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers can ultimately be Python functions. You can trigger workers using either queue messages or, perhaps more pertinent to what you're describing, webhooks.
Check out the Celery webhook documentation. Using all these tools you can build a production ready distributed task queue that implements your requirements above.
I should also mention that in regards to your first pitfall, celery implements rate-limiting of tasks using a Token Bucket algorithm.
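For readers unfamiliar with the token bucket algorithm, here is a generic sketch of the idea. It is not Celery's implementation; the class and parameter names are invented for this example, and time is passed in explicitly to keep the behaviour deterministic (real code would typically use `time.monotonic()`).

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        """Return True and consume a token if a request may run at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, capacity=3)          # 2 requests/second, burst of 3
burst = [bucket.allow(now=0.0) for _ in range(5)]
later = bucket.allow(now=0.5)                     # 0.5 s * 2/s = 1 token refilled
```

Here the first three requests of the burst are allowed and the next two are rejected; half a second later one more token has accumulated, so the next request goes through.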
In my Django app, I need to implement this "timer-based" functionality:
User creates some jobs and for each one defines when (in the same unit the timer works, probably seconds) it will take place.
User starts the timer.
User may pause and resume the timer whenever he wants.
A job is executed when its time is due.
This does not fit a typical cron scenario as time of execution is tied to a timer that the user can start, pause and resume.
What is the preferred way of doing this?
This isn't a Django question. It is a system architecture problem. HTTP is stateless, so it has no notion of time.
My suggestion is to use a message queue such as RabbitMQ and use Carrot to interface with it. You can put the jobs on the queue, then create a separate consumer daemon that processes jobs from the queue. The consumer holds the logic about when to process them.
If that is too complex a system, perhaps look at implementing the timer in JS and having it call a URL mapped to a view that processes a unit of work. The JS would be the timer.
Have a look at Pinax, especially the notifications.
Once created, notifications are pushed to the DB (the queue) and processed by a cron-driven email-sending job (the consumer).
In this scenario you can't stop a job once it has fired.
That could be managed by some (AJAX) views that call a system process....
edit
instead of cron-jobs you could use a twisted-based consumer:
write jobs with their time information to the DB
send a request for consuming (or resuming, pausing, ...) to the twisted server via socket
do the rest in twisted
You're going to end up with processes, separate from the web server, that monitor the queue and execute jobs. Consider how you would build that without Django, using command-line tools to drive it. Use Django models to access the database.
When you have that working, layer on a web-based interface (using full Django) to manipulate the queue and report on job status.
I think that if you approach it this way the problem becomes much easier.
I used what is probably the simplest (crudest is more appropriate, I'm afraid) approach possible: 1. a model holding the current position and the state of the counter (active, paused, etc.), 2. a Django job that increments the counter if its state is active, 3. a cron entry that executes the job every minute.
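The three pieces above can be sketched in plain Python as follows. This is only an illustration of the logic: `PausableTimer` is a hypothetical class standing in for the Django model, and `tick()` stands in for the job that cron runs every minute.

```python
class PausableTimer:
    """Counter that advances only while active; jobs fire when their time is reached."""

    def __init__(self):
        self.position = 0
        self.state = "paused"
        self.jobs = []        # (due_position, name) pairs not yet executed
        self.executed = []

    def add_job(self, due_position, name):
        self.jobs.append((due_position, name))

    def start(self):
        self.state = "active"

    def pause(self):
        self.state = "paused"

    def tick(self):
        """Called by the periodic job; does nothing while the timer is paused."""
        if self.state != "active":
            return
        self.position += 1
        due = [job for job in self.jobs if job[0] <= self.position]
        for job in due:
            self.jobs.remove(job)
            self.executed.append(job[1])


timer = PausableTimer()
timer.add_job(2, "send report")
timer.start()
timer.tick()          # position 1
timer.pause()
timer.tick()          # ignored while paused
timer.start()
timer.tick()          # position 2, job fires
```

The resolution of this design is whatever the cron interval is, so with a one-minute cron entry the timer unit is effectively minutes rather than seconds.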
Thanks everyone for the answers.
You can always use a client-side jQuery timer, but remember to initialize the timer with a value passed from your backend application, and make sure the end user hasn't edited the time (for example by using the browser inspector).
So store the timer start time (the initial value of the timer) and the timer end time or pause time in the backend (in the DB itself).
Monitor the duration in the backend and trigger the job from there (in your case).
Hope this is clear.
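A minimal sketch of how the backend could derive the running time from stored timestamps, so that the client-side timer is only a display. The function and its parameters are assumptions for illustration; in particular, `paused_total` (the sum of earlier pauses) is something the backend would have to record in addition to the start and pause times mentioned above.

```python
from datetime import datetime, timedelta


def elapsed(started_at, paused_total, paused_at, now):
    """Running time of a timer, excluding paused intervals.

    started_at   -- when the user first started the timer
    paused_total -- sum of all completed pauses
    paused_at    -- start of the current pause, or None if running
    """
    end = paused_at if paused_at is not None else now
    return end - started_at - paused_total


t0 = datetime(2024, 1, 1, 12, 0, 0)
# Running timer: 100 s since start, 30 s of which were spent paused.
running = elapsed(t0, timedelta(seconds=30), None, t0 + timedelta(seconds=100))
# Currently paused since t0 + 80 s: the clock stays frozen however late "now" is.
paused = elapsed(t0, timedelta(seconds=30), t0 + timedelta(seconds=80), t0 + timedelta(seconds=500))
```

A job is due once this elapsed value reaches the job's scheduled offset, and since everything is computed from server-side values, editing the timer in the browser has no effect.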