Update a table after x minutes/hours - django

I have these tables
exam(id, start_date, deadline, duration)
exam_answer(id, exam_id, answer, time_started, status)
exam_answer.status possible values are 0-not yet started 1-started 2-submitted
Is there a way to update exam_answer.status once (now - exam_answer.time_started) exceeds exam.duration, or once the deadline has passed?
I'll also mention this in case it helps: I'm building this as part of a Django project.

Django applications, like any other WSGI/web application, are only meant to handle request-response flows. If there aren't any requests, there is no activity and such changes will not happen.
You could write a custom management command that is executed periodically by a cron job, but between runs you risk displaying stale data. Alternatively, you can compute the statuses before any related views start their processing, though this can be a wasteful use of resources.
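For the "compute the statuses up front" route, a minimal sketch might look like the following. It assumes hypothetical Exam and ExamAnswer models mirroring your tables, a ForeignKey named exam, duration stored as a DurationField (a timedelta), and that 2 (submitted) is the state you want expired answers to end up in; adjust to taste. You could call it at the top of any related view, or from the cron-driven management command.

    from django.utils import timezone

    from myapp.models import ExamAnswer  # hypothetical app/model names

    STARTED, SUBMITTED = 1, 2  # status values from the question

    def close_expired_answers():
        # Flip started answers to submitted once their time is up or the deadline has passed.
        now = timezone.now()
        for answer in ExamAnswer.objects.filter(status=STARTED).select_related("exam"):
            out_of_time = now - answer.time_started >= answer.exam.duration
            past_deadline = now >= answer.exam.deadline
            if out_of_time or past_deadline:
                answer.status = SUBMITTED
                answer.save(update_fields=["status"])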
Your best bet might be to integrate a task scheduler with your application, such as Celery. Do not be discouraged because Celery seemingly runs in a concurrent multiprocess environment across several machines; the service can be configured to run in a single thread, and it provides a clean interface for scheduling tasks that have to run at some exact point in the future.
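If you do go with Celery, the "exact point in the future" part might look roughly like this - a sketch, with close_answer as a made-up task name and the same hypothetical models as above:

    from celery import shared_task

    @shared_task
    def close_answer(answer_id):
        # Mark this answer as submitted if the student never submitted it themselves.
        from myapp.models import ExamAnswer  # hypothetical app/model names
        ExamAnswer.objects.filter(pk=answer_id, status=1).update(status=2)

    # When the student starts the exam, schedule the task to fire when their time runs out:
    # close_answer.apply_async((answer.pk,), eta=answer.time_started + exam.duration)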

Related

Do schedulers slow down your web application?

I'm building a jewellery e-commerce store, and a feature I'm building is incorporating current market gold prices into the final product price. I intend on updating the gold price every 3 days by making calls to an api using a scheduler, so the full process is automated and requires minimal interaction with the system.
My concern is this: will a scheduler, with one task executed every 72 hrs, slow down my server (using client-server model) and affect its performance?
The server side application is built using Django, Django REST framework, and PostgreSQL.
The scheduler I have in mind is Advanced Python Scheduler.
As far as I can see from the docs of Advanced Python Scheduler, it does not provide a separate process to run the scheduled tasks; that is left up to you to figure out.
From their docs, the recommended approach is a BackgroundScheduler, which runs the jobs in a thread inside your own process.
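For reference, the documented BackgroundScheduler pattern looks roughly like this (a sketch; fetch_gold_price is a made-up placeholder for your API call):

    from apscheduler.schedulers.background import BackgroundScheduler

    def fetch_gold_price():
        # call the external gold-price API and store the result (details omitted)
        pass

    scheduler = BackgroundScheduler()
    scheduler.add_job(fetch_gold_price, "interval", days=3)
    scheduler.start()  # jobs run in a thread inside *this* process

Note that the scheduler lives inside whatever process calls start(), which is exactly what causes the issues below.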
Now there are multiple issues which could arise:
If you're running multiple Django instances (using gunicorn or uwsgi), APS scheduler will run in each of those processes. This is a non-trivial problem to solve unless APS has considered this (you will have to check the docs).
BackgroundScheduler will run in a thread, but Python is limited by the GIL. So if your background tasks are CPU-intensive, your Django process will get slower at processing incoming requests.
Thread or not, if your background job is CPU-intensive and lasts a long time, it can affect your server's performance.
APS seems like a much lower-level library, and my recommendation would be to use something simpler:
Simply use a system cron job that runs every 3 days: create a Django management command and have cron execute it (a sketch follows after these options).
Use Django-supported libraries like Celery, rq/rq-scheduler/django-rq, or django-background-tasks.
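Here is the sketch of the cron route mentioned above; the app, command, and helper names are all made up:

    # myapp/management/commands/update_gold_price.py
    from django.core.management.base import BaseCommand

    def fetch_gold_price_from_api():
        # placeholder for the real API call
        return 0.0

    class Command(BaseCommand):
        help = "Fetch the current gold price and store it for product pricing"

        def handle(self, *args, **options):
            price = fetch_gold_price_from_api()
            # ... persist the price to whatever model your pricing logic reads ...
            self.stdout.write(self.style.SUCCESS("Fetched gold price: %s" % price))

The matching crontab entry would be something along the lines of 0 3 */3 * * /path/to/venv/bin/python /path/to/manage.py update_gold_price, which runs entirely outside your web workers.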
I think it would be wise to take a look at https://github.com/arteria/django-background-tasks as it is the simplest of all, with the least amount of setup required. Once you get a bit familiar with it you can weigh the pros and cons of what is appropriate for your use case.
Once again, your server performance depends on what your background task is doing and how long it lasts.

Django background tasks with Twisted

I'm making a web app using Django.
I'd like events to trigger 'background' tasks that run parallel to the Django application (by parallel I just mean that they don't impact the speed of the user's experience).
Types of tasks I'm talking about
a user logs in and an event is triggered to start populating that user's cache in anticipation of future requests based on their usage habits.
a user posts some data to the database but that post triggers an api call to another website where the returned data will be parsed, aggregated and used to supplement that users post.
rolling updates of data used in the app through api calls to other websites
aggregating data and running general maintenance tasks.
After a few days of research I'm thinking that I should use Twisted to accomplish this, which has led me to my question:
Is twisted overkill for what I'm trying to accomplish?
Many of these tasks are far more I/O-bound than CPU-bound, so I'm thinking asynchronous is best.
Any advice would be appreciated.
Thank you
Yes, I think it's overkill.
Rather than folding in a full async framework such as Twisted, with all the technical overhead that brings, you might be better off using a task queue to do what you want as a background process.
When your app needs to do a background task (anything that would otherwise block the request/response cycle), put the task in the queue and let a separate worker process pick things off the queue and deal with them as fast as it can. (You can always add more workers).
Two of the most popular queue libraries for Python/Django are celery and rq. They're especially good with Redis as a backend, but there are other backend options, too.
Personally, I much prefer rq over celery, in terms of its API and its clean setup, but both are used by a lot of people.
And both are definitely easier to get your head around than something like Twisted, IMO.
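To make that concrete, the rq version of the pattern is only a few lines (a sketch; warm_user_cache is a made-up stand-in for any of the tasks listed in the question):

    # tasks.py - plain functions that the rq worker imports and runs
    def warm_user_cache(user_id):
        # populate this user's cache based on their habits (details omitted)
        pass

    # somewhere in a view or signal handler
    from redis import Redis
    from rq import Queue

    queue = Queue(connection=Redis())            # default queue backed by Redis
    queue.enqueue(warm_user_cache, user_id=42)   # returns immediately; a worker picks it up

You then run one or more rq worker processes alongside the web server, so none of this work happens inside the request/response cycle.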

Clojure background process

Let's say I'm making a crawler/scraper in Clojure, and I want it to run periodically (at predefined times of day).
I want to define my jobs with quartz/quartzite (at least that seems to be the most robust solution).
Now, to create a daemon process with Clojure, I tried the lein-daemon plugin, but it seems to be a pretty risky endeavour, since the plugin seems more than a bit buggy (or I am making some heavy mistakes).
What is the best way for me to create this service?
I want it to be able to restart itself upon system reboot, but I want to use clojure (quartzite) for my jobs (loading them from database, etc).
What is a robust, but Clojure-flavoured, way to create a long-running daemon process?
EDIT:
The deployment environment will be something like a single VPS or a dedicated server.
There may be a dozen jobs loading their parameters from some data store, running anywhere from 1 - 8 times a day (or maybe more).
The correct process depends a lot on your environment. I work on deployment systems for complex web/mobile infrastructure with many long running Clojure processes. For this we use Pallet to create instances with the code checked out and configured, then we have a function that generates init scripts to start the services at boot. This process is appropriate to environments where you need a repeatable build on a cloud provider so it may be too heavy for your situation.
If you are looking for simple recurring jobs you may want to look into Immutant which is an application server for Clojure with good support for recurring jobs.

Using celery for data migration. Is it a good idea?

I am doing data migration which deals with images/videos and such being downloaded and then sent to dropbox by using its api.
I'm using Python/Django for the entire web app, but I imagine this will take a lot of bandwidth, and there may be a lot of issues; a failure to save one image shouldn't stop the entire migration.
Thus, is celery a good idea? Or Twisted?
I'm a bit confused about how this would help me. What I have in mind is to spawn a process/thread to deal with a single image or a small set of images, so the work can run across multiple threads.
The short answer to your question "is Celery a good idea?" is "Yes". I've used Celery to achieve a similar process whereby user submission of a form initiates, amongst other things, asynchronous calls to the Twitter API which then write back to saved objects in my database. I've found Celery outstanding for this task (no pun intended).
Celery would allow you to initiate pre-defined tasks (which, in part, can be thought of as "normal" Python functions with a @task decorator added to them) each time a user indicates they'd like to download an image or images. Celery gives you granular, per-task control over errors and retries, and tasks can be submitted singly or as chains, chords, or groups, all of which means you can definitely achieve your requirement of the migration continuing even when a single image fails to download.
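A sketch of what such a task might look like - download and upload_to_dropbox are placeholders for your own helpers, and the retry settings are just examples:

    from celery import group, shared_task

    @shared_task(bind=True, max_retries=3, default_retry_delay=60)
    def migrate_image(self, image_url):
        # Download one image and push it to Dropbox, retrying on failure.
        try:
            data = download(image_url)       # placeholder: your HTTP download helper
            upload_to_dropbox(data)          # placeholder: your Dropbox API call
        except Exception as exc:
            raise self.retry(exc=exc)        # re-queue this one image; others are unaffected

    def migrate_all(image_urls):
        # one independent task per image, so a single failure never halts the migration
        return group(migrate_image.s(url) for url in image_urls).apply_async()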
I would recommend spending some time with the Celery tutorial here and the Celery-Django tutorial here, which will give you an introduction to the basic work flow with Celery and Django.
I can't speak to the merits of Twisted, but if you are looking for opinions on the relative strengths and weaknesses of each, these look like a good start:
Twisted or Celery? Which is right for my application with lots of SOAP calls?
sync spawing of processes: design question - Celery or Twisted

How to best launch an asynchronous job request in Django view?

One of my view functions is a very long processing job and clearly needs to be handled differently.
Instead of making the user wait a long time, it would be best if I were able to launch the processing job (which would email the results), notify the user that their request is being processed without waiting for completion, and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' to do this in Django. Perhaps I can return the HTTP response and then go on with the job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach used by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to manage the job scheduling more easily.
If you need help understanding how to write a script to process the jobs, see James Bennett's article Standalone Django Scripts.
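A minimal sketch of that light-workload pattern - the model, app, and helper names are made up, not django-mailer's actual code:

    # models.py
    from django.db import models

    class Job(models.Model):
        created = models.DateTimeField(auto_now_add=True)
        payload = models.TextField()  # whatever the worker needs, e.g. JSON

    # management/commands/process_jobs.py - run from cron every few minutes
    from django.core.management.base import BaseCommand
    from myapp.models import Job  # hypothetical app name

    class Command(BaseCommand):
        def handle(self, *args, **options):
            for job in Job.objects.order_by("created"):
                do_the_work(job.payload)   # placeholder for the actual processing
                job.delete()               # drop the job once it is done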
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want a real distributed task queue. There is a lot of competition here so I can't really detail all the options, but a good one to use with Django apps is celery.
Why not simply start a thread to do the processing and then go on to send the response?
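That could look something like the sketch below. Keep in mind the thread lives and dies with the worker process, so this only suits short, best-effort work:

    import threading

    from django.http import HttpResponse

    def long_job(data):
        # do the slow processing and email the results (details omitted)
        pass

    def my_view(request):
        threading.Thread(target=long_job, args=(request.POST.dict(),), daemon=True).start()
        return HttpResponse("Your request is being processed; the results will be emailed to you.")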
Before you select a solution, you need to determine how the process will be run. I.e. is it the same process for every single user, with the same data, that can be scheduled regularly? Or does each user request something different, with slightly different results?
As an example, if the data will be the same for every single user and can be run on a schedule you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable that can handle high load and is asynchronous, then what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is pure Django and Python and is simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed, and makes use of open-source message queuing servers: celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree
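With celery, the view side of that pattern is only a couple of lines. A sketch using present-day Celery conventions (process_request is a made-up task name):

    # tasks.py
    from celery import shared_task

    @shared_task
    def process_request(data):
        # the long-running work, including emailing the results (details omitted)
        pass

    # views.py
    from django.http import HttpResponse
    from myapp.tasks import process_request  # hypothetical app name

    def submit(request):
        process_request.delay(request.POST.dict())   # enqueue and return immediately
        return HttpResponse("We're working on it - you'll get an email when it's done.")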