I'm building a jewellery e-commerce store, and one feature I'm working on incorporates the current market gold price into the final product price. I intend to update the gold price every 3 days by calling an API from a scheduler, so the full process is automated and requires minimal interaction with the system.
My concern is this: will a scheduler with one task executed every 72 hours slow down my server (client-server model) and affect its performance?
The server-side application is built with Django, Django REST framework, and PostgreSQL.
The scheduler I have in mind is Advanced Python Scheduler.
As far as I can see from the Advanced Python Scheduler (APScheduler) docs, it does not provide a separate process in which to run the scheduled tasks; that is left up to you to figure out.
The docs recommend a BackgroundScheduler, which runs the scheduler in a separate thread.
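For scale, here is roughly what that looks like; a minimal sketch, assuming APScheduler 3.x, where fetch_gold_price is a hypothetical stand-in for your API call:

```python
# Minimal BackgroundScheduler sketch; fetch_gold_price is hypothetical.
from apscheduler.schedulers.background import BackgroundScheduler

def fetch_gold_price():
    # Call the external gold-price API and persist the latest price.
    ...

scheduler = BackgroundScheduler()
scheduler.add_job(fetch_gold_price, "interval", hours=72)  # every 3 days
scheduler.start()  # spawns the scheduler thread and returns immediately
```

In a Django project this would typically be started from an AppConfig.ready() hook, which is exactly where the issues below come into play.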
Now there are multiple issues which could arise:
If you're running multiple Django instances (via gunicorn or uwsgi workers), the APScheduler will run in each of those processes, so your job would fire once per process. This is a non-trivial problem to solve unless APScheduler has addressed it (you will have to check the docs).
A BackgroundScheduler runs in a thread, but Python is limited by the GIL, so if your background tasks are CPU-intensive, your Django process will get slower at handling incoming requests.
Thread or not, if your background job is CPU-intensive and long-running, it can affect your server's performance.
APScheduler seems like a much lower-level library, and my recommendation would be to use something simpler:
Simply use system cron jobs to run the task every 3 days: create a Django management command and use cron to execute it (a sketch follows this list).
Use Django-supported libraries like celery, rq/rq-scheduler/django-rq, or django-background-tasks.
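For the cron option, the management command is only a few lines. A sketch, assuming an app named shop (the app, command, and file names are made up):

```python
# shop/management/commands/update_gold_price.py -- hypothetical app/command.
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Fetch the current market gold price and store it"

    def handle(self, *args, **options):
        # Call the external API and save the price to your model here.
        self.stdout.write("Gold price updated")
```

A crontab entry like `0 3 */3 * * /path/to/venv/bin/python /path/to/manage.py update_gold_price` would then run it at 03:00 every third day (note that */3 in the day-of-month field resets at month boundaries, which is usually acceptable for a job like this).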
I think it would be wise to take a look at https://github.com/arteria/django-background-tasks, as it is the simplest of the lot with the least amount of setup required. Once you get a bit familiar with it you can weigh the pros & cons of what is appropriate for your use case.
Once again, your server's performance depends on what your background task is doing and how long it lasts.
Related
I'm making a web app using Django.
I'd like events to trigger 'background' tasks that run parallel to the Django application (by parallel I just mean that they don't impact the speed of the user's experience).
Types of tasks I'm talking about:
a user logs in and an event is triggered to start populating that user's cache in anticipation of future requests based on their usage habits.
a user posts some data to the database, but that post triggers an API call to another website, where the returned data will be parsed, aggregated and used to supplement that user's post.
rolling updates of data used in the app through api calls to other websites
aggregating data and running general maintenance tasks.
After a few days of research I'm thinking that I should use Twisted to accomplish this, which has led me to my question:
Is Twisted overkill for what I'm trying to accomplish?
Many of these tasks are far more I/O-bound than CPU-bound, so I'm thinking asynchronous is best.
Any advice would be appreciated.
Thank you
Yes, I think it's overkill.
Rather than folding in a full async framework such as Twisted, with all the technical overhead that brings, you might be better off using a task queue to do what you want as a background process.
When your app needs to do a background task (anything that would otherwise block the request/response cycle), put the task on the queue and let a separate worker process pick things off the queue and deal with them as fast as it can. (You can always add more workers.)
Two of the most popular queue libraries for Python/Django are celery and rq. Celery supports several broker backends, while rq is built specifically around Redis.
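To give a feel for the queue-and-worker flow, here is a minimal rq sketch (populate_user_cache is a made-up task; it must live in a module the worker can import):

```python
# Minimal rq sketch, assuming a Redis server on localhost.
from redis import Redis
from rq import Queue

def populate_user_cache(user_id):
    # Warm this user's cache based on their usage habits.
    ...

q = Queue(connection=Redis())
q.enqueue(populate_user_cache, 42)  # handled by a separate `rq worker` process
```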
Personally, I much prefer rq over celery, in terms of its API and its clean setup, but both are used by a lot of people.
And both are definitely easier to get your head around than something like Twisted, IMO.
I have these tables
exam(id, start_date, deadline, duration)
exam_answer(id, exam_id, answer, time_started, status)
exam_answer.status possible values: 0 - not yet started, 1 - started, 2 - submitted
Is there a way to update exam_answer.status once now() - exam_answer.time_started is greater than exam.duration? Or once the deadline has already passed?
I'll also mention this in case it helps: I'm building this for a Django project.
Django applications, like any other WSGI/web application, are only meant to handle request-response flows. If there aren't any requests, there is no activity and such changes will not happen.
You could write a custom management command that's executed periodically by a cron job, but between runs you risk displaying stale data. Alternatively, you have elegant means at your disposal to compute the statuses on the fly before any related view starts its processing, though this can be a wasteful use of resources.
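The second approach might look like this; a hedged sketch, assuming the tables map onto Django models named Exam and ExamAnswer and that duration is stored as an interval (DurationField), which PostgreSQL handles natively:

```python
# Hedged sketch; app/model names and the DurationField assumption are mine.
from django.db.models import F
from django.utils import timezone

from exams.models import ExamAnswer  # hypothetical app and model

def expire_overdue_answers():
    # Close answers whose window has elapsed: time_started + duration < now.
    ExamAnswer.objects.filter(status=1).filter(
        time_started__lt=timezone.now() - F("exam__duration")
    ).update(status=2)
```

Calling this at the top of the relevant views keeps the data correct, at the cost of an extra query per request.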
Your best bet might be to integrate a task scheduler with your application, such as Celery. Do not be discouraged because Celery seemingly runs in a concurrent multiprocess environment across several machines; the service can be configured to run in a single thread, and it provides a clean interface for scheduling tasks that have to run at some exact point in the future.
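For example (a hedged sketch; the task and model names are made up, and exam.duration is assumed to be a timedelta), you could schedule the closing task at the moment the exam is started:

```python
# Hedged Celery sketch; task/model names are hypothetical.
from celery import shared_task

@shared_task
def close_exam_answer(exam_answer_id):
    from exams.models import ExamAnswer  # hypothetical app and model
    # Only flip the status if the student never submitted in time.
    ExamAnswer.objects.filter(id=exam_answer_id, status=1).update(status=2)

# When the student starts the exam:
# close_exam_answer.apply_async(args=[answer.id],
#                               countdown=exam.duration.total_seconds())
```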
Let's say I'm making a crawler/scraper in Clojure, and I want it to run periodically (at predefined times of day).
I want to define my jobs with Quartz/Quartzite (at least that seems to be the most robust solution).
Now, to create a daemon process with Clojure, I tried the lein-daemon plugin, but it seems to be a pretty risky endeavour, since the plugin appears more than a bit buggy (or I am making some heavy mistakes).
What is the best way for me to create this service?
I want it to be able to restart itself upon system reboot, and I want to use Clojure (Quartzite) for my jobs (loading them from a database, etc.).
What is the robust, but Clojure-y, way to create a long-running daemon process?
EDIT:
The deployment environment will be something like a single VPS or a dedicated server.
There may be a dozen jobs loading their parameters from some data store, running anywhere from 1 - 8 times a day (or maybe more).
The correct process depends a lot on your environment. I work on deployment systems for complex web/mobile infrastructure with many long running Clojure processes. For this we use Pallet to create instances with the code checked out and configured, then we have a function that generates init scripts to start the services at boot. This process is appropriate to environments where you need a repeatable build on a cloud provider so it may be too heavy for your situation.
If you are looking for simple recurring jobs you may want to look into Immutant which is an application server for Clojure with good support for recurring jobs.
I'm not sure if the topic is appropriate to the question... Anyway, suppose I have a website, built with PHP or JSP, on a cheap hosting plan with limited functionality. More precisely, I don't own the server and I can't run daemons or services at will. Now I want to run periodic tasks (say every minute or two) to compute certain statistics. They are likely to be time-consuming, so I can't repeat the calculation for every user accessing a page. I could do the calculation once when a user loads a page and I determine that enough time has passed, but in that situation, if the calculation runs very long, the response time may be excessive and the request may time out (it's not likely, as I don't mean to run such long tasks, but I'm considering the worst case).
So given these constraints, what solutions would you suggest?
Almost every cheap hosting plan supports crontabs. Check out the hosting packages first.
If not, launch the task via AJAX when the page loads. That way your response time doesn't suffer and the work is done in a separate request.
If you choose to use crontab, you'll need to know a bit more about how to execute your PHP scripts from it. Whether your PHP runs as CGI or as an Apache module affects how you invoke it. There's a good article on how to do this at:
http://www.htmlcenter.com/blog/running-php-scripts-with-cron/
If you don't have access to crontab with your hosting provider (find a new one), there are other options. For example:
http://www.setcronjob.com/
It will call a script on your site remotely every X period; you have to renew it once a month, I think. If you take their paid service ($5/year according to the front page), I think the jobs you set up last until you cancel them or your paid term runs out.
In Java, you could just create a Timer. It can create a background thread that performs a given function every so often.
My preference would be to start the Timer object in a ServletContextListener.contextInitialized() method, but you may have a more appropriate place.
One of my view functions is a very long processing job and clearly needs to be handled differently.
Instead of making the user wait a long time, it would be best if I could launch the processing job, have it email the results, and, without waiting for completion, notify the user that their request is being processed and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' to do this in Django. Perhaps I can return the HTTP response and then carry on with the job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach taken by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to make the job scheduling easier to manage.
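A bare-bones version of that jobs model and its processor might look like this (all names are illustrative, not django-mailer's actual schema):

```python
# Illustrative "jobs model" sketch; run process_jobs() from a cron-invoked
# stand-alone script or management command.
from django.db import models

class Job(models.Model):
    task_name = models.CharField(max_length=100)
    payload = models.TextField(blank=True)
    created = models.DateTimeField(auto_now_add=True)

def process_jobs():
    for job in Job.objects.order_by("created"):
        run_task(job.task_name, job.payload)  # hypothetical dispatcher
        job.delete()  # remove the job once it has been handled
```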
If you need help understanding how to write a script to process the jobs, see James Bennett's article Standalone Django Scripts.
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want a real distributed task queue. There is a lot of competition here, so I can't really detail all the options, but a good one for Django apps is celery.
Why not simply start a thread to do the processing and then go on to send the response?
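Something like the following; a sketch only (process_and_email is hypothetical), and note that work in such a thread is lost if the server process dies:

```python
# Plain-thread sketch; process_and_email is hypothetical, and the work is
# lost if the server process restarts before the thread finishes.
import threading

from django.http import HttpResponse

def process_and_email(user_id):
    # Do the long-running work, then email the results (details omitted).
    ...

def my_view(request):
    threading.Thread(
        target=process_and_email, args=(request.user.id,), daemon=True
    ).start()
    return HttpResponse("Your request is being processed; we'll email the results.")
```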
Before you select a solution, you need to determine how the process will be run. That is, is it the same process for every single user, with the same data, which can be scheduled regularly? Or does each user request something, with slightly different results each time?
As an example, if the data will be the same for every single user and can be run on a schedule, you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable that can handle high load asynchronously, then what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is purely Django and Python and is simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed, and makes use of open-source message queuing servers: celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree
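As a rough sketch of that view-plus-queue pattern with celery (a configured broker is assumed; the task and view names are made up):

```python
# Rough celery sketch of the queue-backed view; names are made up.
from celery import shared_task
from django.http import HttpResponse

@shared_task
def handle_submission(data):
    # Parse, aggregate, and store the data outside the request cycle.
    ...

def submit_view(request):
    handle_submission.delay(request.POST.dict())  # enqueue, return immediately
    return HttpResponse("Your request has been queued for processing.")
```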