Implement a background service that performs a specific data query at a pre-set interval - Flask

I am developing a web application using Flask and Celery.
I wish to implement a feature/service that:
Starts in the background when the Flask app is started.
Performs a specific data query at a preset frequency and stores the queried data in time order.
Plots a data vs. time graph based on the stored data.
Shuts down when the Flask app shuts down.
I guess I could use Celery to write an asynchronous task to achieve this; however, I have no idea specifically how.
May I get some hints about how to achieve this? Many thanks!
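In case it helps to see the shape of such a setup, here is a rough sketch, assuming a Redis broker and Celery beat as the scheduler; query_data() and save_point() are placeholders for whatever query and storage you actually have:

    from datetime import datetime

    from celery import Celery
    from flask import Flask

    flask_app = Flask(__name__)

    # A Celery app pointed at a broker (assumption: Redis is available locally).
    celery = Celery(flask_app.name, broker="redis://localhost:6379/0")

    # Celery beat runs the task on a fixed schedule (here: every 300 seconds).
    celery.conf.beat_schedule = {
        "poll-data": {"task": "app.poll_data", "schedule": 300.0},
    }

    @celery.task(name="app.poll_data")
    def poll_data():
        value = query_data()                  # hypothetical: your actual query
        save_point(datetime.utcnow(), value)  # hypothetical: store one (time, value) row

    def query_data():
        """Placeholder for the real data source."""
        return 42

    def save_point(timestamp, value):
        """Placeholder for appending a row, kept in time order for later plotting."""

The worker and the beat scheduler run as their own process (e.g. celery -A app worker -B), so in practice they are started and stopped alongside the Flask app by your process manager rather than by Flask itself; the plotting view then just reads the stored rows ordered by timestamp.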

Related

What is the best way to retrieve data from a remote server using concurrent calls?

I'm working on retrieving data such as Products and Orders from eCommerce platforms like BigCommerce, Shopify, etc., and saving it in our own databases. To improve the data retrieval speed from their APIs, we're planning to use the Bluebird library.
Earlier, the data retrieval logic fetched one page at a time. Since we're planning to make concurrent calls, "n" pages will be retrieved concurrently.
For example, BigCommerce allows us to make up to 3 concurrent calls at a time. So we need to make the concurrent calls in such a way that we never retrieve the same page more than once, and if a request fails, the request for that page is resent.
What's the best way to implement this? One idea that comes to mind:
One possible solution - keep an index of ongoing requests in the database and update it when each API call completes, so we will know which ones were unsuccessful.
Is there a better way of doing this? Any suggestions/ideas on this would be highly appreciated.
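For illustration, the bounded-concurrency-plus-retry idea can be sketched like this (shown in Python with a thread pool, since the rest of this page leans on Python; with Bluebird the same shape comes from Promise.map and its concurrency option). The endpoint URL and save_page() are placeholders:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    import requests

    BASE_URL = "https://api.example.com/v2/products"   # hypothetical endpoint
    MAX_CONCURRENCY = 3                                # e.g. BigCommerce's limit
    MAX_ATTEMPTS = 3

    def fetch_page(page):
        resp = requests.get(BASE_URL, params={"page": page, "limit": 250}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def fetch_all(pages):
        pending = list(pages)
        for attempt in range(MAX_ATTEMPTS):
            failed = []
            with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
                # Each page is submitted exactly once per round.
                futures = {pool.submit(fetch_page, p): p for p in pending}
                for future in as_completed(futures):
                    page = futures[future]
                    try:
                        save_page(page, future.result())   # hypothetical: write to your DB
                    except requests.RequestException:
                        failed.append(page)                # only the failed page is retried
            if not failed:
                return
            pending = failed
        raise RuntimeError("pages still failing: %s" % pending)

    def save_page(page, data):
        """Placeholder for persisting one page of records."""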

PDI - How to monitor kettle Transformation and Jobs?

I'm trying to create a web app to monitor my transformations and jobs. I want to show all the status information (begin datetime, run-time duration, finish datetime, status, etc.) on the web app live (it will refresh automatically to check the status). Is there any way to collect the logs of transformations and jobs? My idea is to use those logs for my web app. Or is there another way that could be better than mine?
At https://github.com/alaindebecker/ETL-pilot you'll learn how to display the status of your transformations on a web site (which may be your localhost).
It has been tested at the UN, and with Cédric we found a way to do it at the job level, as well as how to implement a button to restart a transformation. We did not finish (and publish) this work, not because we were lazy, but because there was no demand for it.
If you want to talk about what you need, open an issue on the Git repository.

Django - Update a Model every couple of minutes

I am building an app where I need to fetch some data from an API and update all the models with that data every few minutes.
What would be a clean way to accomplish something like this?
Well, it's quite an open question.
You'll need to create a task that runs every few minutes; you can do this with Celery. Celery has a task scheduler (http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html) which will launch a given function at a configured time, similar to a crontab.
The task would then fetch the data; http://docs.python-requests.org/en/master/ is a very good library for making HTTP requests.
Lastly, but no less important, you would need to deserialize the fetched data and save it to your models. Django REST framework's serialization capabilities are a great starting point, but if the data structure is simple enough you can just use Python's json library (json.loads(data)) and write a function that translates the API's fields to the fields of the model.
By the way, I'm supposing a REST API.
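A rough sketch of that combination (Celery beat for scheduling, requests for the fetch, update_or_create for the field translation); the API URL, the catalog app, and the Product model are made-up names:

    import requests
    from celery import shared_task
    from celery.schedules import crontab

    from catalog.models import Product          # hypothetical app and model

    # In the Celery config: run the task every 5 minutes.
    beat_schedule = {
        "refresh-products": {
            "task": "catalog.tasks.refresh_products",
            "schedule": crontab(minute="*/5"),
        },
    }

    @shared_task
    def refresh_products():
        resp = requests.get("https://api.example.com/products", timeout=30)   # hypothetical API
        resp.raise_for_status()
        for item in resp.json():
            # Translate the API's fields to the model's fields; update_or_create
            # keeps existing rows in sync instead of duplicating them.
            Product.objects.update_or_create(
                external_id=item["id"],
                defaults={"name": item["name"], "price": item["price"]},
            )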
You can use a task management tool that has the feature of running periodic tasks at the intervals you specify, like Periodic Tasks in Celery.
Also, if you run your code on a Unix-like system, you can stick with core Django functionality. Just write your functionality as a Django management command and set up a cron job to run it at your preferred interval.
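For the management-command route, the command is a small file under yourapp/management/commands/ and the schedule lives in crontab; refresh_products() here stands for whatever fetch-and-update logic you write:

    # yourapp/management/commands/refresh_data.py
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = "Fetch data from the external API and update the local models."

        def handle(self, *args, **options):
            refresh_products()                    # hypothetical fetch-and-update helper
            self.stdout.write("models refreshed")

    # crontab entry, every 5 minutes:
    # */5 * * * * /path/to/venv/bin/python /path/to/manage.py refresh_data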

Perform large batch of request to webservice from web application. Monitor progress

I am building a web application with PHP/MySQL using the Yii framework. A key aspect of this application is administering a large number of entities and performing large batches of requests to a SOAP web service in order to update credit on these entities (cards).
I need to implement some sort of queue to manage the process of performing the batch of (+/- 2000) requests.
I cannot figure out which is the best way to go:
A background job? How would I implement this with PHP/Yii, and how would I give feedback to the user?
An AJAX queue? Any best practices for this? What about the risk of interruption when the browser is closed?
I had a similar issue. The best way is to perform this batch in a background process. To give feedback to users, write the current state into your DB (e.g. into a "batch_status" table). When a user wants to see the current situation, you can just retrieve the data from that table. If you have problems with the implementation, you are welcome to ask me about it in the comments ;)
To run a background process in PHP, append ' > /dev/null & echo $!' to your command, then execute it: $lastLine = exec($cmd, $output, $return_var);. After that you get the process ID in the $lastLine variable. You can find out how to use the CLI in Yii here: http://www.yiiframework.com/doc/guide/1.1/en/topics.console

How to best launch an asynchronous job request in Django view?

One of my view functions is a very long processing job and clearly needs to be handled differently.
Instead of making the user wait for a long time, it would be best if I were able to launch the processing job (which would email the results) and, without waiting for completion, notify the user that their request is being processed and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' in terms of Django. Perhaps I can return the HTTP response and then go on with this job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach used by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to make the job scheduling easier.
If you need help writing a script to process the jobs, see James Bennett's article Standalone Django Scripts.
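In its most bare-bones form the idea looks something like this (the model fields and process_one() are placeholders, and the standalone script needs the Django setup described in that article):

    from django.db import models

    class Job(models.Model):
        created = models.DateTimeField(auto_now_add=True)
        payload = models.TextField()                 # whatever the view hands over

    # The view just records the work and returns immediately:
    #     Job.objects.create(payload=serialized_request)

    # A standalone script, run from cron every few minutes:
    def run_pending_jobs():
        for job in Job.objects.order_by("created"):
            process_one(job.payload)                 # hypothetical: the actual long-running work
            job.delete()                             # drop the row once done, as django-mailer does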
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want to use a real distributed task queue. There is a lot of competition here, so I can't really detail all the options, but a good one to use with Django apps is Celery.
Why not simply start a thread to do the processing and then go on to send the response?
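In its simplest form that looks something like this (long_job() stands in for the slow work and the email step):

    import threading

    from django.http import HttpResponse

    def long_job(data):
        """Placeholder: do the heavy processing, then email the results."""

    def start_processing(request):
        # Start the work in a daemon thread, then answer the user right away.
        threading.Thread(target=long_job, args=(request.POST.dict(),), daemon=True).start()
        return HttpResponse("Your request is being processed; the results will be emailed to you.")

Keep in mind the thread lives inside the web server process, so it disappears if that process is restarted; for anything long or important, the queue-based options below are safer.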
Before you select a solution, you need to determine how the process will be run. I.e., is it the same process for every single user, with the same data, so it can be scheduled regularly? Or does each user request something different, with slightly different results?
As an example, if the data will be the same for every single user and the job can be run on a schedule, you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable that can handle high load and is asynchronous, what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is pure Django & Python and simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something scalable and distributed that makes use of open-source message queuing servers: Celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree
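With Celery the queue approach boils down to a task plus a view that enqueues it; generate_report() and the wiring here are made-up, and this assumes a reasonably recent Celery with a broker configured:

    from celery import shared_task
    from django.http import HttpResponse

    @shared_task
    def generate_report(user_id):
        """Placeholder for the long-running job: build the report, then email it."""

    def report_view(request):
        generate_report.delay(request.user.id)   # hand the work to the queue and move on
        return HttpResponse("Your report is being generated and will be emailed to you.")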