Perform a large batch of requests to a webservice from a web application and monitor progress - web-services

I am building a web application with PHP/MySQL using the Yii framework. A key aspect of this application is administering a large number of entities and performing large batches of requests to a SOAP webservice in order to update the credit on these entities (cards).
I need to implement some sort of queue to manage the process of performing the batch of (+/- 2000) requests.
I cannot figure out which is the best way to go:
A background job: how do I implement this with PHP/Yii, and how do I give feedback to the user?
An AJAX queue: are there any best practices for this? Is there a risk of interruption when the browser is closed?

I had a similar issue. The best way is to perform this batch in a background process. To give feedback to users, you have to write the current state into your DB (e.g. into a table "batch_status"); when a user wants to see the current status, you can just retrieve the data from that table. If you have any problems with the implementation, you are welcome to ask me about it in the comments ;)
To run a background process in PHP, append ' > /dev/null & echo $!' to your command, then execute it: $lastLine = exec($cmd, $output, $return_var);. After that you get the process ID in the $lastLine variable. You can find out how to use the CLI in Yii here: http://www.yiiframework.com/doc/guide/1.1/en/topics.console
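As a rough illustration of the idea (the question is about PHP/Yii, so this Python sketch is only meant to show the shape of "launch a detached worker, write progress to a batch_status table, poll it"; the table layout and process_card() are assumptions):

```python
# Illustrative sketch only: the thread is about PHP/Yii, this just shows the shape
# of "launch a detached worker, write progress to a batch_status table, poll it".
# The table layout and process_card() are assumptions, not from the original answer.
import sqlite3
import subprocess

def launch_batch(cmd):
    """Start the worker detached and capture its PID (mirrors ' > /dev/null & echo $!')."""
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.DEVNULL)
    return proc.pid

def process_card(card_id):
    pass  # placeholder for the SOAP credit-update call for one card

def run_batch(card_ids, db_path="app.db"):
    """Worker side: process each card and record progress so the web app can poll it."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS batch_status "
               "(id INTEGER PRIMARY KEY, done INTEGER, total INTEGER, state TEXT)")
    db.execute("INSERT OR REPLACE INTO batch_status VALUES (1, 0, ?, 'running')",
               (len(card_ids),))
    db.commit()
    for done, card_id in enumerate(card_ids, start=1):
        process_card(card_id)
        db.execute("UPDATE batch_status SET done = ? WHERE id = 1", (done,))
        db.commit()
    db.execute("UPDATE batch_status SET state = 'finished' WHERE id = 1")
    db.commit()
```

The web app then only has to read the single batch_status row on each refresh to show progress.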

Related

Implement a background service that performs a specific data query once per pre-set period

I am developing a web application using Flask and Celery.
I wish to implement a feature/service that:
Starts in the background when the Flask app is started.
Performs a specific data query at a preset frequency and stores the queried data in time order.
Plots a data vs. time graph based on the stored data.
Shuts down when the Flask app shuts down.
I guess I could use Celery to write an asynchronous task to achieve this; however, I have no idea specifically how.
May I get some hints about how to achieve this? Many thanks!
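As a rough sketch of the Celery approach hinted at above (the broker URL, module name and fetch_and_store() are assumptions):

```python
# Rough sketch of a periodic Celery task for this (broker URL, module name and
# fetch_and_store() are assumptions). The Flask app and the Celery worker/beat
# run as separate processes, so starting/stopping them is handled by a process manager.
from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0")

# Ask celery beat to run the query every 300 seconds.
celery_app.conf.beat_schedule = {
    "poll-data": {"task": "tasks.fetch_and_store", "schedule": 300.0},
}

@celery_app.task
def fetch_and_store():
    # Placeholder: run the data query and append the result with a timestamp
    # to whatever storage the app uses; the plotting view then reads from it.
    pass
```

A worker with an embedded beat scheduler can then be started with something like `celery -A tasks worker -B`.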

Which is the best way to retrieve data from a remote server using concurrent calls?

I'm working on retrieving data like Products and Orders from eCommerce platforms such as BigCommerce, Shopify, etc., and saving it in our own databases. To improve the data retrieval speed from their APIs, we're planning to use the Bluebird library.
Earlier, the data retrieval logic retrieved one page at a time. Since we're planning to make concurrent calls, "n" pages will be retrieved concurrently.
For example, BigCommerce allows us to make up to 3 concurrent calls at a time. So we need to make the concurrent calls in such a way that we never retrieve the same page more than once, and if a request fails, the request for that page is resent.
What's the best way to implement this? One idea that strikes my mind is:
One possible solution - keep an index of ongoing requests in the database and update it on API completion, so we will know which requests were unsuccessful.
Is there a better way of doing this? Any suggestions/ideas on this would be highly appreciated.
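As a sketch of the bounded-concurrency-with-retry idea (the thread plans to use Bluebird's Promise.map in JavaScript; this is shown in Python for illustration, and fetch_page(), the page count and the retry limit are assumptions):

```python
# Illustrative sketch of bounded concurrency with per-page retry; the thread itself
# plans to use Bluebird's Promise.map with a concurrency option in JavaScript.
# fetch_page(), TOTAL_PAGES and MAX_RETRIES are assumptions for the example.
import asyncio

MAX_CONCURRENCY = 3   # e.g. BigCommerce's limit of 3 concurrent calls
MAX_RETRIES = 3
TOTAL_PAGES = 50

async def fetch_page(page):
    # Placeholder for the real HTTP call to the platform's API.
    await asyncio.sleep(0.1)
    return {"page": page, "items": []}

async def fetch_with_retry(page, sem):
    for attempt in range(1, MAX_RETRIES + 1):
        async with sem:                      # at most MAX_CONCURRENCY calls in flight
            try:
                return await fetch_page(page)
            except Exception:
                if attempt == MAX_RETRIES:
                    raise                    # give up on this page after the last attempt

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    # Each page number is handed out exactly once, so no page is fetched twice.
    return await asyncio.gather(
        *(fetch_with_retry(p, sem) for p in range(1, TOTAL_PAGES + 1))
    )

asyncio.run(main())
```

Because every page number is scheduled exactly once and retries happen inside that page's own task, there is no need to track pages in the database just to avoid duplicates; a persistent index is only needed if you want to resume after a crash.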

PDI - How to monitor Kettle Transformations and Jobs?

I'm trying to create a web app to monitor my transformations and jobs. I want to show all the status information (begin datetime, run-time duration, finish datetime, status, etc.) on the web app live (my web app will refresh automatically to check the status). Is there any way to collect the logs of transformations and jobs? My idea is to use those logs for my web app. Or is there any other way that could be better than mine?
At https://github.com/alaindebecker/ETL-pilot you'll learn how to display the status of your transformations on a web site (which may be your localhost).
It has been tested at the UN, and with Cédric we found a way to do it at the job level and to implement a button to restart a transformation. We did not finish (and publish) this work, not because we were lazy, but because there was no demand for it.
If you want to talk about what you need, drop an issue in the git repo.

Running continual tasks alongside the Django app

I'm building a Django app which lists the hot (according to a specific algorithm) Twitter trending topics.
I'd like to run some processes indefinitely to make Twitter API calls and update the database (Postgres) with the new information. This way the hot trending topic list gets updated asynchronously.
At first it seemed to me that Celery + RabbitMQ were the solution to my problem, but from what I understand they are used within Django to launch scheduled or user-triggered tasks, not indefinitely running tasks.
The solution that comes to my mind is to write a .py file that continually puts trending topics into a queue, and independent, continually running .py files that pull from the queue and save the data into the DB used by Django with raw SQL or SQLAlchemy. I think this could work, but I'm pretty sure there is a much better way to do it.
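For illustration, a rough sketch of the kind of standalone, continually running worker described above (fetch_trending() and the trending_topics table are placeholders):

```python
# Rough sketch of a standalone, continually running worker of the kind described
# in the question; fetch_trending() and the trending_topics table are placeholders.
import time
import psycopg2

def fetch_trending():
    # Placeholder for the Twitter API call plus the ranking algorithm.
    return [("python", 0.91), ("django", 0.87)]

def main():
    conn = psycopg2.connect("dbname=app user=app")
    while True:                                  # runs until the process is stopped
        with conn, conn.cursor() as cur:
            for topic, score in fetch_trending():
                cur.execute(
                    "INSERT INTO trending_topics (topic, score) VALUES (%s, %s)",
                    (topic, score),
                )
        time.sleep(60)                           # poll interval; tune to API rate limits

if __name__ == "__main__":
    main()
```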
If you just need to keep some processes running continually, supervisor is a nice solution.
You can combine it with any queuing technology you like to push things into your queues.
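As a hedged example, a minimal supervisord program entry for such a worker could look like this (paths and the program name are placeholders):

```ini
; Minimal supervisord program entry for a continually running worker.
; Paths and the program name are placeholders.
[program:trending_worker]
command=/srv/app/venv/bin/python /srv/app/trending_worker.py
directory=/srv/app
autostart=true
autorestart=true
stderr_logfile=/var/log/trending_worker.err.log
```

With autorestart enabled, supervisor brings the worker back up if it crashes, which covers the "run indefinitely" requirement.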

How to best launch an asynchronous job request in a Django view?

One of my view functions is a very long-running processing job and clearly needs to be handled differently.
Instead of making the user wait for a long time, it would be best if I were able to launch the processing job (which would email the results) and, without waiting for completion, notify the user that their request is being processed and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' in terms of Django. Perhaps I can return the HTTP response and then go on with this job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach used by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to make managing the job scheduling easier.
If you need help understanding how to write a script to process the jobs, see James Bennett's article "Standalone Django Scripts".
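A hedged sketch of that jobs-model approach (field names and do_the_work() are placeholders, not django-mailer's actual code):

```python
# Hedged sketch of the "jobs model + cron script" approach described above;
# the field names and do_the_work() are placeholders, not django-mailer's actual code.
from django.db import models

class Job(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    payload = models.TextField()              # whatever the view needs to hand off

# In the view, instead of doing the work inline, just record a job and return:
#     Job.objects.create(payload=serialized_request_data)
# A standalone script run by cron then processes and deletes pending jobs:
#     for job in Job.objects.all():
#         do_the_work(job.payload)            # the long-running part, emails results
#         job.delete()
```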
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want to use a real distributed task queue. There is a lot of competition here, so I can't really detail all the options, but a good one to use with Django apps is Celery.
Why not simply start a thread to do the processing and then go on to send the response?
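A minimal sketch of that idea (the view and do_long_processing() are placeholders); note that the thread lives inside the web server process, so it is only suited to light, best-effort work:

```python
# Minimal sketch of the "start a thread, return immediately" idea;
# do_long_processing() and the view are placeholders. The thread lives inside
# the web server process, so this only suits light, best-effort work.
import threading
from django.http import HttpResponse

def do_long_processing(user_email):
    ...  # long-running work; email the results when finished

def start_job(request):
    t = threading.Thread(target=do_long_processing, args=(request.user.email,))
    t.daemon = True        # don't block server shutdown on this thread
    t.start()
    return HttpResponse("Your request is being processed; results will be emailed.")
```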
Before you select a solution, you need to determine how the process will be run. I.e. is it the same process for every single user, where the data is the same and can be scheduled regularly? Or does each user request something, with slightly different results each time?
As an example, if the data will be the same for every single user and can be run on a schedule, you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
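For illustration, a minimal custom management command that cron could invoke on a schedule (the app layout, command name and process_pending_jobs() are placeholders):

```python
# yourapp/management/commands/process_jobs.py -- hypothetical path and names.
# A minimal custom management command that cron can run on a schedule,
# e.g. "python manage.py process_jobs".
from django.core.management.base import BaseCommand

def process_pending_jobs():
    ...  # placeholder for the actual batch work (e.g. drain a jobs table)

class Command(BaseCommand):
    help = "Process pending background jobs"

    def handle(self, *args, **options):
        process_pending_jobs()
        self.stdout.write("Done")
```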
However, if the requests will be ad hoc and you need something scalable that can handle high load and is asynchronous, then what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is purely Django & Python and simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed, and makes use of open-source message queuing servers: Celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree
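A minimal sketch of that queue approach with Celery (shown with the modern Celery API; task and view names are placeholders, and broker configuration is omitted): the view enqueues the work and returns immediately.

```python
# Minimal sketch of the Celery approach: the view enqueues the job and returns
# immediately. Task/view names are placeholders and broker configuration is omitted.
from celery import shared_task
from django.http import HttpResponse

@shared_task
def long_processing_job(user_email):
    ...  # the long-running work; email the results when finished

def start_job(request):
    long_processing_job.delay(request.user.email)   # pushed onto the queue, handled by a worker
    return HttpResponse("Your request is being processed.")
```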