PDI - How to monitor kettle Transformation and Jobs? - kettle

I'm trying to create a web app to monitor my transformations and jobs. It will show the status of each one (start datetime, run duration, finish datetime, status, etc.) live; the web app will refresh automatically to check the status. Is there any way to collect the logs of transformations and jobs? My idea is to use those logs for my web app. Or is there a better approach than mine?

In https://github.com/alaindebecker/ETL-pilot, you'll learn how to display the status of your transformations on a web site (which may be your localhost).
It has been tested in the UN, and with Cédric we have found a way to do it at job level and to implement a button to restart a transformation. We did not finish (and publish) this work, not because we were lazy, but because there was no demand for it.
If you want to talk about your needs, open an issue in the Git repository.

Related

Implement a background service that performs a specific data query once a pre-set period

I am developing a web application using Flask and Celery.
I wish to implement a feature/service that:
Starts in the background when the Flask app starts.
Performs a specific data query at a preset frequency and stores the queried data in time order.
Plots a data vs. time graph based on the stored data.
Shuts down when the Flask app shuts down.
I guess I could use Celery to write an asynchronous task to achieve this, but I have no idea how exactly.
May I get some hints about how to achieve this? Many thanks!

Best option to schedule payments: azure scheduler, WebJob or Azure Functions or a Worker Role?

I've hosted my website on Azure and now I want to schedule payments on a monthly basis. I am using Authorize.net for payments, but I cannot use their recurring billing feature as it gives very little control. I have to perform checks in the database, make payments and update records. What should I use: Azure Scheduler, an Azure WebJob, Azure Functions or a Worker Role?
Definitely not a Worker Role. They are very heavyweight and generally not worth the effort for a single, simple job like this.
Web jobs might be a good solution. It can run in the context of your web app, so you can use this with no additional cost. But you'll need to do some development with this - you have to create an app that calls Authorize.net.
If you only need to fire a single HTTP request, then using Azure Scheduler to schedule this HTTP action might be a good choice. You can configure the request itself (headers, payload) and it has error handling as well. But you might have to store sensitive information in the Azure portal, in the configuration of the scheduled job.
So I'd say forget about the Worker Role, then weigh simplicity against flexibility and development effort. That being said, I would probably try it with the scheduler first, and then move on to the WebJob if I encounter something that is not feasible with the scheduler.
Edit:
Azure Functions can also be a good option - I'd say it's sort of a middle ground between the WebJob and the simple scheduled option. It is part of the App Service feature set, so it can run in the same App Service plan as the web app, at no additional cost. But here you have to code the HTTP request to Authorize.net yourself as well. Azure Functions is a lot more lightweight than WebJobs - you do not have to create an exe (or PowerShell script or whatever); you can just code the HTTP request in a script editor inside the Azure portal. But you still have to do it yourself. This is a bit more flexible than the simple scheduled option though, which is something to consider when it comes to error handling.
So this is a good middleground, but I think it's still a lot of work given the complexity of the task (which is to fire a single HTTP request).
To get it working quickly, Logic Apps is a good choice. With Logic Apps, you can trigger it with a timer based on a schedule you define, and use the out-of-the-box SQL/DocDB connector (depending on your exact scenario) to connect to your database. Although there's currently no Authorize.net connector available, you should be able to use the generic HTTP action to talk to its RESTful APIs. Most likely, you should be able to get this working very quickly. I'd also recommend submitting a suggestion on aka.ms/logicapps-wish so we can track the request for an Authorize.net connector, which, when available, is going to make this even easier.

How to configure Sitecore processing server?

I just installed Sitecore Experience Platform and configured it according to the Sitecore scaling recommendations for processing servers.
But I want to know the following things:
1. How can I use the Sitecore processing server?
2. How can I check whether the processing server is working fine?
3. How is the collection DB data processed and sent to the reporting server?
The processing server is a piece of the whole analytics (xDB) part of the Sitecore solution. More info can be found here.
Snippet:
"The processing and aggregation component extracts information from
captured, raw analytics data and transforms it into a form suitable
for use in reporting applications. It also performs specific tasks on
the collection database that involve mass updates.
You implement processing and aggregation on a Sitecore application
server connected to both the collection and reporting databases. A
processing server can run independently on a dedicated server, or on
the same server together with other Sitecore components. By
implementing multiple processing or aggregation servers, it is
possible to achieve higher performance on high-traffic solutions."
In short: the processing server aggregates the data in Mongo and processes it (into the reporting database). This can be put on a separate server in order to spare resources on your other servers. I'm not quite sure what it all does behind the scenes or how to check exactly and only that part of the process, but you could check the reporting tools in the Sitecore backend, like Experience Analytics. If those are working, you are probably fine. Also, check the logs on the processing server - they will give you an indication of what it is doing and whether any errors occur.

Perform large batch of request to webservice from web application. Monitor progress

I am building a web application with PHP/MySQL using the Yii framework. Key aspect of this application is to administer a large amount of entities; and perform large batches of requests to a SOAP webservice in order to update credit on these entities (cards).
I need to implement some sort of queue to manage the process of performing the batch (+/- 2000) requests.
I cannot figure out the best way to go:
A background job? How do I implement this with PHP/Yii, and how do I give feedback to the user?
An AJAX queue? Are there any best practices for this? Is there a risk of interruption when the browser is closed?
I had a similar issue. The best way is to perform this batch in a background process. To give feedback to users, you have to write the current state into your DB (e.g. into a table "batch_status"). When the user wants to see the current situation, you can just retrieve the data from that table. If you have problems with the implementation, you are welcome to ask me about it in the comments ;)
To run a background process in PHP, append ' > /dev/null & echo $!' to your command. Then execute it: $lastLine = exec($cmd, $output, $return_var);. After that, the $lastLine variable holds the process ID. You can find out how to use the CLI in Yii here: http://www.yiiframework.com/doc/guide/1.1/en/topics.console
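The status-table idea above can be sketched as follows. This is only an illustration in Python with SQLite, not the PHP/Yii code the answer has in mind; the column names (batch_id, total, done, state) and all function names are assumptions made up for the example, and only the table name "batch_status" comes from the answer.

```python
import sqlite3


def init_status(conn, batch_id, total):
    """Create the (hypothetical) status table if needed and register a batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_status ("
        "batch_id TEXT PRIMARY KEY, total INTEGER, done INTEGER, state TEXT)"
    )
    conn.execute(
        "INSERT INTO batch_status VALUES (?, ?, 0, 'running')", (batch_id, total)
    )
    conn.commit()


def record_progress(conn, batch_id, done):
    """Called by the background worker after each processed request."""
    conn.execute(
        "UPDATE batch_status SET done = ?, "
        "state = CASE WHEN ? >= total THEN 'finished' ELSE 'running' END "
        "WHERE batch_id = ?",
        (done, done, batch_id),
    )
    conn.commit()


def read_progress(conn, batch_id):
    """Called by the web request that shows the current state to the user."""
    row = conn.execute(
        "SELECT done, total, state FROM batch_status WHERE batch_id = ?",
        (batch_id,),
    ).fetchone()
    return {"done": row[0], "total": row[1], "state": row[2]}


def run_batch(conn, batch_id, items, send_request):
    """Background job: send every item and persist progress as it goes."""
    init_status(conn, batch_id, len(items))
    for count, item in enumerate(items, start=1):
        send_request(item)  # stand-in for the real web service call
        record_progress(conn, batch_id, count)
```

The page that reports progress only ever calls read_progress, so it stays fast no matter how long the batch runs, and closing the browser does not interrupt the job.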

How to best launch an asynchronous job request in Django view?

One of my view functions is a very long processing job and clearly needs to be handled differently.
Instead of making the user wait for a long time, it would be best if I were able to launch the processing job, which would email the results, and, without waiting for completion, notify the user that their request is being processed and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' in terms of Django. Perhaps I can return the HTTP response, and then go on with this job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach used by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to manage the job scheduling more easily.
If you need help understanding how to write a script to process the job see James Bennett's article Standalone Django Scripts for help.
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want to use a real distributed task queue. There is a lot of competition here so I can't really detail all the options, but a good one to use with Django apps is celery.
Why not simply start a thread to do the processing and then go on to send the response?
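A minimal sketch of that thread-based idea, using only the standard library. handle_request is a hypothetical stand-in for a Django view, and the response text is made up. Note that a daemon thread is killed when the server process exits, so unfinished work is lost; that is one reason the other answers suggest a persistent queue.

```python
import threading


def start_in_background(job, *args):
    """Start job(*args) on a separate thread and return the thread at once."""
    worker = threading.Thread(target=job, args=args, daemon=True)
    worker.start()
    return worker


def handle_request(job, payload):
    """Stand-in for a view: kick off the job, then respond immediately."""
    worker = start_in_background(job, payload)
    return "Your request is being processed.", worker
```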
Before you select a solution, you need to determine how the process will be run. I.e., is it the same process for every single user, with the same data, so that it can be scheduled regularly? Or does each user request something, with slightly different results?
As an example, if the data will be the same for every single user and can be run on a schedule you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable and asynchronous that can handle high load, what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
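The producer/consumer shape of such a system can be illustrated with an in-process queue. This is only a sketch of the idea: a real deployment would use a message broker and separate worker processes, and the names pending, submit and worker are assumptions for the example.

```python
import queue
import threading

pending = queue.Queue()  # in-process stand-in for a real message queue
results = []


def worker():
    """Consume requests until a None sentinel arrives."""
    while True:
        item = pending.get()
        if item is None:
            pending.task_done()
            break
        results.append(f"processed {item}")
        pending.task_done()


def submit(item):
    """What the view would do: enqueue the request and return without waiting."""
    pending.put(item)
    return "queued"
```

A usage example: start threading.Thread(target=worker), call submit for each request, and put None on the queue when the worker should stop.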
There are a few options to implement this in Django:
Django Queue service is purely django & python and simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed and makes use of open source message queuing servers, is celery:
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree