Do I need "RunAndBlock" for scheduled web jobs? - azure-webjobs

My intention is to run a 3 second web job every 5 min. What happens if I skip the host.RunAndBlock?

If you just want a simple time-scheduled job, there is no need to use the WebJobs SDK, so there is no host at all. Just use a plain console app (it can be as simple as a one-line Main) and deploy it as a scheduled CRON WebJob. See https://learn.microsoft.com/en-us/azure/app-service/web-sites-create-web-jobs.
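To illustrate the "plain console app" idea, here is a minimal sketch in Python rather than C#. It assumes the job is a script that does its work once and exits, with no host and no blocking loop. The function do_work is a hypothetical stand-in for the real 3-second job.

```python
import time


def do_work() -> str:
    """Hypothetical stand-in for the real short-running job."""
    time.sleep(0.01)  # pretend to do a little work
    return "done"


def main() -> str:
    # Run once and return; the scheduler, not this process,
    # is responsible for starting the next run.
    result = do_work()
    print(f"job finished: {result}")
    return result


if __name__ == "__main__":
    main()
```

The schedule itself would live outside the script. As far as I know, a triggered WebJob can carry a settings.job file containing a six-field CRON expression such as {"schedule": "0 */5 * * * *"} for every 5 minutes, but check the linked documentation before relying on that.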

Related

AWS service for doing jobs

I have the following need - the code needs to call some APIs, get some data, and store it in a database (a flat file will do for our purpose). As the APIs give access to a huge number of records, we want to split the work into 30 parts, each part scraping a certain section of the data from the APIs. We want these 30 scrapers to run on 30 different machines - and for that, we have a Python program that does the following:
Call the API, get the data, based on parameters (which part of the API to call)
Dump it to the local flatfile.
And then later, we will merge the output from the 30 files into one giant DB.
The question is: which AWS tool should we use for this? We could use EC2 instances, but we have to keep the EC2 console open on the desktop from which we connect to run the Python program, and it is not feasible to keep 30 connections open on my laptop. Getting remote desktop on those machines is very complicated, so logging in there, starting the job, and then disconnecting is also not feasible.
What we want is this - start the tasks (one each on 30 machines), let them run and finish by themselves, and if possible notify me (or I can myself check for health periodically).
Can anyone advise which AWS tool suits our purpose, and how to use it?
"We can use EC2 instance, but we have to keep the EC2 console open on our desktop where we connect to it to run the Python program"
That just means you are running the script wrong, and you need to look into running it as a service.
In general, you need to look into queueing up these tasks in SQS and then triggering either EC2 auto-scaling or Lambda functions, depending on whether your script fits within the Lambda runtime restrictions.
This seems like a good application for Step Functions. Step Functions allow you to orchestrate multiple Lambda functions, Glue jobs, and other services into a business process. You could write Lambda functions that call the API endpoints and store the results in S3. Once all the data is gathered, your step function could trigger a Lambda function, Glue job, or something else that processes the data into your database. Step Functions help with error handling and retries and allow easy monitoring of your process.
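Whichever service ends up running the workers, the 30-way split described in the question can be expressed as a list of small task messages, one per worker. The sketch below only builds those messages. The field names (part, start, end) and the idea of addressing records by index are assumptions for illustration, and actually sending each message to a queue such as SQS is left out.

```python
import json


def split_into_parts(total_records: int, parts: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges that together cover 0..total_records."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total_records, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts take one additional record each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def build_task_messages(total_records: int, parts: int = 30) -> list[str]:
    """Serialize one JSON task description per part (hypothetical message format)."""
    return [
        json.dumps({"part": i, "start": s, "end": e})
        for i, (s, e) in enumerate(split_into_parts(total_records, parts))
    ]
```

Each worker would read one message, scrape only its own range, and write its own output file, which keeps the later merge step simple.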

Scheduling Dataflow pipelines

I want to schedule a Google Dataflow job to run every hour.
I checked this URL: https://cloud.google.com/blog/big-data/2016/04/scheduling-dataflow-pipelines-using-app-engine-cron-service-or-cloud-functions
but I got many errors.
How can I achieve this?
From my perspective, using app engine is trying to repurpose a good tool for something different.
We opted to run our own CRON instance.
Please look into handling this case with Google Dataflow windowing over an unbounded source:
https://cloud.google.com/dataflow/model/windowing
https://cloud.google.com/dataflow/examples/gaming-example
You can use Cloud Scheduler to run every hour and call a Cloud Function.
The Cloud Function then uses the Dataflow client API library to submit a Dataflow job.
See this link: https://dzone.com/articles/triggering-dataflow-pipelines-with-cloud-functions

Schedule task in DSS 3.5 for DSS project boxcar

I created a data service project and enabled boxcarring to run 5 queries sequentially.
After deploying the service, I need to use a scheduled task to run it every 5 minutes. In the scheduled task, I selected the _request_box operation (it was created by DSS boxcarring), but it doesn't work. How can I use task scheduling with boxcarring?
Thank you
When a task is scheduled the operation should be a parameter-less operation. As request_box consists of several other operations, this scenario will not work as a normal operation. I have added a JIRA to report this scenario and you can track the progress from there.

Trigger a URL at a specific time (AWS)

Any ideas on how to reliably trigger a URL (web service) at a specific time, with precision down to the second? For example, the script would be set to trigger a web service at 2015-05-27 12:34:55. In my scenario, the user will be able to select, down to the second, the time at which a trade should execute. The web service must then be triggered at that specific time.
AWS Lambda is not able to run at specific times.
Cron jobs won't work because cron does not run every second.
An SQS queue might work, but coding it up to be reliable could be hard.
Thanks!
"at" command does what you need: https://calomel.org/cron_at.html
An additional tool one can use is called "at" and is used to execute a job only once. "at" is very useful, for example, if you want to run a backup job starting at 8pm and you expect to be leaving at 5:30pm.
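The same run-once-at-a-given-time idea can also be sketched inside a long-running Python process, which makes the second-level precision from the question explicit. This is only an illustration using the standard library. The names seconds_until and schedule_at are made up for this example, the action callable stands in for whatever calls the web service, and a timer held in memory is lost if the process restarts.

```python
import threading
from datetime import datetime, timezone


def seconds_until(target: datetime, now: datetime) -> float:
    """Seconds from `now` to `target`, clamped to 0 if the target is already past."""
    return max(0.0, (target - now).total_seconds())


def schedule_at(target: datetime, action) -> threading.Timer:
    """Start a background timer that calls `action` once at `target`.

    `target` is assumed to be a timezone-aware datetime.
    """
    delay = seconds_until(target, datetime.now(timezone.utc))
    timer = threading.Timer(delay, action)
    timer.start()
    return timer
```

For the timestamp in the question, the call would look like schedule_at(datetime(2015, 5, 27, 12, 34, 55, tzinfo=timezone.utc), trigger), where trigger is a hypothetical function that performs the HTTP request.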

Django setting up a scheduled task without Cron

I know there are many questions asking about this, especially this one: Django - Set Up A Scheduled Job?.
But what I want to understand is, how does a scheduled task inside Django actually work?
My simplistic way to think about it is that there's an infinite loop somewhere, something like this (runs every 60 seconds),
import time
interval = 60  # seconds between runs
while True:
    some_method()  # the task to repeat
    time.sleep(interval)
Question: where do you put this infinite loop? Is there some part of the Django app that just runs in the background alongside the rest of the app?
Thanks!
Django doesn't do scheduled tasks. If you want scheduled tasks, you need a daemon that runs all the time and can launch your task at the appropriate time.
Django only runs when an HTTP request is made. If no one makes an HTTP request for a month, Django doesn't run for a month. If there are 45 HTTP requests this second, Django will run 45 times this second (in the absence of caching).
You can write scripts in the Django framework (called management commands) that get called from some outside service (like cron). That's as close as you'll get to what you want. If that's the case, then the question/answer you reference is the place to get the how-tos.
On a Unix-like system, cron is probably the simplest outside service to work with. On recent Linux systems, cron has a directory /etc/cron.d into which you can drop your app's cron config file, and it will not interfere with any other cron jobs on the system. No editing of existing files is necessary.
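For a sense of what the "daemon that runs all the time" amounts to, here is a minimal sketch of such a loop as a standalone process outside Django. It is close to the loop in the question, except that it schedules against a monotonic clock so the time the task takes does not push later runs back. The name run_every and the max_runs, sleep, and clock parameters are assumptions added to make the sketch testable. This is not how cron is implemented.

```python
import time


def run_every(interval, task, max_runs=None, sleep=time.sleep, clock=time.monotonic):
    """Call `task` every `interval` seconds; stop after `max_runs` calls if given."""
    next_run = clock()
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Schedule relative to the previous slot, not to when the task finished,
        # so a slow task does not accumulate drift.
        next_run += interval
        if max_runs is not None and runs >= max_runs:
            break
        sleep(max(0.0, next_run - clock()))
    return runs
```

In practice the task would be something like invoking a management command, and the loop would need to be kept alive by a process supervisor, which is much of the work cron already does for you.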