Make Azure WebJob Timer Trigger Not Run as a Singleton - azure-webjobs

I have a set of worker functions that are spun up as needed to pull from Service Bus Topic subscriptions as those subscriptions are created. When a new Subscription is created, a provisioning message is queued, which triggers a job that spins up a new worker to listen to that subscription. The problem is that I now want to be able to scale out the workers listening to the subscriptions when the app is scaled out. Since the provisioning job creates the worker on only a single instance, the effectiveness of scaling out is significantly reduced.
My thought was to create a second provisioning job, run from a timer trigger, that synchronizes the running jobs with the current list of subscriptions. However, I run into the same problem with the timer job as with the Service Bus trigger: because it runs as a singleton in the WebJob, it will likely run on the same instance each time, meaning I will still have only one, maybe two, instances of a job per subscription no matter how far I scale out.
My question is: is it possible to create a timer job that does not run as a singleton? That is, can I configure a timer job that runs on a set interval on each instance of the scaled-out WebJob?

The Singleton attribute ensures that only one instance of a function runs at a time, with the help of distributed locking; this is part of the WebJobs SDK.
There are also singleton listeners: by adding a few settings you can ensure that your function runs as a singleton on a single instance. To guarantee that only a single instance of the function is running when the web app scales out to multiple instances, apply a listener-level singleton lock on the function ([Singleton(Mode = SingletonMode.Listener)]). Listener locks are acquired when the JobHost starts, so if three scaled-out instances all start at the same time, only one of them acquires the lock and only one listener starts.
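For illustration, a minimal sketch of such a listener-level singleton applied to a timer function (the class name, schedule, and body here are placeholders):

    using System;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public class ProvisioningJobs
    {
        // The listener-level lock is acquired when the JobHost starts, so
        // only one scaled-out instance ever starts this function's listener.
        [Singleton(Mode = SingletonMode.Listener)]
        public static void SyncSubscriptions(
            [TimerTrigger("0 */5 * * * *")] TimerInfo timer, // every 5 minutes
            ILogger log)
        {
            log.LogInformation("Synchronizing subscription workers.");
        }
    }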
If, on the other hand, you do not want singleton behavior and the function should run on every instance, refer to the Multiple Instances documentation on MS Docs.
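That per-instance behavior is what the question asks for, and a TimerTrigger always takes a listener lock, so its schedule fires on only one instance. One possible workaround, sketched here under the assumption of a WebJobs SDK 3.x generic host (all names are illustrative), is to skip the TimerTrigger and host a plain background timer instead; it takes no distributed lock and therefore fires on every scaled-out instance:

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;
    using Microsoft.Extensions.Logging;

    // Runs on every instance because it is hosted in-process and takes no
    // distributed lock, unlike a TimerTrigger.
    public class PerInstanceSyncService : BackgroundService
    {
        private readonly ILogger<PerInstanceSyncService> _logger;

        public PerInstanceSyncService(ILogger<PerInstanceSyncService> logger)
            => _logger = logger;

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                _logger.LogInformation("Synchronizing workers on this instance.");
                // ...compare running listeners to the current subscription list...
                await Task.Delay(TimeSpan.FromMinutes(5), stoppingToken);
            }
        }
    }

    // Registered on the host builder in Program.cs:
    // builder.ConfigureServices(s => s.AddHostedService<PerInstanceSyncService>());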

Related

How to terminate idle instances using ASG or Lambda

We have multiple test servers that are running idle and need to be stopped or terminated to reduce cost.
The scenario: we use Jenkins to run a project. Every time developers build the project, a new server spins up; if a server was left stopped by the previous build, the stopped one is terminated first and then the new instance spins up.
I tried using:
A CloudWatch CPU utilization alarm to stop/terminate idle instances, but since a new instance ID is generated every time, this option is out.
We are using an ASG, but when I tried simple step scaling to reduce the group to zero, it didn't work.
I am looking to create a Lambda function so that idle instances are stopped/terminated without reference to specific instance IDs.
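Purely as a sketch of what such a Lambda's core could look like, using the AWS SDK for .NET (the tag filter is an assumption; some marker other than the instance ID is needed to identify the build servers):

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Amazon.EC2;
    using Amazon.EC2.Model;

    public static class IdleInstanceReaper
    {
        // Terminates every stopped instance carrying a hypothetical
        // Purpose=jenkins-build tag, regardless of its instance ID.
        public static async Task ReapAsync()
        {
            var ec2 = new AmazonEC2Client();
            var response = await ec2.DescribeInstancesAsync(new DescribeInstancesRequest
            {
                Filters = new List<Filter>
                {
                    new Filter("instance-state-name", new List<string> { "stopped" }),
                    new Filter("tag:Purpose", new List<string> { "jenkins-build" })
                }
            });

            var ids = response.Reservations
                .SelectMany(r => r.Instances)
                .Select(i => i.InstanceId)
                .ToList();

            if (ids.Count > 0)
                await ec2.TerminateInstancesAsync(new TerminateInstancesRequest(ids));
        }
    }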

Auto scaling service in AWS without duplicating cron jobs

I have a Golang web server service running on AWS on an EC2 instance (no auto scaling). The service has a few cron jobs that run throughout the day; the jobs start when the service starts.
I would like to take advantage of auto scaling in some form on AWS, and have been looking at ECS and Beanstalk.
When I add auto scaling, I need each cron job to execute on only one of the scaled instances, due to rate limits on external APIs. Right now the cron jobs are tightly coupled to the service, and I am looking for an option that does not require moving them to their own service.
How can I achieve this in a good way using AWS?
You're going to hit this as a general issue in any scalable application where crons cannot or should not run multiple times; it's not really AWS-specific. I'm not sure to what extent you want to keep things coupled, or how your crons are currently run, but here are a few suggestions that might work for you:
Create a "cron runner" instance that is the only place crons run
You could create a separate ECS service which has no autoscaling and a fixed value of 1 instance. This instance would run the same copy of your code as your "normal" instances and would run crons. You would turn crons off on your "normal" instances. You might find that this can be a very small instance since it doesn't handle any web traffic.
Create a "cron trigger" instance which fires off crons remotely
Here you create one "trigger" instance which sends a request to your normal instances through an ALB. Because the ALB routes the request to exactly one of the servers behind it, the cron gets run only once. One thing to watch out for: if your cron is long-running, you may need to consider your request timeouts. You'll also have to think about retries, etc., but I assume you already have a process that can be adapted for that.
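As a hedged sketch of the trigger side (the endpoint URL and path are made up; your servers would expose whatever internal route runs the cron):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    // The "trigger" instance just pokes an internal endpoint behind the ALB;
    // the ALB routes the request to exactly one healthy instance, so the
    // cron body runs once.
    public static class CronTrigger
    {
        public static async Task FireAsync()
        {
            using var http = new HttpClient { Timeout = TimeSpan.FromMinutes(10) };
            var response = await http.PostAsync(
                "https://internal-alb.example.com/internal/run-cron", content: null);
            response.EnsureSuccessStatusCode();
        }
    }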
The above solutions can be adapted with message queues etc but the basis of both is that there is another instance of some kind which starts the cron and is separate from your normal servers. Depending on when your cron runs, you may only need to run this cron instance for a few hours per day so it can be cost efficient to do things like this.
Personally I have used both methods in a multi-tenant application, and I had to go with this way of running the crons due to the number of tenants and the time/resources it took to run the crons for all of them at once:
A CloudWatch schedule triggers a Lambda, which sends a message to SQS to queue a cron for each tenant individually.
Cron servers (totally separate from the main web servers but running the same or similar code) pull messages and run the cron for each tenant individually. For crons that absolutely must run only once, a key is stored in Redis to guard against SQS's "at least once" delivery, so crons don't run twice (see the sketch after this answer).
This also helps handle failures, with retry policies and dead-letter queues managed in SQS.
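A minimal sketch of that Redis guard, assuming StackExchange.Redis (key names and the expiry are illustrative): StringSet with When.NotExists is an atomic SET ... NX, so only the first worker to claim a cycle's key runs that cron, and a duplicate delivery of the same message is skipped.

    using System;
    using StackExchange.Redis;

    public static class CronDedup
    {
        private static readonly ConnectionMultiplexer Redis =
            ConnectionMultiplexer.Connect("localhost:6379");

        // Returns true only for the first caller per (tenant, cycle) pair.
        public static bool TryClaim(string tenantId, string cycleId)
        {
            IDatabase db = Redis.GetDatabase();
            return db.StringSet($"cron:{tenantId}:{cycleId}", "running",
                                expiry: TimeSpan.FromHours(1), when: When.NotExists);
        }
    }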
Ultimately you need to kick off these crons from one place. If possible, change your crons so that it doesn't matter if they run twice; that makes retries and the like much easier to deal with.

AWS autoscale group scale in event

I am using an Auto Scaling group to add and remove instances for my application, with CPU utilization as the scaling metric. I am wondering what happens when an instance is running a program and CPU utilization drops below 65% (the threshold value).
Does it wait for the instance to finish the program, or does it terminate the instance at that moment? If it terminates the instance at that moment, it might lead to data loss or inconsistency.
Any help would be appreciated.
If you're looking to prevent or delay instance termination during a scale-in event, take a look at lifecycle hooks.
With lifecycle hooks enabled, Auto Scaling can send a notification that a specific instance action is about to occur (scale out or scale in). Using a combination of services (such as SNS, Lambda, SSM, etc.) you can programmatically notify the instance that it is about to be terminated, so that it can take any necessary actions.
Termination then waits until the Auto Scaling group receives confirmation that your cleanup has completed, at which point the instance is terminated. A lifecycle hook also has a timeout: if no confirmation is received before the timeout is exceeded, the termination still occurs.
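For reference, the confirmation step could look roughly like this with the AWS SDK for .NET (the hook and group names are placeholders):

    using System.Threading.Tasks;
    using Amazon.AutoScaling;
    using Amazon.AutoScaling.Model;

    // Once the instance's cleanup is done, confirm the lifecycle action so
    // the ASG proceeds with termination.
    public static class LifecycleDrain
    {
        public static async Task ConfirmAsync(string instanceId)
        {
            var autoScaling = new AmazonAutoScalingClient();
            await autoScaling.CompleteLifecycleActionAsync(new CompleteLifecycleActionRequest
            {
                AutoScalingGroupName = "my-asg",
                LifecycleHookName = "drain-before-terminate",
                InstanceId = instanceId,
                LifecycleActionResult = "CONTINUE" // or "ABANDON"
            });
        }
    }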
I think you are looking for the termination policy.
Look at this link:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#default-termination-policy
And in my experience, the instance will be terminated no matter what it is running
Does it wait for the instance to finish the program or terminate the instance at that moment?
Sadly, it does not wait. The ASG operates outside of your instances and is not concerned with any programs running on them.
Having said that, there are a few things you can do, some of which are described in:
Controlling which Auto Scaling instances terminate during scale in
Generally speaking, you should develop your applications to be stateless. This means the applications should be "aware" that they can be terminated at any time. One way to achieve this is by using external storage systems, such as S3 or EFS, which persist data across terminations.
Another way is to use scale-in (termination) protection. In this case, the application puts its instance into the protected state at the beginning of processing, and when the calculation finishes, the protection is removed.
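Toggling that protection could look roughly like this with the AWS SDK for .NET (the group name is a placeholder; the instance ID can be read from instance metadata):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Amazon.AutoScaling;
    using Amazon.AutoScaling.Model;

    // Call with protect: true before a critical calculation and
    // protect: false when it finishes.
    public static class ScaleInGuard
    {
        private static readonly AmazonAutoScalingClient Client = new();

        public static Task SetProtectionAsync(string instanceId, bool protect) =>
            Client.SetInstanceProtectionAsync(new SetInstanceProtectionRequest
            {
                AutoScalingGroupName = "my-asg",
                InstanceIds = new List<string> { instanceId },
                ProtectedFromScaleIn = protect
            });
    }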

Elastic beanstalk periodic tasks on autoscaled environment

On an autoscaled environment running a periodic task, if the environment is scaled up, do the periodic tasks get run on each instance? Or more specifically, does each instance then post to the queue leading to multiple "periodic tasks" running?
Yes. If there is a periodic task that should be triggered only once, you should have a separate auto-scaled environment of minimum one, maximum one instance to either perform the task or trigger it on one of your servers (for example, make a request to your load balancer and one of your instances will perform the task).
Yes, behind the scenes it is just a cron job on all your instances. The default scenario for periodic tasks is to read the tasks from the SQS queue on the worker nodes.
So yes, if you are posting something that has to happen only once, you either need to put some logic in between or use a different solution.
(For example, generate a time-based ID that identifies the cycle of the cron job. Messages from the same cycle then carry the same ID, making it easy to filter them and ignore everything after the first.)
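A tiny sketch of such a hypothetical time-based cycle ID: every instance that fires within the same five-minute cron window computes the same value, so consumers can keep the first message per cycle and drop the rest.

    using System;

    // All instances firing within the same 5-minute window agree on cycleId.
    TimeSpan window = TimeSpan.FromMinutes(5);
    long cycleId = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / (long)window.TotalSeconds;
    string messageKey = $"periodic-task:{cycleId}";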

AWS Flow Framework: Can we run activity worker and activity task on different EC2 instances

I am a newbie to AWS and want to use the Simple Workflow Service (SWF).
So far, this is what I know:
URL: http://docs.aws.amazon.com/amazonswf/latest/awsflowguide/awsflow-basics-application-structure.html
1) The workflow starter, workflow worker (decider), and activities worker can run on the same EC2 instance, or each can run on a different EC2 instance.
2) The activities worker executes activity tasks (activity methods).
My question is:
1) Can I run the workflow starter, workflow worker (decider), and activities worker on one EC2 instance, and the activity tasks (activity methods) on a different EC2 instance?
Example:
EC2 instance 1 -> Workflow starter, workflow worker(decider), Activities worker
EC2 instance 2 -> Activity tasks or activity methods
If this is possible, can anyone point me to an example?
I have looked at the AWS HelloWorld distributed application, but it runs the workflow worker and activities worker on different EC2 instances, with the activity tasks running on the same machine as the activities worker.
URL: http://docs.aws.amazon.com/amazonswf/latest/awsflowguide/getting-started-example-helloworldworkflowdistributed.html
Requirement
At the beginning, only one EC2 instance would be running:
1) On this running EC2 instance, I will have the workflow starter and workflow worker (decider).
2) Based on the task received on the decision task list, the decider will execute custom logic, and based on the outcome of that logic I want to create a new EC2 instance and run the activity worker on it.
The issue is that I read somewhere that the workflow worker (decider) and activity worker must be started before a task arrives on the decision task list, and in my case I cannot keep two EC2 instances running from the beginning, for cost reasons.
Hence, to solve this issue, my solution was:
1) Start the decider and activity worker on the running EC2 instance.
2) Once the activity worker receives a task on the activity task list, it will create a new EC2 instance and execute the activity on it.
Thanks
Yes, you can definitely do this. You can also have multiple workers (both workflow and activity) on multiple machines. You should not really care where the workflows and activities run, as long as the work is getting done.
Here is a sample I wrote to show how to use the Flow framework with Java 1.8:
https://github.com/mirceal/swf-flow-java18-sample
In App.java, comment out line 42 (aw.start()) and build the sample. After that, comment out line 43 (wfw.start()) and build the sample again. Copy the two built samples to two different machines and you should achieve what you are asking for.
Line 51 starts the workflow.
Edit
Think of the workers as entities that have the ability to perform a task. The workers constantly poll SWF, and if there is work they pick it up and execute it.
There is nothing preventing you from running workflow and activity workers on the same machine. There is also nothing preventing you from spinning up EC2 instances from an activity.
If you need to run only some activities on one machine and other activities on other machines (the ones you spin up), you would typically use different SWF task lists.
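The linked sample uses the Java Flow framework, where task lists are set through annotations; purely to illustrate the task-list idea, here is a low-level sketch with the AWS SDK for .NET (domain, task list, and result values are made up). Each class of machine polls its own task list, so activities scheduled on "heavy-work" only ever run on the instances you spun up for them.

    using System.Threading.Tasks;
    using Amazon.SimpleWorkflow;
    using Amazon.SimpleWorkflow.Model;

    public static class ActivityPoller
    {
        public static async Task PollOnceAsync()
        {
            var swf = new AmazonSimpleWorkflowClient();

            // Long-polls only the "heavy-work" task list; other machines
            // poll other lists, so work is routed by list name.
            ActivityTask task = (await swf.PollForActivityTaskAsync(
                new PollForActivityTaskRequest
                {
                    Domain = "my-domain",
                    TaskList = new TaskList { Name = "heavy-work" }
                })).ActivityTask;

            if (task?.TaskToken != null)
            {
                // ...execute the activity here, then report the result...
                await swf.RespondActivityTaskCompletedAsync(
                    new RespondActivityTaskCompletedRequest
                    {
                        TaskToken = task.TaskToken,
                        Result = "done"
                    });
            }
        }
    }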