Waiting for K8S Job to finish [duplicate] - amazon-web-services

This question already has answers here:
Tell when Job is Complete
(7 answers)
Closed 3 years ago.
I'm looking for a way to wait for a Job to finish executing successfully once it's deployed.
The Job is deployed from Azure DevOps through CD to Kubernetes on AWS. Each time it's deployed it runs one-time incremental database migrations using Fluent migrations. I need to read the pod.status.phase field.
If the field is "Succeeded", CD continues; if it's "Failed", CD stops.
Does anyone have an idea how to achieve this?

I think the best approach is to use the kubectl wait command:
Wait for a specific condition on one or many resources.
The command takes multiple resources and waits until the specified
condition is seen in the Status field of every given resource.
It will only return when the Job is completed (or the timeout is reached):
kubectl wait --for=condition=complete job/myjob --timeout=60s
If you don't set a --timeout, the default wait is 30 seconds.
Note: kubectl wait was introduced in Kubernetes v1.11.0. If you are using an older version, you can build similar logic with kubectl get and --field-selector:
kubectl get pod --field-selector=status.phase=Succeeded
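For those older versions, a minimal polling sketch built on that idea (the job name, namespace and 5-second interval are illustrative assumptions, and it relies on the job-name label that the Job controller adds to its pods):
#!/usr/bin/env bash
# Poll the pod created by the Job until it reaches a terminal phase.
JOB_NAME=my-db-job   # placeholder, adjust to your Job
while true; do
  phase=$(kubectl get pods -n default --selector=job-name=$JOB_NAME \
    -o jsonpath='{.items[0].status.phase}')
  case "$phase" in
    Succeeded) echo "Job succeeded"; exit 0 ;;
    Failed)    echo "Job failed"; exit 1 ;;
    *)         echo "Current phase: $phase"; sleep 5 ;;
  esac
done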

We can check the Pod status using the Kubernetes REST API.
In order to connect to the API, we need to get a token:
https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#without-kubectl-proxy
# Check all possible clusters, as your .KUBECONFIG may have multiple contexts:
kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
# Select the name of the cluster you want to interact with from the above output:
export CLUSTER_NAME="some_server_name"
# Point to the API server referring to the cluster name
APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")
# Get the token value
TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 -d)
From the code above we have acquired the TOKEN and the APISERVER address.
On Azure DevOps, on your target Release, in the Agent Job, we can add a Bash task:
#name of K8S Job object we are waiting to finish
JOB_NAME=name-of-db-job
APISERVER=set-api-server-from-previous-code
TOKEN=set-token-from-previous-code
#log APISERVER and JOB_NAME for troubleshooting
echo API Server: $APISERVER
echo JOB NAME: $JOB_NAME
#keep calling API until you get status Succeeded or Failed.
while true; do
#read all pods and query for the pod containing JOB_NAME using jq.
#note that you should not have other pods with names similar to the job name, otherwise you will get multiple results. This script does not expect multiple results.
res=$(curl -X GET $APISERVER/api/v1/namespaces/default/pods/ --header "Authorization: Bearer $TOKEN" --insecure | jq --arg JOB_NAME "$JOB_NAME" '.items[] | select(.metadata.name | contains($JOB_NAME))' | jq -r '.status.phase')
if [ "$res" == "Succeeded" ]; then
echo Succeeded
exit 0
elif [ "$res" == "Failed" ]; then
echo Failed
exit 1
else
echo $res
fi
sleep 2
done
If it Failed, the script exits with code 1 and CD stops (if configured that way).
If it Succeeded, the script exits with code 0 and CD continues.
In the final setup:
- The script is part of the artifact and I'm using it inside a Bash task in the Agent Job.
- I have placed JOB_NAME into the Task's environment variables so it can be reused for multiple DB migrations.
- Token and API Server address are in Variable group on global level.
TODO:
curl does not exit with a non-zero code if the URL is invalid. It needs the --fail flag, but even with it the line above still exits 0, because the pipeline's exit status comes from the last command (jq).
The "Unknown" Pod status should be handled as well.

Related

Checking for the result of the AWS CLI 'run-task' command, task stopped successfully or from an error?

I'm currently moving an application off of static EC2 servers to ECS, as until now the release process has been ssh'ing into the server to git pull/migrate the database.
I've created everything I need using terraform to deploy my code from my organisations' Elastic Container Registry. I have a cluster, some services and task definitions.
I can deploy the app successfully for any given version now, however my main problem is finding a way to run migrations.
My approach so far has been to split the application into 3 services. I have my 'web' service which handles all HTTP traffic (serving the frontend, responding to API requests), my 'cron' service which handles things like sending emails/push notifications at specific times/events, and my 'migrate' service which is just the 'cron' service with the container's entryPoint overridden to run the migrations (as I don't need any of the apache2 stuff for this container, and I didn't see a reason to make another image just for migrations).
The problem I had with this was that the 'migrate' service would constantly try to schedule more tasks for migrating the database, even though it only needed to be done once. So I've scrapped it as a service but kept the task definition, so that I can still run it in my cluster.
As part of the deploy process I'm writing, I run that task inside the cluster via a bash script so I can wait until the migrations finish before deciding whether to take the application out of maintenance mode (if the migrations fail) or to deploy the new 'web'/'cron' containers once the migration has been completed.
Currently this is inside a shell script (ran by Github actions) that looks like this:
#!/usr/bin/env bash
CLUSTER_NAME=$1
echo $CLUSTER_NAME
OUTPUT=`aws ecs run-task --cluster ${CLUSTER_NAME} --task-definition saas-app-migrate`
if [ $? -ne 0 ]; then
>&2 echo $OUTPUT
exit 1
fi
TASKS=`echo $OUTPUT | jq '.tasks[].taskArn' | jq @sh | sed -e "s/'//g" | sed -e 's/"//g'`
for task in $TASKS
do
# check for task to be done
done
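One hedged way to fill in that loop, assuming the migration runs in a single container whose exit code signals success (the aws ecs wait tasks-stopped and aws ecs describe-tasks calls are standard CLI commands; the --query path assumes a single-container task):
for task in $TASKS
do
  # Block until the task reaches the STOPPED status.
  aws ecs wait tasks-stopped --cluster ${CLUSTER_NAME} --tasks ${task}
  # Read the container exit code; anything non-zero means the migration failed.
  EXIT_CODE=$(aws ecs describe-tasks --cluster ${CLUSTER_NAME} --tasks ${task} \
    --query 'tasks[0].containers[0].exitCode' --output text)
  if [ "$EXIT_CODE" != "0" ]; then
    >&2 echo "Migration task ${task} stopped with exit code ${EXIT_CODE}"
    exit 1
  fi
done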
Because $TASKS contains the taskArn of any tasks that have been spawned by this, I am freely able to query the task however I don't know what information I'm looking for.
The AWS documentation says I should use the 'describe-tasks' command to then find out why a task has reached the 'STOPPED' status, as it provides a 'stopCode' and 'stoppedReason' property in the response. However, it doesn't say what these values would be if the task stopped successfully. I don't want to have to introduce a manual step in my deployment where I wait until the migrations are done - with the application not being usable - to then tell my release process to continue.
Is there a link to documentation I might have missed with the values I'm searching for, or an alternate way to handle this case?

How to get the status of multiple services with kubectl?

As per my understanding of the docs, the -R flag should do exactly this, but for me the command kubectl rollout status -R -f k8s/services fails with error: rollout status is only supported on individual resources and resource collections - 3 resources were found.
In the k8s/services directory I have 3 service manifests. What is a resource collection, mentioned in the error message, if not 3 services for example? What should be in the directory when using -R?
kubectl rollout status --help:
Show the status of the rollout.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for
the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout
status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled
over by another revision, use --revision=N where N is the revision you need to watch for.
Examples:
# Watch the rollout status of a deployment
kubectl rollout status deployment/nginx
Options:
-f, --filename=[]: Filename, directory, or URL to files identifying the resource to get from a server.
-k, --kustomize='': Process the kustomization directory. This flag can't be used together with -f or -R.
-R, --recursive=false: Process the directory used in -f, --filename recursively. Useful when you want to manage
related manifests organized within the same directory.
--revision=0: Pin to a specific revision for showing its status. Defaults to 0 (last revision).
--timeout=0s: The length of time to wait before ending watch, zero means never. Any other values should contain a
corresponding time unit (e.g. 1s, 2m, 3h).
-w, --watch=true: Watch the status of the rollout until it's done.
Usage:
kubectl rollout status (TYPE NAME | TYPE/NAME) [flags] [options]
Use "kubectl options" for a list of global command-line options (applies to all commands).
I have tested with kubectl version 1.14 and 1.15.
It means that it found 3 services, but you can only see the rollout status of one specific resource at a time, like:
kubectl rollout status -f k8s/services/<svc-name>.yaml
You don't need to use -R when all the YAMLs sit directly inside the services directory.
Take a look at why the -R flag was added in this issue.
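If you just want the rollout status of every manifest in that directory, a simple loop over the files is one possible workaround (the k8s/services path comes from the question):
for f in k8s/services/*.yaml; do
  kubectl rollout status -f "$f"
done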

Cannot assign instance name to concurrent workflow in Informatica

In Informatica, I can start a workflow but cannot get it to recognize my instance name in the session log and Workflow Monitor.
The workflow starts but in the session log it displays this:
Workflow wf_Temp started with run id [22350], run instance name [], run type [Concurrent Run with Unique Instance Name]
Instance name is blank.
My command is:
pmcmd startworkflow -sv <service> -d <domain> -u <user> -p <password> -f <folder> -rin INST1 -paramfile <full param file path name> wf_Temp
I have edited the workflow and selected the Enable Concurrent Execution checkbox. Using the Configure Concurrent Execution button, I have created three instances: INST1, INST2, INST3, but without any associated parameter files. All parameter files are blank.
I understand, I think, that in order to start a workflow with PMCMD I must pass in one of the configured instance names (i.e. INST1, INST2, INST3, etc.)
If I execute the PMCMD command from Putty a second time to see the second instance run, I receive a message that the workflow is still running and I have to wait. Why? I have checked the Concurrent Workflow box in the workflow.
ERROR: Workflow [wf_Temp]: Could not start execution of this workflow because the current run on this Integration Service has not completed yet.
Disconnecting from Integration Service
So, I think I'm close, but am missing something. The workflow runs with the parameter file I pass in PMCMD but the instance name seems to be ignored.
Further, do I have to pre-configure instance names in the Workflow Manager? Are the PMCMD instance and parameter file parameters enough? It doesn't seem very dynamic if instances have to be pre-defined in the workflows.
Thanks.
@MacieJG
Here are the screenshots from Putty when I run the command. You can see the instance name DALLAS is being passed through PMCMD OK. No combination ever gets the instance name. I did not include the pics of your suggested Test 1, but the results were the same: still no instance.
Here's my complete test as requested in a comment above. I tried my best to put everything you may need here, but if I missed anything, just let me know. So here goes...
I've created a very simple workflow to run with an instance name. It uses a timer to wait and a command task to write the instance name to a file:
The concurrent execution has been set up in the most simple way:
Now, I've prepared the following batch to run the workflow (just user & password removed):
SET "PMCMD=C:\Informatica\9.5.1\clients\PowerCenterClient\CommandLineUtilities\PC\server\bin\pmcmd"
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin GLASGOW wf_Instance_Test
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin FRANKFURT wf_Instance_Test
%PMCMD% startworkflow -sv Dev_IS -d Domain_vic-vpc -u ####### -p ####### -f Dev01 -rin GLASGOW wf_Instance_Test
It runs three instances, two of them with the same name, just to test it. I run the batch the following way to capture the output:
pmStartTestWF.bat > c:\MG\pmStartTestWF.log
Once I execute it, here what I see in workflow monitor:
Just as expected, three instances executed and properly displayed. File output looks fine as well:
The output of pmcmd can be found here. Full definition of my test workflow is available here.
I really hope this will help you somehow. Feel free to let me know if you'd find anything missing here. Good luck!
You don't need to pre-configure instance names in workflow. Passing the instance name in pmcmd along with parameter filename is enough.
try this: pmcmd startworkflow -sv (service) -d (domain) -u (user) -p (password) -f (folder) -paramfile (full param file path name) -rin INST1 wf_Temp
To be precise: when you configure Concurrent Execution, you can specify if you:
allow concurrent run with same instance name
allow concurrent run only with unique instance name
In addition to that you may, but don't have to, indicate which instance should use which parameter file, so it doesn't need to be mentioned while executing. But that's a separate feature.
Now, if you've chosen the first one, you will be able to invoke the WF multiple times with the very same command. If you've chosen the second one and try this, you will get the 'WF is already running' error.
The trouble is that your example seems correct at first glance. As per the log message:
Workflow wf_Temp started with run id [22350], run instance name [], run type [Concurrent Run with Unique Instance Name]
So you're allowing unique instances only. It seems that the instance name has not been used. The first execution does not set the instance name, so a similar second execution won't use it either and will get rejected as having the same instance name (i.e. none).
You may try to change the setting to Allow concurrent run with same instance name; this should allow the second execution, but it does not solve the main issue. For some reason the instance name does not get passed.
Please verify your command against the docs referenced below. Try to match the order perhaps. Please share some more info if it still fails.
Looking at the docs:
pmcmd StartWorkflow
<<-service|-sv> service [<-domain|-d> domain] [<-timeout|-t> timeout]>
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<<-usersecuritydomain|-usd> usersecuritydomain|<-usersecuritydomainvar|-usdv>
userSecuritydomainEnvVar>]
[<-folder|-f> folder]
[<-startfrom> taskInstancePath]
[<-recovery|-norecovery>]
[<-paramfile> paramfile]
[<-localparamfile|-lpf> localparamfile]
[<-osprofile|-o> OSUser]
[-wait|-nowait]
[<-runinsname|-rin> runInsName]
workflow

Running updates on EC2s that roll back on failure of status check

I’m setting up a patch process for EC2 servers running a web application.
I need to build an automated process that installs system updates but reverts back to the last working EC2 instance if the web application fails a status check.
I’ve been trying to do this using an Automation Document in EC2 Systems Manager that performs the following steps:
Stop EC2 instance
Create AMI from instance
Launch new instance from newly created AMI
Run updates
Run status check on web application
If check fails, stop new instance and restart original instance
The Automation Document runs the first 5 steps successfully, but I can't work out how to trigger step 6. Can I do this within the Automation Document? What output would I be able to call from step 5? If it uses aws:runCommand, should the runCommand trigger a new Automation Document or another AWS tool?
I tried the following to solve this, which more or less worked:
Included an aws:runCommand action in the automation document
This ran the DocumentName "AWS-RunShellScript" with the following parameters:
Downloaded the script from s3:
sudo aws s3 cp s3://path/to/s3/script.sh /tmp/script.sh
Set the file to executable:
chmod +x /tmp/script.sh
Executed the script using variables set in, or generated by the automation document
bash /tmp/script.sh -o {{VAR1}} -n {{VAR2}} -i {{VAR3}} -l {{VAR4}} -w {{VAR5}}
The script included the following getopts command to set the inputted variables:
while getopts o:n:i:l:w: option
do
case "${option}" in
o) VAR1=${OPTARG};;
n) VAR2=${OPTARG};;
i) VAR3=${OPTARG};;
l) VAR4=${OPTARG};;
w) VAR5=${OPTARG};;
esac
done
The bash script used the variables to run the status check, and roll back to last working instance if it failed.
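A hedged sketch of what such a status-check-and-rollback script might look like; the positional arguments, health-check endpoint and instance handling are illustrative assumptions, not the author's actual script:
#!/usr/bin/env bash
# Usage (illustrative): rollback_check.sh <old-instance-id> <new-instance-id> <health-url>
OLD_INSTANCE_ID=$1
NEW_INSTANCE_ID=$2
HEALTH_URL=$3

# Probe the web application; --fail makes curl return non-zero on HTTP errors.
if curl --fail --silent --max-time 10 "$HEALTH_URL" > /dev/null; then
  echo "Status check passed, keeping new instance $NEW_INSTANCE_ID"
  exit 0
fi

# Status check failed: stop the patched instance and restart the original one.
echo "Status check failed, rolling back to $OLD_INSTANCE_ID"
aws ec2 stop-instances --instance-ids "$NEW_INSTANCE_ID"
aws ec2 start-instances --instance-ids "$OLD_INSTANCE_ID"
aws ec2 wait instance-running --instance-ids "$OLD_INSTANCE_ID"
exit 1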

AWS Elastic Beanstalk, running a cronjob

I would like to know if there is a way to setup a cronjob/task to execute every minute. Currently any of my instances should be able to run this task.
This is what I have tried to do in the config files without success:
container_commands:
  01cronjobs:
    command: echo "*/1 * * * * root php /etc/httpd/myscript.php"
I'm not really sure if this is the correct way to do it
Any ideas?
This is how I added a cron job to Elastic Beanstalk:
Create a folder at the root of your application called .ebextensions if it doesn't exist already. Then create a config file inside the .ebextensions folder. I'll use example.config for illustration purposes. Then add this to example.config
container_commands:
  01_some_cron_job:
    command: "cat .ebextensions/some_cron_job.txt > /etc/cron.d/some_cron_job && chmod 644 /etc/cron.d/some_cron_job"
    leader_only: true
This is a YAML configuration file for Elastic Beanstalk. Make sure when you copy this into your text editor that your text editor uses spaces instead of tabs. Otherwise you'll get a YAML error when you push this to EB.
So what this does is create a command called 01_some_cron_job. Commands are run in alphabetical order so the 01 makes sure it's run as the first command.
The command then takes the contents of a file called some_cron_job.txt and adds it to a file called some_cron_job in /etc/cron.d.
The command then changes the permissions on the /etc/cron.d/some_cron_job file.
The leader_only key ensures the command is only run on the ec2 instance that is considered the leader. Rather than running on every ec2 instance you may have running.
Then create a file called some_cron_job.txt inside the .ebextensions folder. You will place your cron jobs in this file.
So for example:
# The newline at the end of this file is extremely important. Cron won't run without it.
* * * * * root /usr/bin/php some-php-script-here > /dev/null
So this cron job will run every minute of every hour of every day as the root user and discard the output to /dev/null. /usr/bin/php is the path to php. Then replace some-php-script-here with the path to your php file. This is obviously assuming your cron job needs to run a PHP file.
Also, make sure the some_cron_job.txt file has a newline at the end of the file just like the comment says. Otherwise cron won't run.
Update:
There is an issue with this solution when Elastic Beanstalk scales up your instances. For example, let's say you have one instance with the cron job running. You get an increase in traffic, so Elastic Beanstalk scales you up to two instances. The leader_only will ensure you only have one cron job running between the two instances. Your traffic decreases and Elastic Beanstalk scales you down to one instance. But instead of terminating the second instance, Elastic Beanstalk terminates the first instance that was the leader. You now don't have any cron jobs running since they were only running on the first instance that was terminated. See the comments below.
Update 2:
Just making this clear from the comments below:
AWS has now protection against automatic instance termination. Just enable it on your leader instance and you're good to go. – Nicolás Arévalo Oct 28 '16 at 9:23
This is the official way to do it now (2015+). Please try this first, it's by far easiest method currently available and most reliable as well.
According to current docs, one is able to run periodic tasks on their so-called worker tier.
Citing the documentation:
AWS Elastic Beanstalk supports periodic tasks for worker environment tiers in environments running a predefined configuration with a solution stack that contains "v1.2.0" in the container name. You must create a new environment.
Also interesting is the part about cron.yaml:
To invoke periodic tasks, your application source bundle must include a cron.yaml file at the root level. The file must contain information about the periodic tasks you want to schedule. Specify this information using standard crontab syntax.
Update: We were able to get this work. Here are some important gotchas from our experience (Node.js platform):
When using cron.yaml file, make sure you have latest awsebcli, because older versions will not work properly.
It is also vital to create new environment (at least in our case it was), not just clone old one.
If you want to make sure CRON is supported on your EC2 Worker Tier instance, ssh into it (eb ssh) and run cat /var/log/aws-sqsd/default.log. It should report as aws-sqsd 2.0 (2015-02-18). If you don't have version 2.0, something went wrong when creating your environment and you need to create a new one as stated above.
Regarding jamieb's response, and as alrdinleal mentions, you can use the 'leader_only' property to ensure that only one EC2 instance runs the cron job.
Quote taken from http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html:
you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.
I'm trying to achieve a similar thing on my EB, so I will update my post if I solve it.
UPDATE:
Ok, I now have working cronjobs using the following eb config:
files:
  "/tmp/cronjob":
    mode: "000777"
    owner: ec2-user
    group: ec2-user
    content: |
      # clear expired baskets
      */10 * * * * /usr/bin/wget -o /dev/null http://blah.elasticbeanstalk.com/basket/purge > $HOME/basket_purge.log 2>&1
      # clean up files created by above cronjob
      30 23 * * * rm $HOME/purge*
    encoding: plain

container_commands:
  purge_basket:
    command: crontab /tmp/cronjob
    leader_only: true

commands:
  delete_cronjob_file:
    command: rm /tmp/cronjob
Essentially, I create a temp file with the cronjobs and then set the crontab to read from the temp file, then delete the temp file afterwards. Hope this helps.
I spoke to an AWS support agent and this is how we got this to work for me. 2015 solution:
Create a file in your .ebextensions directory with your_file_name.config.
In the config file input:
files:
  "/etc/cron.d/cron_example":
    mode: "000644"
    owner: root
    group: root
    content: |
      * * * * * root /usr/local/bin/cron_example.sh

  "/usr/local/bin/cron_example.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      /usr/local/bin/test_cron.sh || exit
      echo "Cron running at " `date` >> /tmp/cron_example.log
      # Now do tasks that should only run on 1 instance ...

  "/usr/local/bin/test_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      METADATA=/opt/aws/bin/ec2-metadata
      INSTANCE_ID=`$METADATA -i | awk '{print $2}'`
      REGION=`$METADATA -z | awk '{print substr($2, 0, length($2)-1)}'`

      # Find our Auto Scaling Group name.
      ASG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" \
        --region $REGION --output text | awk '/aws:autoscaling:groupName/ {print $5}'`

      # Find the first instance in the Group
      FIRST=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $ASG \
        --region $REGION --output text | awk '/InService$/ {print $4}' | sort | head -1`

      # Test if they're the same.
      [ "$FIRST" = "$INSTANCE_ID" ]

commands:
  rm_old_cron:
    command: "rm *.bak"
    cwd: "/etc/cron.d"
    ignoreErrors: true
This solution has 2 drawbacks:
On subsequent deployments, Beanstalk renames the existing cron script as .bak, but cron will still run it. Your Cron now executes twice on the same machine.
If your environment scales up, you get several instances, all running your cron script. This means your mail shots are repeated, or your database archives duplicated
Workaround:
Ensure any .ebextensions script which creates a cron also removes the .bak files on subsequent deployments.
Have a helper script which does the following:
- Gets the current Instance ID from the metadata
- Gets the current Auto Scaling Group name from the EC2 tags
- Gets the list of EC2 instances in that group, sorted alphabetically
- Takes the first instance from that list
- Compares the Instance ID from step 1 with the first Instance ID from step 4
Your cron scripts can then use this helper script to determine if they should execute.
Caveat:
The IAM Role used for the Beanstalk instances needs ec2:DescribeTags and autoscaling:DescribeAutoScalingGroups permissions
The instances chosen from are those shown as InService by Auto Scaling. This does not necessarily mean they are fully booted up and ready to run your cron.
You would not have to set the IAM Roles if you are using the default beanstalk role.
As mentioned above, the fundamental flaw with establishing any crontab configuration is that it only happens at deployment. As the cluster gets auto-scaled up and then back down, the leader is also favored to be the first server turned off. In addition there would be no fail-over, which for me was critical.
I did some research, then talked with our AWS account specialist to bounce ideas and validate the solution I came up with. You can accomplish this with OpsWorks, although it's a bit like using a house to kill a fly. It is also possible to use Data Pipeline with Task Runner, but this has limited ability in the scripts it can execute, and I needed to be able to run PHP scripts with access to the whole code base. You could also dedicate an EC2 instance outside of the Elastic Beanstalk cluster, but then you have no fail-over again.
So here is what I came up with, which apparently is unconventional (as the AWS rep commented) and may be considered a hack, but it works and is solid with fail-over. I chose a coding solution using the SDK, which I'll show in PHP, although you could do the same method in any language you prefer.
// contains the values for variables used (key, secret, env)
require_once('cron_config.inc');

// Load the AWS PHP SDK to connect to Elastic Beanstalk
use Aws\ElasticBeanstalk\ElasticBeanstalkClient;

$client = ElasticBeanstalkClient::factory(array(
    'key' => AWS_KEY,
    'secret' => AWS_SECRET,
    'profile' => 'your_profile',
    'region' => 'us-east-1'
));

$result = $client->describeEnvironmentResources(array(
    'EnvironmentName' => AWS_ENV
));

if (php_uname('n') != $result['EnvironmentResources']['Instances'][0]['Id']) {
    die("Not the primary EC2 instance\n");
}
So walking through this and how it operates... You call scripts from crontab as you normally would on every EC2 instance. Each script includes this at the beginning (or includes a single file for each, as I use it), which establishes an Elastic Beanstalk client and retrieves a list of all instances. It uses only the first server in the list and checks whether it matches itself; if it does, it continues, otherwise it dies and closes out. I've checked, and the list returned seems to be consistent; technically it only needs to be consistent for a minute or so, as each instance executes the scheduled cron. If it does change, it wouldn't matter, since again it is only relevant for that small window.
This isn't elegant by any means, but suited our specific needs - which was not to increase cost with an additional service or have to have a dedicated EC2 instance, and would have fail-over in case of any failure. Our cron scripts run maintenance scripts which get placed into SQS and each server in the cluster helps execute. At least this may give you an alternate option if it fits your needs.
-Davey
If you're using Rails, you can use the whenever-elasticbeanstalk gem. It allows you to run cron jobs on either all instances or just one. It checks every minute to ensure that there is only one "leader" instance, and will automatically promote one server to "leader" if there are none. This is needed since Elastic Beanstalk only has the concept of leader during deployment and may shut down any instance at any time while scaling.
UPDATE
I switched to using AWS OpsWorks and am no longer maintaining this gem. If you need more functionality than is available in the basics of Elastic Beanstalk, I highly recommend switching to OpsWorks.
You really don't want to be running cron jobs on Elastic Beanstalk. Since you'll have multiple application instances, this can cause race conditions and other odd problems. I actually recently blogged about this (4th or 5th tip down the page). The short version: Depending on the application, use a job queue like SQS or a third-party solution like iron.io.
2017: If you are using Laravel5+
You just need 2 minutes to configure it:
create a Worker Tier
install laravel-aws-worker
composer require dusterio/laravel-aws-worker
add a cron.yaml to the root folder:
Add cron.yaml to the root folder of your application (this can be a
part of your repo or you could add this file right before deploying to
EB - the important thing is that this file is present at the time of
deployment):
version: 1
cron:
  - name: "schedule"
    url: "/worker/schedule"
    schedule: "* * * * *"
That's it!
All your tasks in App\Console\Kernel will now be executed.
Detailed instructions and explanations: https://github.com/dusterio/laravel-aws-worker
How to write tasks inside of Laravel: https://laravel.com/docs/5.4/scheduling
A more readable solution using files instead of container_commands:
files:
  "/etc/cron.d/my_cron":
    mode: "000644"
    owner: root
    group: root
    content: |
      # override default email address
      MAILTO="example@gmail.com"
      # run a Symfony command every ten minutes (as ec2-user)
      */10 * * * * ec2-user /usr/bin/php /var/app/current/app/console do:something
    encoding: plain

commands:
  # delete backup file created by Elastic Beanstalk
  clear_cron_backup:
    command: rm -f /etc/cron.d/watson.bak
Note the format differs from the usual crontab format in that it specifies the user to run the command as.
My 1 cent of contribution for 2018
Here is the right way to do it (using django/python and django_crontab app):
inside .ebextensions folder create a file like this 98_cron.config:
files:
  "/tmp/98_create_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      cd /
      sudo /opt/python/run/venv/bin/python /opt/python/current/app/manage.py crontab remove > /home/ec2-user/remove11.txt
      sudo /opt/python/run/venv/bin/python /opt/python/current/app/manage.py crontab add > /home/ec2-user/add11.txt

container_commands:
  98crontab:
    command: "mv /tmp/98_create_cron.sh /opt/elasticbeanstalk/hooks/appdeploy/post && chmod 774 /opt/elasticbeanstalk/hooks/appdeploy/post/98_create_cron.sh"
    leader_only: true
It needs to be container_commands instead of commands
The latest example from Amazon is the easiest and most efficient (periodic tasks):
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
where you create a separate worker tier to execute any of your cron jobs. Create the cron.yaml file and place it in your root folder. One issue I had (after cron did not seem to be executing) was that my CodePipeline did not have authority to perform a DynamoDB modification. After adding full DynamoDB access under IAM -> Roles -> yourpipeline and redeploying (Elastic Beanstalk), it worked perfectly.
Someone was wondering about the leader_only auto scaling problems when new leaders arise. I can't seem to figure out how to reply to their comments, but see this link: http://blog.paulopoiati.com/2013/08/25/running-cron-in-elastic-beanstalk-auto-scaling-environment/
So we've been struggling with this for a while and after some discussion with an AWS rep I've finally come up with what I think is the best solution.
Using a worker tier with cron.yaml is definitely the easiest fix. However, what the documentation doesn't make clear is that this will put the job at the end of the SQS queue you're using to actually run your jobs. If your cron jobs are time sensitive (as many are), this isn't acceptable, since it would depend on the size of the queue. One option is to use a completely separate environment just to run cron jobs, but I think that's overkill.
Some of the other options, like checking to see if you're the first instance in the list, aren't ideal either. What if the current first instance is in the process of shutting down?
Instance protection can also come with issues - what if that instance gets locked up / frozen?
What's important to understand is how AWS itself manages the cron.yaml functionality. There is an SQS daemon which uses a Dynamo table to handle "leader election". It writes to this table frequently, and if the current leader hasn't written in a short while, the next instance will take over as leader. This is how the daemon decides which instance to fire the job into the SQS queue.
We can repurpose the existing functionality rather than trying to rewrite our own. You can see the full solution here: https://gist.github.com/dorner/4517fe2b8c79ccb3971084ec28267f27
That's in Ruby, but you can easily adapt it to any other language that has the AWS SDK. Essentially, it checks the current leader, then checks the state to make sure it's in a good state. It'll loop until there is a current leader in a good state, and if the current instance is the leader, execute the job.
The best way to do this is to use an Elastic Beanstalk Worker Environment (see "Option 1" below). However, this will add to your server costs. If you don't want to do this, see "Option 2" below for how to configure cron itself.
Option 1: Use Elastic Beanstalk Worker environments
Amazon has support for Elastic Beanstalk Worker Environments. They are Elastic Beanstalk managed environments that come with an SQS queue which you can enqueue tasks onto. You can also give them a cron config that will automatically enqueue the task on a recurring schedule. Then, rather than receiving requests from a load balancer, the servers in a worker environment each have a daemon (managed by Elastic Beanstalk) that polls the queue for tasks and calls the appropriate web endpoint when they get a message on the queue. Worker environments have several benefits over running cron yourself:
Performance. Your tasks are now running on dedicated servers instead of competing for CPU and memory with web requests. You can also have different specs for the worker servers (ex. you can have more memory on just the worker servers).
Scalability. You can also scale up your number of worker servers to more than 1 in order to handle large task loads.
Ad-hoc Tasks. Your code can enqueue ad-hoc tasks as well as scheduled ones.
Standardization. You write tasks as web endpoints rather than needing to configure your own task framework, which lets you standardize your code and tooling.
If you just want a cron replacement, all you need to do is make a file called cron.yaml at the top level of your project, with config like the following:
cron.yaml
version: 1
cron:
  - name: "hourly"
    url: "/tasks/hourly"
    schedule: "0 */1 * * *"
This will call the url /tasks/hourly once an hour.
If you are deploying the same codebase to web and worker environments, you should have the task URLs require an environment variable that you set on worker environments and not web environments. This way, your task endpoints are not exposed to the world (task servers by default do not accept incoming HTTP requests, as the only thing making calls to them is the on-server daemon).
The full docs are here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
Option 2: Configure Cron
If you want to run cron, you need to make sure it's running on only one server. The leader_only flag in .ebextensions config isn't sufficient because servers don't reliably stay the leader. This can be fixed by deleting the cron config if present on any server as the first step of a deploy and then installing it on just one server using leader_only. Here is an example .ebextensions config file that accomplishes this:
.ebextensions/cron.config
container_commands:
  01_remove_cron_jobs:
    command: "rm /etc/cron.d/cronjobs || exit 0"
  02_set_up_cron:
    command: "cat .ebextensions/cronjobs.txt > /etc/cron.d/cronjobs && chmod 644 /etc/cron.d/cronjobs"
    leader_only: true
This config file assumes the existence of a file .ebextensions/cronjobs.txt. This file contains your actual cron config. Note that in order to have environment variables loaded and your code in scope, you need to have code that does this baked into each command. The following is an example cron config that works on an Amazon Linux 2 based Python environment:
.ebextensions/cronjobs.txt
SHELL=/bin/bash
PROJECT_PATH=/var/app/current
ENV_PATH=/opt/elasticbeanstalk/deployment/env
# m h dom mon dow user command
0 * * * * ec2-user set -a; source <(sudo cat $ENV_PATH) && cd $PROJECT_PATH && python HOURLY_COMMAND > /dev/null
# Cron requires a newline at the end of the file
Here is a full explanation of the solution:
http://blog.paulopoiati.com/2013/08/25/running-cron-in-elastic-beanstalk-auto-scaling-environment/
To control whether Auto Scaling can terminate a particular instance when scaling in, use instance protection. You can enable the instance protection setting on an Auto Scaling group or an individual Auto Scaling instance. When Auto Scaling launches an instance, the instance inherits the instance protection setting of the Auto Scaling group. You can change the instance protection setting for an Auto Scaling group or an Auto Scaling instance at any time.
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection
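A hedged CLI sketch of turning that protection on for the leader instance (the instance ID and Auto Scaling group name are placeholders):
# Prevent Auto Scaling from terminating this instance during scale-in.
aws autoscaling set-instance-protection \
  --instance-ids i-0123456789abcdef0 \
  --auto-scaling-group-name my-beanstalk-asg \
  --protected-from-scale-in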
I had another solution to this: if a PHP file needs to be run through cron and you have set up any NAT instances, you can put the cron job on the NAT instance and run the PHP file through wget.
Here is a fix in case you want to do this in PHP. You just need a cronjob.config in your .ebextensions folder to get it to work like this.
files:
  "/etc/cron.d/my_cron":
    mode: "000644"
    owner: root
    group: root
    content: |
      empty stuff
    encoding: plain

commands:
  01_clear_cron_backup:
    command: "rm -f /etc/cron.d/*.bak"
  02_remove_content:
    command: "sudo sed -i 's/empty stuff//g' /etc/cron.d/my_cron"

container_commands:
  adding_cron:
    command: "echo '* * * * * ec2-user . /opt/elasticbeanstalk/support/envvars && /usr/bin/php /var/app/current/index.php cron sendemail > /tmp/sendemail.log 2>&1' > /etc/cron.d/my_cron"
    leader_only: true
The envvars file provides the environment variables to the script. You can debug the output in /tmp/sendemail.log as above.
Hope this helps someone as it surely helped us!
Based on the principles of the answer from user1599237, where you let the cron jobs run on all instances but then, at the beginning of each job, determine whether it should be allowed to run, I have made another solution.
Instead of looking at the running instances (and having to store your AWS key and secret) I'm using the MySQL database that I'm already connecting to from all instances.
It has no downsides, only positives:
no extra instance or expenses
rock solid solution - no chance of double execution
scalable - automatically works as your instances are scaled up and down
failover - automatically works in case an instance has a failure
Alternatively, you could also use a commonly shared filesystem (like AWS EFS via the NFS protocol) instead of a database.
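For the shared-filesystem variant, a minimal sketch using flock on an EFS mount (the mount path is a placeholder and the job command is borrowed from the examples below; whether file locks behave reliably over NFS depends on your NFS version and mount options):
# Run the job only if no other instance currently holds the lock file.
# /mnt/efs/locks must live on the shared EFS/NFS mount.
flock --nonblock /mnt/efs/locks/empty-trash.lock \
  php /var/app/current/yii maintenance/empty-trash \
  || echo "Another instance is already running this job"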
The following solution is created within the PHP framework Yii but you can easily adapt it for another framework and language. Also the exception handler Yii::$app->system is a module of my own. Replace it with whatever you are using.
/**
* Obtain an exclusive lock to ensure only one instance or worker executes a job
*
* Examples:
*
* `php /var/app/current/yii process/lock 60 empty-trash php /var/app/current/yii maintenance/empty-trash`
* `php /var/app/current/yii process/lock 60 empty-trash php /var/app/current/yii maintenance/empty-trash StdOUT./test.log`
* `php /var/app/current/yii process/lock 60 "empty trash" php /var/app/current/yii maintenance/empty-trash StdOUT./test.log StdERR.ditto`
* `php /var/app/current/yii process/lock 60 "empty trash" php /var/app/current/yii maintenance/empty-trash StdOUT./output.log StdERR./error.log`
*
* Arguments are understood as follows:
* - First: Duration of the lock in minutes
* - Second: Job name (surround with quotes if it contains spaces)
* - The rest: Command to execute. Instead of writing `>` and `2>` for redirecting output you need to write `StdOUT` and `StdERR` respectively. To redirect stderr to stdout write `StdERR.ditto`.
*
* Command will be executed in the background. If determined that it should not be executed the script will terminate silently.
*/
public function actionLock() {
    $argsAll = $args = func_get_args();

    if (!is_numeric($args[0])) {
        \Yii::$app->system->error('Duration for obtaining process lock is not numeric.', ['Args' => $argsAll]);
    }
    if (!$args[1]) {
        \Yii::$app->system->error('Job name for obtaining process lock is missing.', ['Args' => $argsAll]);
    }

    $durationMins = $args[0];
    $jobName = $args[1];
    $instanceID = null;
    unset($args[0], $args[1]);

    $command = trim(implode(' ', $args));
    if (!$command) {
        \Yii::$app->system->error('Command to execute after obtaining process lock is missing.', ['Args' => $argsAll]);
    }

    // If using AWS Elastic Beanstalk retrieve the instance ID
    if (file_exists('/etc/elasticbeanstalk/.aws-eb-system-initialized')) {
        if ($awsEb = file_get_contents('/etc/elasticbeanstalk/.aws-eb-system-initialized')) {
            $awsEb = json_decode($awsEb);
            if (is_object($awsEb) && $awsEb->instance_id) {
                $instanceID = $awsEb->instance_id;
            }
        }
    }

    // Obtain lock
    $updateColumns = false; // do nothing if record already exists
    $affectedRows = \Yii::$app->db->createCommand()->upsert('system_job_locks', [
        'job_name' => $jobName,
        'locked' => gmdate('Y-m-d H:i:s'),
        'duration' => $durationMins,
        'source' => $instanceID,
    ], $updateColumns)->execute();
    // The SQL generated: INSERT INTO system_job_locks (job_name, locked, duration, source) VALUES ('some-name', '2019-04-22 17:24:39', 60, 'i-HmkDAZ9S5G5G') ON DUPLICATE KEY UPDATE job_name = job_name

    if ($affectedRows == 0) {
        // record already exists, check if lock has expired
        $affectedRows = \Yii::$app->db->createCommand()->update('system_job_locks', [
                'locked' => gmdate('Y-m-d H:i:s'),
                'duration' => $durationMins,
                'source' => $instanceID,
            ],
            'job_name = :jobName AND DATE_ADD(locked, INTERVAL duration MINUTE) < NOW()', ['jobName' => $jobName]
        )->execute();
        // The SQL generated: UPDATE system_job_locks SET locked = '2019-04-22 17:24:39', duration = 60, source = 'i-HmkDAZ9S5G5G' WHERE job_name = 'clean-trash' AND DATE_ADD(locked, INTERVAL duration MINUTE) < NOW()

        if ($affectedRows == 0) {
            // We could not obtain a lock (since another process already has it) so do not execute the command
            exit;
        }
    }

    // Handle redirection of stdout and stderr
    $command = str_replace('StdOUT', '>', $command);
    $command = str_replace('StdERR.ditto', '2>&1', $command);
    $command = str_replace('StdERR', '2>', $command);

    // Execute the command as a background process so we can exit the current process
    $command .= ' &';

    $output = []; $exitcode = null;
    exec($command, $output, $exitcode);

    exit($exitcode);
}
This is the database schema I'm using:
CREATE TABLE `system_job_locks` (
`job_name` VARCHAR(50) NOT NULL,
`locked` DATETIME NOT NULL COMMENT 'UTC',
`duration` SMALLINT(5) UNSIGNED NOT NULL COMMENT 'Minutes',
`source` VARCHAR(255) NULL DEFAULT NULL,
PRIMARY KEY (`job_name`)
)