I'm using HashiCorp's Packer to create machine images for Google Cloud (the equivalent of an AMI on Amazon). I want every instance to run a script once the instance is created in the cloud. As I understand from the Packer docs, I can use startup_script_file to do this. I got it working, but it seems the script is only run once, at image creation, so every running instance ends up with the same output. How can I trigger this script only on instance creation, so that each instance can have different output?
packer config:
{
"builders": [{
"type": "googlecompute",
"project_id": "project-id",
"source_image": "debian-9-stretch-v20200805",
"ssh_username": "name",
"zone": "europe-west4-a",
"account_file": "secret-account-file.json",
"startup_script_file": "link to file"
}]
}
script:
#!/bin/bash
echo $((1 + RANDOM % 100)) > test.log #output of this remains the same on every created instance.
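For reference, a script that should run on each instance (rather than during the Packer build) is normally passed as startup-script metadata when the instance itself is created; a minimal, untested gcloud sketch with placeholder instance and image names (note that GCE runs startup scripts on every boot, so the script would also need its own run-once guard):
# hypothetical instance creation; names and the script path are placeholders
gcloud compute instances create my-instance \
  --zone europe-west4-a \
  --image my-packer-built-image \
  --metadata-from-file startup-script=per-instance.sh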
Related
We are using Dataflow Flex Templates and following this guide (https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates) to stage and launch jobs, and this works in our environment. However, when I SSH onto the Dataflow VM and run docker ps, I see it referencing a different Docker image from the one we specify in our template.
The template I am launching from is as follows, and jobs are created using gcloud beta dataflow flex-template run (a sketch of that command follows the template):
{
"image": "gcr.io/<MY PROJECT ID>/samples/dataflow/streaming-beam-sql:latest",
"metadata": {
"description": "An Apache Beam streaming pipeline that reads JSON encoded messages from Pub/Sub, uses Beam SQL to transform the message data, and writes the results to a BigQuery",
"name": "Streaming Beam SQL",
"parameters": [
{
"helpText": "Pub/Sub subscription to read from.",
"label": "Pub/Sub input subscription.",
"name": "inputSubscription",
"regexes": [
".*"
]
},
{
"helpText": "BigQuery table spec to write to, in the form 'project:dataset.table'.",
"is_optional": true,
"label": "BigQuery output table",
"name": "outputTable",
"regexes": [
"[^:]+:[^.]+[.].+"
]
}
]
},
"sdkInfo": {
"language": "JAVA"
}
}
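For reference, the run command itself looks roughly like this; the bucket path, region and parameter values are placeholders rather than our real ones:
gcloud beta dataflow flex-template run "streaming-beam-sql-test" \
  --template-file-gcs-location gs://<MY BUCKET>/templates/streaming-beam-sql.json \
  --region europe-west1 \
  --parameters inputSubscription=my-subscription,outputTable=<MY PROJECT ID>:my_dataset.my_table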
So I would expect the output of docker ps to show gcr.io/<MY PROJECT ID>/samples/dataflow/streaming-beam-sql as the image on Dataflow. When I launch the image from GCR to run on a GCE instance I get the following output when running docker ps:
Should I expect to see the name of the image I have referenced in the Dataflow template on the Dataflow VM? Or have I missed a step somewhere?
Thanks!
TL;DR: You are looking at the worker VM instead of the launcher VM.
With Flex Templates, when you run the job it first creates a launcher VM, where it pulls your container and runs it to generate the job graph. This VM is destroyed after that step is completed. Then the worker VMs are started to actually run the generated job graph. On the worker VM there is no need for your container; your container is used only to generate the job graph based on the parameters passed.
In your case, you are searching for your image on the worker VM. The launcher VM is short-lived and its name starts with launcher-*********************. If you SSH into that VM and run docker ps, you will be able to see your container image.
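A quick, untested way to check this from the shell while the job is still starting up (the launcher VM name and zone below are placeholders):
# list the short-lived launcher VM created for the Flex Template job
gcloud compute instances list --filter="name~^launcher-"
# SSH into it and inspect its containers before it is destroyed
gcloud compute ssh <LAUNCHER-VM-NAME> --zone=<ZONE>
docker ps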
I am very new to AWS Step Functions and AWS Lambda functions and could really use some help getting an EMR Cluster running through Step Functions. A sample of my current State Machine structure is shown in the following code:
{
"Comment": "This is a test for running the structure of the CustomCreate job.",
"StartAt": "PreStep",
"States": {
"PreStep": {
"Comment": "Check that all the necessary files exist before running the job.",
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXX:function:CustomCreate-PreStep-Function",
"Next": "Run Job Choice"
},
"Run Job Choice": {
"Comment": "This step chooses whether or not to go forward with running the main job.",
"Type": "Choice",
"Choices": [
{
"Variable": "$.FoundNecessaryFiles",
"BooleanEquals": true,
"Next": "Spin Up Cluster"
},
{
"Variable": "$.FoundNecessaryFiles",
"BooleanEquals": false,
"Next": "Do Not Run Job"
}
]
},
"Do Not Run Job": {
"Comment": "This step triggers if the PreStep fails and the job should not run.",
"Type": "Fail",
"Cause": "PreStep unsuccessful"
},
"Spin Up Cluster": {
"Comment": "Spins up the EMR Cluster.",
"Type": "Pass",
"Next": "Update Env"
},
"Update Env": {
"Comment": "Update the environment variables in the EMR Cluster.",
"Type": "Pass",
"Next": "Run Job"
},
"Run Job": {
"Comment": "Add steps to the EMR Cluster.",
"Type": "Pass",
"End": true
}
}
}
This structure is shown by the accompanying workflow diagram.
The PreStep and Run Job Choice tasks use a simple Lambda Function to check that the files necessary to run this job exist on my S3 Bucket, then go to spin up the cluster provided that the necessary files are found. These tasks are working properly.
What I am not sure about is how to handle the EMR Cluster related steps.
In my current structure, the first task is to spin up an EMR Cluster. This could be done directly in the Step Function JSON or, preferably, using a JSON cluster config file (titled EMR-cluster-setup.json) that I have located on my S3 Bucket.
My next task is to update the EMR Cluster environment variables. I have a .sh script located on my S3 Bucket that can do this, and a JSON file (titled EMR-RUN-SCRIPT.json), also on my S3 Bucket, that adds a first step to the EMR Cluster which runs and sources that .sh script. I just need to run that JSON file from within the EMR Cluster, which I do not know how to do using Step Functions (see the sketch just after the listing below). The code for EMR-RUN-SCRIPT.json is shown below
[
{
"Name": "EMR-RUN-SCRIPT",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
"Args": [
"s3://PATH/TO/env_configs.sh"
]
}
}
]
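For what it's worth, a step defined in this shape maps almost directly onto the Step Functions elasticmapreduce:addStep integration; a rough, untested sketch of such a state follows (the state name, the ClusterId path and the Next target are placeholders, not something I have working):
# ASL state fragment, printed via a shell heredoc; it would be merged into the "States" block
cat <<'EOF'
"Run EMR-RUN-SCRIPT Step": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.ClusterId",
    "Step": {
      "Name": "EMR-RUN-SCRIPT",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
        "Args": ["s3://PATH/TO/env_configs.sh"]
      }
    }
  },
  "Next": "Run Job"
}
EOF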
My third task is to add a step containing a spark-submit command to the EMR Cluster. This command is described in a JSON config file (titled EMR-RUN-STEP.json) located on my S3 Bucket, which can be uploaded to the EMR Cluster in a similar manner to the environment configs file in the previous step. The code for EMR-RUN-STEP.json is shown below
[
{
"Name": "EMR-RUN-STEP",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"bash", "-c",
"source /home/hadoop/.bashrc && spark-submit --master yarn --conf spark.yarn.submit.waitAppCompletion=false --class CLASSPATH.TO.MAIN s3://PATH/TO/JAR/FILE"
]
}
}
]
Finally, I want to have a task that makes sure the EMR Cluster terminates after it completes its run.
I know there may be a lot involved within this question, but I would greatly appreciate any assistance with any of the issues described above. Whether it be following the structure I outlined above, or if you know of another solution, I am open to any form of help. Thank you in advance.
You need a terminate-cluster step.
As the documentation states (https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html):
createCluster uses the same request syntax as runJobFlow, except that the field Instances.KeepJobFlowAliveWhenNoSteps is mandatory and must have the Boolean value TRUE.
So you need a step that shuts the cluster down for you. The same page describes two integrations:
terminateCluster - shuts down a cluster (job flow).
terminateCluster.sync - the same as terminateCluster, but waits for the cluster to terminate.
For me terminateCluster.sync is preferable over the simple terminateCluster, because it waits for the cluster to actually terminate and you can handle any hangs there; you'll be using Standard Step Functions, so the bit of extra time will not be billed.
P.S.: if you are using termination protection you'll need an extra step to turn it off before you can terminate your cluster ;)
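A rough, untested sketch of the terminate state using that integration (the state name and the ClusterId path are placeholders):
# ASL state fragment, printed via a shell heredoc; it goes into the "States" block
cat <<'EOF'
"Terminate Cluster": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
  "Parameters": { "ClusterId.$": "$.ClusterId" },
  "End": true
}
EOF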
'KeepJobFlowAliveWhenNoSteps': False
Add the above setting to the Instances configuration in your EMR cluster-creation script (the boto3 EMR run_job_flow config); the cluster will then terminate automatically once all of its steps have completed.
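An untested CLI equivalent for reference: the --auto-terminate flag sets KeepJobFlowAliveWhenNoSteps to false, so the cluster shuts down once its steps finish (the name, release label and instance settings are placeholders):
aws emr create-cluster \
  --name "custom-create-test" \
  --release-label emr-5.30.1 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --auto-terminate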
I am trying to add user data to run some custom scripts when I create a Windows EC2 instance (Windows Server 2016) using CloudFormation.
"UserData" : {
"Fn::Base64" : {
"Fn::Join" : [
"",
[
"<powershell> \n",
"C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 \n",
"C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\create_folder.ps1 \n",
"New-Item -Path c:\\test3 -ItemType directory",
"</powershell>"
]
]
}
},
The above script does not seem to work.
Basically I need to run some custom scripts (which I already added to my base image) and some PowerShell commands.
By default, the UserData section is not executed again on a 2016 Windows AMI once the instance has already been launched.
You have to do the following steps manually:
Log in to the instance and open a PowerShell terminal.
Go to the directory C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts.
Run the command .\InitializeInstance.ps1 -Schedule.
After the next start-up, the user-data section will be executed.
I believe you will now hit another issue: you can't manually log in to the instance. In that case, create your own AMI by customizing the Windows 2016 AMI and baking in the steps above.
Reference: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-windows-user-data.html
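Once the instance has been prepared this way (InitializeInstance.ps1 -Schedule has been run), the custom AMI itself can be captured with the AWS CLI; an untested sketch with a placeholder instance ID and image name:
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "windows2016-with-userdata-enabled" \
  --description "Windows Server 2016 base image with user data re-enabled"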
Composition
Jenkins server on EC2 instance, uses EFS
Docker image for above Jenkins server
Need
Write templates to a directory on EFS each time ECS starts the task which builds the Jenkins server
Where is the appropriate place to put a step to do the write?
Tried
If I do it in the Dockerfile, it writes to the Docker image, but the changes never propagate to EFS, so the templates are not available as projects on the Jenkins server.
I've tried putting the write command in jenkins.sh, but I can't figure out how that script is run, and in any case it doesn't place the templates where I need them.
The original question included:
Write templates to directory on EFS each time ECS starts the task
In addition to @luke-peterson's answer, you can use a shell script as the entry point in your Dockerfile in order to copy files between the mounted EFS folder and the container.
Instead of ENTRYPOINT, use the following directive in your Dockerfile:
CMD ["sh", "/app/startup.sh"]
And inside startup.sh you can copy files freely and run the app (.net core app in my example):
cp -R /app/wwwroot/. /var/jenkins-home
dotnet /app/app.dll
Of course, you can also do it programmatically inside the app itself.
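In the Jenkins case from the question, the same pattern might look roughly like the script below; the template source path is a made-up example, and /usr/local/bin/jenkins.sh is assumed to be the entrypoint script shipped in the official Jenkins image:
#!/bin/bash
# copy the templates baked into the image onto the EFS-backed Jenkins home,
# then hand control over to the normal Jenkins entrypoint
cp -R /usr/share/jenkins/templates/. "${JENKINS_HOME:-/var/jenkins_home}/"
exec /usr/local/bin/jenkins.sh "$@"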
You need to start the task with a volume, then mount that volume into the container. This way you have persistent storage across multiple Jenkins start/stop cycles.
Your task definition would look something like the one below (I've removed the irrelevant parts). The important components are mountPoints and volumes. Note that this is not the same as volumesFrom: you aren't mounting volumes from another container, but rather defining and mounting a volume within a single task.
This also assumes you're running Jenkins in the default JENKINS_HOME directory as well as having mounted your EFS drive to /mnt/efs/jenkins-home on the EC2 instance.
{
"requiresAttributes": ...
"taskDefinitionArn": ... your ARN ...,
"containerDefinitions": [
{
"portMappings": ...
.... more config here .....
"mountPoints": [
{
"containerPath": "/var/jenkins_home",
"sourceVolume": "jenkins-home",
}
]
}
],
"volumes": [
{
"host": {
"sourcePath": "/mnt/efs/jenkins-home"
},
"name": "jenkins-home"
}
],
"family": "jenkins"
}
Task definition within ECS:
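For completeness, the EFS mount that the task definition assumes at /mnt/efs/jenkins-home would typically be created on the container instance before the ECS agent starts the task; an untested sketch with a placeholder filesystem ID and region:
sudo mkdir -p /mnt/efs/jenkins-home
sudo mount -t nfs4 -o nfsvers=4.1 fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/ /mnt/efs/jenkins-home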
I have Packer configured to use the amazon-ebs builder to create a custom AMI from the Red Hat 6 image supplied by Red Hat. I'd really like Packer to post-process the custom AMI into a VirtualBox image for local testing. I've tried adding a simple post-processor to my Packer JSON as follows:
"post-processors": [
{
"type": "vagrant",
"keep_input_artifact": false
}
],
But all I end up with is a tiny .box file. When I add this to vagrant, it just seems to be a wrapper for my original AMI in Amazon:
$ vagrant box list
packer (aws, 0)
I was hoping to see something like this:
rhel66 (virtualbox, 0)
Can Packer convert my AMI into a VirtualBox image?
The post-processor in your example just gives you a Vagrant box wrapping that image. The image was an AWS one, so no, it didn't change anything; to get a VirtualBox image you would have to convert it.
Per the docs, have you tried:
{
"type": "virtualbox",
"only": ["virtualbox-iso"],
"artifact_type": "vagrant.box",
"metadata": {
"provider": "virtualbox",
"version": "0.0.1"
}
}
The above is untested. AWS also provides documentation on exporting VMs from EC2 (VM Import/Export).
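If the goal is a local VirtualBox VM rather than a Vagrant wrapper around the AMI, one untested route is to export the AMI to a VMDK via VM Import/Export and then build a VirtualBox machine from the exported disk; the AMI ID, bucket and file names below are placeholders, and the export requires the vmimport service role to be configured:
# export the AMI to a VMDK in S3 (runs asynchronously)
aws ec2 export-image \
  --image-id ami-0123456789abcdef0 \
  --disk-image-format VMDK \
  --s3-export-location S3Bucket=my-export-bucket,S3Prefix=exports/
# check progress, then download the disk (the key is named after the export task ID)
aws ec2 describe-export-image-tasks
aws s3 cp s3://my-export-bucket/exports/export-ami-EXAMPLE0123456789.vmdk ./rhel66.vmdk
# create a VirtualBox VM around the exported disk
VBoxManage createvm --name rhel66 --ostype RedHat_64 --register
VBoxManage storagectl rhel66 --name SATA --add sata
VBoxManage storageattach rhel66 --storagectl SATA --port 0 --device 0 --type hdd --medium ./rhel66.vmdk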