Packer's Puppet provisioning stalls - virtualbox

I'm very new to the this whole Packer/Vagrant,Puppet world. I'm trying to build my first VM using Packer and Puppet.
I can successfully build a virtualbox and I've included a shell script provisioner to install puppet. I've ssh'ed into the VM to verify that it works and puppet is installed.
Then I added an additional puppet-masterless provisioner that looks simply like this:
# java dependency
package { 'openjdk-7-jdk' :
ensure => present
}
When I run packer, it gets to this point and gets stuck:
==> virtualbox-iso: Provisioning with Puppet...
virtualbox-iso: Creating Puppet staging directory...
virtualbox-iso: Uploading manifests...
virtualbox-iso: Running Puppet: sudo -E puppet apply --verbose --modulepath='' --detailed-exitcodes /tmp/packer-puppet-masterless/manifests/ubuntu.pp
Any suggestions would be helpful. Even on how to debug it to see what's going on behind the scenes

I was having the same problem, and changed the execute_command to receive the password of the vagrant user.
"override": {
"virtualbox-iso": {
"execute_command": "echo 'vagrant' | {{.FacterVars}}{{if .Sudo}} sudo -S -E {{end}}puppet apply --verbose --modulepath='{{.ModulePath}}' {{if ne .HieraConfigPath \"\"}}--hiera_config='{{.HieraConfigPath}}' {{end}} {{if ne .ManifestDir \"\"}}--manifestdir='{{.ManifestDir}}' {{end}} --detailed-exitcodes {{.ManifestFile}}"
}
}
The whole block looks like this
{
"type": "puppet-masterless",
"manifest_file": "../puppet/manifests/base.pp",
"module_paths": [
"../puppet/modules/"
],
"override": {
"virtualbox-iso": {
"execute_command": "echo 'vagrant' | {{.FacterVars}}{{if .Sudo}} sudo -S -E {{end}}puppet apply --verbose --modulepath='{{.ModulePath}}' {{if ne .HieraConfigPath \"\"}}--hiera_config='{{.HieraConfigPath}}' {{end}} {{if ne .ManifestDir \"\"}}--manifestdir='{{.ManifestDir}}' {{end}} --detailed-exitcodes {{.ManifestFile}}"
}
}
}
Source: Found an example here https://github.com/AdoptOpenJDK/openjdk-virtual-images/blob/master/packer/openjdk-development/openjdk-development.json

Related

Wrong Yarn node label mapping with AWS EMR machine types

Does anyone have experience with Yarn node labels on AWS EMR? If so you please share your thoughts. We want to run All the Spark executors on Task(Spot) machine and all the Spark ApplicationMaster/Driver on Core(on-Demand) machine. Previously we were running Spark executors and Spark Driver all on the CORE machine(on-demand).
In order to achieve this, we have created the "TASK" yarn node label as a part of a custom AWS EMR Bootstrap action. And Have mapped the same "TASK" yarn label when any Spot instance is registered with AWS EMR in a separate bootstrap action. As "CORE" is the default yarn node label expression, so we are simply mapping it with an on-demand instance upon registration of the node in the bootstrap action.
We are using "spark.yarn.executor.nodeLabelExpression": "TASK" spark conf to launch spark executors on Task nodes.
So.. we are facing the problem of the wrong mapping of the Yarn node label with the appropriate machine i.e For a short duration of time(around 1-2 mins) the "TASK" yarn node label is mapped with on-demand instances and "CORE" yarn node label is mapped with spot instance. So During this short duration of wrong labeling Yarn launches Spark executors on On-demand instances and Spark drivers on Spot instances.
This wrong mapping of labels with corresponding machine type persists till the bootstrap actions are complete and after that, the mapping is automatically resolved to its correct state.
The script we are running as a part of the bootstrap action:
This script is run on all new machines to assign a label to that machine. The script is being run as a background PID as the yarn is only available after all custom bootstrap actions are completed
#!/usr/bin/env bash
set -ex
function waitTillYarnComesUp() {
IS_YARN_EXIST=$(which yarn | grep -i yarn | wc -l)
while [ $IS_YARN_EXIST != '1' ]
do
echo "Yarn not exist"
sleep 15
IS_YARN_EXIST=$(which yarn | grep -i yarn | wc -l)
done
echo "Yarn exist.."
}
function waitTillTaskLabelSyncs() {
LABEL_EXIST=$(yarn cluster --list-node-labels | grep -i TASK | wc -l)
while [ $LABEL_EXIST -eq 0 ]
do
sleep 15
LABEL_EXIST=$(yarn cluster --list-node-labels | grep -i TASK | wc -l)
done
}
function getHostInstanceTypeAndApplyLabel() {
HOST_IP=$(curl http://169.254.169.254/latest/meta-data/local-hostname)
echo "host ip is ${HOST_IP}"
INSTANCE_TYPE=$(curl http://169.254.169.254/latest/meta-data/instance-life-cycle)
echo "instance type is ${INSTANCE_TYPE}"
PORT_NUMBER=8041
spot="spot"
onDemand="on-demand"
if [ $INSTANCE_TYPE == $spot ]; then
yarn rmadmin -replaceLabelsOnNode "${HOST_IP}:${PORT_NUMBER}=TASK"
elif [ $INSTANCE_TYPE == $onDemand ]
then
yarn rmadmin -replaceLabelsOnNode "${HOST_IP}:${PORT_NUMBER}=CORE"
fi
}
waitTillYarnComesUp
# holding for resource manager sync
sleep 100
waitTillTaskLabelSyncs
getHostInstanceTypeAndApplyLabel
exit 0
yarn rmadmin -addToClusterNodeLabels "TASK(exclusive=false)"
This command is being run on the Master instance to create a new TASK yarn node label at the time of cluster creation.
Does anyone have clue to prevent this wrong mapping of labels?
I would like to propose the next:
Create every node with some default label, like LABEL_PENDING. You can do it using the EMR classifications;
In the bootstrap script, you should identify if the current node is On-Demand or Spot instance;
After that, on every node you should update change LABEL_PENDING in /etc/hadoop/conf/yarn-site.xml to ON_DEMAND or SPOT;
On the master node, you should add 3 labels to YARN: LABEL_PENDING, ON_DEMAND, and SPOT.
Example of EMR Classifications:
[
{
"classification": "yarn-site",
"properties": {
"yarn.node-labels.enabled": "true",
"yarn.node-labels.am.default-node-label-expression": "ON_DEMAND",
"yarn.nodemanager.node-labels.provider.configured-node-partition": "LABEL_PENDING"
},
"configurations": []
},
{
"classification": "capacity-scheduler",
"properties": {
"yarn.scheduler.capacity.root.accessible-node-labels.ON_DEMAND.capacity": "100",
"yarn.scheduler.capacity.root.accessible-node-labels.SPOT.capacity": "100",
"yarn.scheduler.capacity.root.default.accessible-node-labels.ON_DEMAND.capacity": "100",
"yarn.scheduler.capacity.root.default.accessible-node-labels.SPOT.capacity": "100"
},
"configurations": []
},
{
"classification": "spark-defaults",
"properties": {
"spark.yarn.am.nodeLabelExpression": "ON_DEMAND",
"spark.yarn.executor.nodeLabelExpression": "SPOT"
},
"configurations": []
}
]
Example of the additional part to your bootstrap script
yarnNodeLabelConfig="yarn.nodemanager.node-labels.provider.configured-node-partition"
yarnSiteXml="/etc/hadoop/conf/yarn-site.xml"
function waitForYarnConfIsReady() {
while [[ ! -e $yarnSiteXml ]]; do
sleep 2
done
IS_CONF_PRESENT_IN_FILE=$(grep $yarnNodeLabelConfig $yarnSiteXml | wc -l)
while [[ $IS_CONF_PRESENT_IN_FILE != "1" ]]
do
echo "Yarn conf file doesn't have properties"
sleep 2
IS_CONF_PRESENT_IN_FILE=$(grep $yarnNodeLabelConfig $yarnSiteXml | wc -l)
done
}
function updateLabelInYarnConf() {
INSTANCE_TYPE=$(curl http://169.254.169.254/latest/meta-data/instance-life-cycle)
echo "Instance type is $INSTANCE_TYPE"
if [[ $INSTANCE_TYPE == "spot" ]]; then
sudo sed -i 's/>LABEL_PENDING</>SPOT</' $yarnSiteXml
elif [[ $INSTANCE_TYPE == "on-demand" ]]
then
sudo sed -i 's/>LABEL_PENDING</>ON_DEMAND</' $yarnSiteXml
fi
}
waitForYarnConfIsReady
updateLabelInYarnConf

Is there a way to confirm user_data ran successfully with Terraform for EC2?

I'm wondering if it's possible to know when the script in user data executes completely?
data "template_file" "script" {
template = file("${path.module}/installing.sh")
}
data "template_cloudinit_config" "config" {
gzip = false
base64_encode = false
# Main cloud-config configuration file.
part {
filename = "install.sh"
content = "${data.template_file.script.rendered}"
}
}
resource "aws_instance" "web" {
ami = "ami-04e7b4117bb0488e4"
instance_type = "t2.micro"
key_name = "KEY"
vpc_security_group_ids = [aws_default_security_group.default.id]
subnet_id = aws_default_subnet.default_az1.id
associate_public_ip_address = true
iam_instance_profile = "Role_S3"
user_data = data.template_cloudinit_config.config.rendered
tags = {
Name = "Terraform-Ansible"
}
}
And in the content of the script I have this.
It tells me Terraform successfully apply the changes, but the script is still running, is there a way I can monitor that?
#!/usr/bin/env bash
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
echo BEGIN
sudo apt update
sudo apt upgrade -y
sudo apt install -y unzip
echo END
No, You can not confirm the user data status from the terraform, as it posts launching script that executes once EC2 instance launched. But you will need some extra effort on init script that one way to check.
How to check User Data status while launching the instance in aws
If you do something that is mentioned above to make some marker file once user data completed, then you can try this to check.
resource "null_resource" "user_data_status_check" {
provisioner "local-exec" {
on_failure = "fail"
interpreter = ["/bin/bash", "-c"]
command = <<EOT
echo -e "\x1B[31m wait for few minute for instance warm up, adjust accordingly \x1B[0m"
# wait 30 sec
sleep 30
ssh -i yourkey.pem instance_ip ConnectTimeout=30 -o 'ConnectionAttempts 5' test -f "/home/user/markerfile.txt" && echo found || echo not found
if [ $? -eq 0 ]; then
echo "user data sucessfully executed"
else
echo "Failed to execute user data"
fi
EOT
}
triggers = {
#remove this once you test it out as it should run only once
always_run ="${timestamp()}"
}
depends_on = ["aws_instance.my_instance"]
}
so this script will check marker file on the newly launch server by doing ssh with timeout 30 seconds with max attempts 5.
Here are some pointers to remember:
User data shell scripts must start with the Shebang #! characters and the path to the interpreter you want to read the script (commonly /bin/bash).
Scripts entered as user data are run as the root user, so no need to use the sudo command in the init script.
When a user data script is processed, it is copied to and run from /var/lib/cloud/instances/instance-id/. The script is not deleted after it is run and can be found in this directory with the name user-data.txt So to check if your shell script made to the server refer this directory and the file.
The cloud-init output log file (/var/log/cloud-init-output.log) captures console output of your user_data shell script. to know how your user_data shell script was executed and its output check this file.
Source: https://www.middlewareinventory.com/blog/terraform-aws-ec2-user_data-example/
Well I use these two ways to confirm.
At the end of cloudinit config file this line sends me a notification through whatsapp (using callmebot). Thus no matter how much does it take to setup, I always get notified when it's ready to use. I watch some series or read something in that time. no time wasted.
curl -X POST "https://api.callmebot.com/whatsapp.php?phone=12345678910&text=Ec2+transcoder+setup+complete&apikey=12345"
At the end of cloudinit config this line runs -
echo "for faster/visual confirmation of above execution.."
wget https://www.sample-videos.com/video123/mp4/720/big_buck_bunny_720p_1mb.mp4 -O /home/ubuntu/dpnd_comp.mp4
When I sign in to the instance I can see directly the file.
And I'm loving it. Hope this helps someone. Also, don't forget to tell me your method too.

Serverspec test fail when i run its pipeline from other pipeline

I'm running 3 pipelines in jenkins (CI, CD, CDP) when I run the CI pipe the final stage is a trigger for activate the pipe CD (Continuous Deployment), this receive a parameter APP_VERSION from CI (Continuous Integration) PIPE and deploy an instance with packer and run SERVERSPEC TEST, but serverspec test failed.
but the demo-app is installed via salstack
The strange is when I run the CD and pass the parameter APP_VERSION manually this WORK !!
this is the final stage for pipeline CI
stage "Trigger downstream"
echo 'parametro'
def versionApp = sh returnStdout: true, script:"echo \$(git rev-parse --short HEAD) "
build job: "demo-pipeCD", parameters: [[$class: "StringParameterValue", name: "APP_VERSION", value: "${versionApp}"]], wait: false
}
I have passed to serverspec the sbin PATH and not work.
EDIT: I add the code the test.
enter code here
require 'spec_helper'
versionFile = open('/tmp/APP_VERSION')
appVersion = versionFile.read.chomp
describe package("demo-app-#{appVersion}") do
it { should be_installed }
end
Also, i add the job pipeline
#!groovy
node {
step([$class: 'WsCleanup'])
stage "Checkout Git repo"
checkout scm
stage "Checkout additional repos"
dir("pipeCD") {
git "https://git-codecommit.us-east-
1.amazonaws.com/v1/repos/pipeCD"
}
stage "Run Packer"
sh "echo $APP_VERSION"
sh "\$(export PATH=/usr/bin:/root/bin:/usr/local/bin:/sbin)"
sh "/opt/packer validate -var=\"appVersion=$APP_VERSION\" -var-
file=packer/demo-app_vars.json packer/demo-app.json"
sh "/opt/packer build -machine-readable -
var=\"appVersion=$APP_VERSION\" -var-file=packer/demo-app_vars.json
packer/demo-app.json | tee packer/packer.log"
REPEAT .. the parameter APP_VERSION in job pipe is rigth and the demo-app app is installed before the execute the test.

Environment Variables in newest AWS EC2 instance

I am trying to get ENVIRONMENT Variables into the EC2 instance (trying to run a django app on Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type ami-0ff8a91507f77f867 ). How do you get them in the newest version of amazon's linux, or get the logging so it can be traced.
user-data text (modified from here):
#!/bin/bash
#trying to get a file made
touch /tmp/testfile.txt
cat 'This and that' > /tmp/testfile.txt
#trying to log
echo 'Woot!' > /home/ec2-user/user-script-output.txt
#Trying to get the output logged to see what is going wrong
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
#trying to log
echo "XXXXXXXXXX STARTING USER DATA SCRIPT XXXXXXXXXXXXXX"
#trying to store the ENVIRONMENT VARIABLES
PARAMETER_PATH='/'
REGION='us-east-1'
# Functions
AWS="/usr/local/bin/aws"
get_parameter_store_tags() {
echo $($AWS ssm get-parameters-by-path --with-decryption --path ${PARAMETER_PATH} --region ${REGION})
}
params_to_env () {
params=$1
# If .Ta1gs does not exist we assume ssm Parameteres object.
SELECTOR="Name"
for key in $(echo $params | /usr/bin/jq -r ".[][].${SELECTOR}"); do
value=$(echo $params | /usr/bin/jq -r ".[][] | select(.${SELECTOR}==\"$key\") | .Value")
key=$(echo "${key##*/}" | /usr/bin/tr ':' '_' | /usr/bin/tr '-' '_' | /usr/bin/tr '[:lower:]' '[:upper:]')
export $key="$value"
echo "$key=$value"
done
}
# Get TAGS
if [ -z "$PARAMETER_PATH" ]
then
echo "Please provide a parameter store path. -p option"
exit 1
fi
TAGS=$(get_parameter_store_tags ${PARAMETER_PATH} ${REGION})
echo "Tags fetched via ssm from ${PARAMETER_PATH} ${REGION}"
echo "Adding new variables..."
params_to_env "$TAGS"
Notes -
What i think i know but am unsure
the user-data script is only loaded when it is created, not when I stop and then start mentioned here (although it also says [i think outdated] that the output is logged to /var/log/cloud-init-output.log )
I may not be starting the instance correctly
I don't know where to store the bash script so that it can be executed
What I have verified
the user-data text is on the instance by ssh-ing in and curl http://169.254.169.254/latest/user-data shows the current text (#!/bin/bash …)
What Ive tried
editing rc.local directly to export AWS_ACCESS_KEY_ID='JEFEJEFEJEFEJEFE' … and the like
putting them in the AWS Parameter Store (and can see them via the correct call, I just can't trace getting them into the EC2 instance without logs or confirming if the user-data is getting run)
putting ENV variables in Tags and importing them as mentioned here:
tried outputting the logs to other files as suggested here (Not seeing any log files in the ssh instance or on the system log)
viewing the System Log on the aws webpage to see any errors/logs via selecting the instance -> 'Actions' -> 'Instance Settings' -> 'Get System Log' (not seeing any commands run or log statements [only 1 unrelated word of user])

GCP Dataproc - configure YARN fair scheduler

I was trying to set up a dataproc cluster that would compute only one job (or specified max jobs) at a time and the rest would be in queue.
I have found this solution, How to configure monopolistic FIFO application queue in YARN? , but as I'm always creating a new cluster, I needed to automatize this. I have added this to cluster creation:
"softwareConfig": {
"properties": {
"yarn:yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
"yarn:yarn.scheduler.fair.user-as-default-queue":"false",
"yarn:yarn.scheduler.fair.allocation.file":"$HADOOP_CONF_DIR/fair-scheduler.xml",
}
}
with another line in init action script:
sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
and the cluster tells me this when I fetch its config:
'softwareConfig': {
'imageVersion': '1.2.27',
'properties': {
'capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy': 'fair',
'core:fs.gs.block.size': '134217728',
'core:fs.gs.metadata.cache.enable': 'false',
'distcp:mapreduce.map.java.opts': '-Xmx4096m',
'distcp:mapreduce.map.memory.mb': '5120',
'distcp:mapreduce.reduce.java.opts': '-Xmx4096m',
'distcp:mapreduce.reduce.memory.mb': '5120',
'hdfs:dfs.datanode.address': '0.0.0.0:9866',
'hdfs:dfs.datanode.http.address': '0.0.0.0:9864',
'hdfs:dfs.datanode.https.address': '0.0.0.0:9865',
'hdfs:dfs.datanode.ipc.address': '0.0.0.0:9867',
'hdfs:dfs.namenode.http-address': '0.0.0.0:9870',
'hdfs:dfs.namenode.https-address': '0.0.0.0:9871',
'hdfs:dfs.namenode.secondary.http-address': '0.0.0.0:9868',
'hdfs:dfs.namenode.secondary.https-address': '0.0.0.0:9869',
'mapred-env:HADOOP_JOB_HISTORYSERVER_HEAPSIZE': '3840',
'mapred:mapreduce.job.maps': '189',
'mapred:mapreduce.job.reduce.slowstart.completedmaps': '0.95',
'mapred:mapreduce.job.reduces': '63',
'mapred:mapreduce.map.cpu.vcores': '1',
'mapred:mapreduce.map.java.opts': '-Xmx4096m',
'mapred:mapreduce.map.memory.mb': '5120',
'mapred:mapreduce.reduce.cpu.vcores': '1',
'mapred:mapreduce.reduce.java.opts': '-Xmx4096m',
'mapred:mapreduce.reduce.memory.mb': '5120',
'mapred:mapreduce.task.io.sort.mb': '256',
'mapred:yarn.app.mapreduce.am.command-opts': '-Xmx4096m',
'mapred:yarn.app.mapreduce.am.resource.cpu-vcores': '1',
'mapred:yarn.app.mapreduce.am.resource.mb': '5120',
'spark-env:SPARK_DAEMON_MEMORY': '3840m',
'spark:spark.driver.maxResultSize': '1920m',
'spark:spark.driver.memory': '3840m',
'spark:spark.executor.cores': '8',
'spark:spark.executor.memory': '37237m',
'spark:spark.yarn.am.memory': '640m',
'yarn:yarn.nodemanager.resource.memory-mb': '81920',
'yarn:yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler',
'yarn:yarn.scheduler.fair.allocation.file': '$HADOOP_CONF_DIR/fair-scheduler.xml',
'yarn:yarn.scheduler.fair.user-as-default-queue': 'false',
'yarn:yarn.scheduler.maximum-allocation-mb': '81920',
'yarn:yarn.scheduler.minimum-allocation-mb': '1024'
}
},
The file fair-scheduler.xml also contains the specified code (everything is in one line, but I don't think this could be the problem)
After all this, the cluster still acts like if the capacity scheduler was in charge. No idea why. Any recommendation would help.
Thanks.
as init actions script is running after the cluster is created, the yarn service is already running in the time when the script modify the yarn-site.xml.
So after modifying the xml config file and creating the other xml file, the yarn service needs to be restarted.
It can be done using this command:
sudo systemctl restart hadoop-yarn-resourcemanager.service
Also, since the $HADOOP_CONF_DIR was not set (I thought it should be), its needed to input the whole path to the file. But, after that, the initial YARN service won't start, because it can't find the file that is created later in init actions script. So, what I did is to add the last few lines to yarn-site.xml in the init actions script as well.
The code for init actions script is the following:
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
echo " <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml
sed -i '$ d' /etc/hadoop/conf/yarn-site.xml
echo " <property>" >> /etc/hadoop/conf/yarn-site.xml
echo " <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
echo " <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
echo " </property>" >> /etc/hadoop/conf/yarn-site.xml
echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml
systemctl restart hadoop-yarn-resourcemanager.service
fi