chef zero via AWS userdata fails - amazon-web-services

When running chef zero via AWS userdata, the run always fails. However, if I ssh onto the machine and manually execute the same commands, it works as expected. This is the output that I get:
Chef: 11.12.8
[2014-06-11T12:40:34+00:00] INFO: Auto-discovered chef repository at /opt/chef-zero
[2014-06-11T12:40:34+00:00] INFO: Starting chef-zero on port 8889 with repository at repository at /opt/chef-zero
One version per cookbook
[2014-06-11T12:40:34+00:00] INFO: Forking chef instance to converge...
[2014-06-11T12:40:35+00:00] DEBUG: Fork successful. Waiting for new chef pid: 1530
[2014-06-11T12:40:35+00:00] DEBUG: Forked instance now converging
[2014-06-11T12:40:35+00:00] ERROR: undefined method `[]' for nil:NilClass
[2014-06-11T12:40:35+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
The userdata that I set when launching the EC2 instance in AWS includes the following:
curl -L https://www.opscode.com/chef/install.sh | bash
mkdir /opt/chef-zero
cd /opt/chef-zero
wget http://myserver/chef-repo.tar.gz
tar zxf chef-repo.tar.gz
INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
cat <<EOF > /opt/chef-zero/solo.rb
ssl_verify_mode :verify_peer
node_name "$INSTANCE_ID"
EOF
/opt/chef/bin/chef-client -v >chef-zero.log 2>&1
/opt/chef/bin/chef-client -z -l debug -c solo.rb -o 'role[someRole]' -E BUILD >> chef-zero.log 2>&1
The AMI that I'm using is a custom one that was initially provisioned using knife + knife-ec2 (which bootstrapped Chef 11.6.0 on top of an Ubuntu 13.04 public AMI). The omnibus installer in the user data (curl ... | bash) upgrades Chef to 11.12.8. The original knife run included chef-client::service in its run list, and the host is initially configured for use with chef-client + chef-server (i.e. there's a "validation.pem" and "client.rb" in /etc/chef; not sure if that makes a difference).
I am able to log onto the machine and execute chef-client -z -c solo.rb -o 'role[someRole]' -E BUILD as soon as the machine comes up (after waiting for files to be retrieved and the user-data chef-client to fail) and the chef run executes normally.
I have no idea why the user-data chef-client run fails with "undefined method". Any ideas what's causing it?

After some further investigation, and thanks to a bit of chatting with the #chef folks on freenode, the problem was narrowed down to the environment.
When the script is executed as user data, the HOME variable is not set, and shell.rb from the chef gem is littered with references to ENV["HOME"].
SSH:
# unset HOME
# chef-client -z -o 'role[test]'
ERROR: undefined method `[]' for nil:NilClass
# export HOME=/root
# chef-client -z -o 'role[test]'
Starting Chef Client, version ....
...
Chef Client finished, ...
If you need to execute chef-client via user data, export HOME (for example, export HOME=/root) before running chef-client.
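A minimal sketch of that fix for a bash user-data script, assuming the script runs as root (so /root is a reasonable fallback). ensure_home is a hypothetical helper name: it only assigns HOME when it is unset or empty, so it is harmless when the same script is run interactively, and it exports the value so chef-client (a child process) inherits it.

```shell
#!/bin/bash
# ensure_home [DEFAULT]: hypothetical helper for user-data scripts.
# If HOME is unset or empty, set it to DEFAULT (or /root when no argument is
# given); an existing non-empty HOME is left alone. Export it either way so
# child processes such as chef-client see it.
ensure_home() {
  : "${HOME:=${1:-/root}}"
  export HOME
}

ensure_home
# ...then run the original commands, e.g.:
# /opt/chef/bin/chef-client -z -l debug -c solo.rb -o 'role[someRole]' -E BUILD >> chef-zero.log 2>&1
```

The same missing-HOME cause shows up in the Composer and Ansible user-data cases further down, so the same call can go at the top of those scripts too.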
The bug has been reported at https://tickets.opscode.com/browse/CHEF-5365
Edit: I submitted a pull request, which has since been merged into master: https://github.com/opscode/chef/pull/1494

This likely has nothing to do with chef-zero; it more likely indicates a problem in your recipe code (whatever is inside that chef-repo.tar.gz, or is pulled in by role[someRole]). The error indicates an attempt to access a sub-element of a hash, like
node['foo']['bar']
when node['foo'] is nil (undefined).
Check the stacktrace that's generated by the chef client run to narrow it down.

Related

AWS Ubuntu 18.04 AMI package installation failed

Whenever an AWS Auto Scaling group launches a new Ubuntu instance and I try to install any package on that instance, I get the following error:
[stderr]E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[stderr]E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend),
Is there another process using it?
I found a fix and applied it manually, but I don't know why the error comes back every time the Auto Scaling group launches a new Ubuntu instance.
Whenever a command updates Ubuntu or installs a new application, it locks dpkg (the Debian package manager).
To identify the problem, look at the logs:
If your system is installing updates, you can find the logs with journalctl -u apt-daily.service. This usually happens when the system is set to update itself; you will see that activity with ps -ef | grep apt.systemd.daily, and you can check these settings in the file /etc/apt/apt.conf.d/20auto-upgrades.
/var/log/dpkg.log* (the log may have been rotated): check these logs to find which services were being installed.
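For the dpkg log check, a small sketch that lists every package installation recorded in the current and rotated logs. list_dpkg_installs is a hypothetical helper; it assumes the standard dpkg.log line layout (date, time, action, package, old version, new version) and GNU gzip, whose -dcf combination decompresses rotated .gz files and passes plain-text files through unchanged.

```shell
#!/bin/bash
# list_dpkg_installs FILE...: hypothetical helper for reading dpkg logs.
# Prints "DATE TIME PACKAGE" for every "install" action in the given files.
# Assumed line layout: DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION
list_dpkg_installs() {
  # gzip -dcf: decompress .gz files to stdout, copy plain files as-is.
  gzip -dcf -- "$@" 2>/dev/null | awk '$3 == "install" { print $1, $2, $4 }'
}
```

On an affected instance, list_dpkg_installs /var/log/dpkg.log* | sort | tail -n 20 would show the most recent installations in chronological order, since the date and time fields sort lexically.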
Once you have identified the problem, you can solve with these methods:
If the system is updating, wait for it to finish, for example by running a sleep command in the --user-data of your bootstrapping script.
If the first installation of a service or application is blocking another one, add a condition that waits (sleeps) until the first service is up, and do the same for the rest of the services you are installing.
This was a common problem on Ubuntu 16.04 LTS; the same issue, with solution code, is discussed at https://forums.aws.amazon.com/thread.jspa?threadID=251663
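A sketch of the "wait until ready" approach as a reusable polling loop. retry_until and dpkg_lock_free are hypothetical helper names, not Ubuntu or AWS tooling; dpkg_lock_free assumes fuser (from the psmisc package) is installed and that the script runs as root, as user data does, so fuser can see processes owned by any user.

```shell
#!/bin/bash
# retry_until MAX_ATTEMPTS SLEEP_SECONDS command [args...]
# Hypothetical helper: runs the command until it succeeds, sleeping between
# attempts, and gives up (returning 1) after MAX_ATTEMPTS tries.
retry_until() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_until: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# dpkg_lock_free: succeeds when no process has the dpkg frontend lock open.
# fuser exits 0 when some process is using the file, so negate it.
dpkg_lock_free() {
  ! fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1
}
```

For example, retry_until 30 10 dpkg_lock_free waits up to about five minutes for the lock. Another process can still take the lock between that check and your install, so retrying the install itself (retry_until 5 20 apt-get install -y nginx, where nginx is only a placeholder package) is the more robust choice.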
A snippet of code from the referenced link:
until service codedeploy-agent status >/dev/null 2>&1; do
sleep 60
rm -f install
wget https://aws-codedeploy-us-west-2.s3.amazonaws.com/latest/install
chmod +x ./install
sudo ./install auto
service codedeploy-agent restart
done
SSH into the instance before/while the UserData is running and check which process has acquired the lock:
$ lsof /var/lib/dpkg/lock-frontend
Also, try setting up the CodeDeploy agent as the last step in UserData, after all other steps have run, like:
https://gist.github.com/say8425/8344d19911dba20fab5538b85006bd31

Running updates on EC2s that roll back on failure of status check

I’m setting up a patch process for EC2 servers running a web application.
I need to build an automated process that installs system updates but reverts to the last working EC2 instance if the web application fails a status check.
I’ve been trying to do this using an Automation Document in EC2 Systems Manager that performs the following steps:
Stop EC2 instance
Create AMI from instance
Launch new instance from newly created AMI
Run updates
Run status check on web application
If check fails, stop new instance and restart original instance
The Automation Document runs the first 5 steps successfully, but I can't work out how to trigger step 6. Can I do this within the Automation Document? What output could I use from step 5? If it uses aws:runCommand, should the runCommand trigger a new automation document or another AWS tool?
I tried the following to solve this, which more or less worked:
Included an aws:runCommand action in the automation document
This ran the DocumentName "AWS-RunShellScript" with the following parameters:
Downloaded the script from s3:
sudo aws s3 cp s3://path/to/s3/script.sh /tmp/script.sh
Set the file to executable:
chmod +x /tmp/script.sh
Executed the script using variables set in, or generated by, the automation document:
bash /tmp/script.sh -o {{VAR1}} -n {{VAR2}} -i {{VAR3}} -l {{VAR4}} -w {{VAR5}}
The script included the following getopts command to set the inputted variables:
while getopts o:n:i:l:w: option
do
case "${option}"
in
o) VAR1=${OPTARG};;
n) VAR2=${OPTARG};;
i) VAR3=${OPTARG};;
l) VAR4=${OPTARG};;
w) VAR5=${OPTARG};;
esac
done
The bash script used the variables to run the status check, and roll back to last working instance if it failed.
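The script itself was not shown, so the following is only a hypothetical sketch of what a status-check-and-rollback step could look like. It assumes the AWS CLI is installed, the instance profile allows ec2:StartInstances and ec2:StopInstances, and the web application exposes a health URL; health_ok and rollback_if_unhealthy are illustrative names. It starts the original instance before stopping the new one, so the rollback still completes if the script is running on the instance being stopped.

```shell
#!/bin/bash
# Hypothetical sketch; not the original script.

# health_ok URL ATTEMPTS DELAY: succeeds if URL responds without an HTTP
# error within ATTEMPTS tries. curl -f turns HTTP status >= 400 into failure.
health_ok() {
  local url=$1 attempts=$2 delay=$3 i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 10 -o /dev/null "$url"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# rollback_if_unhealthy NEW_ID ORIGINAL_ID URL [ATTEMPTS] [DELAY]
# Keeps the new instance if healthy; otherwise starts the original instance
# first, then stops the new one, and returns 1 so the caller sees a failure.
rollback_if_unhealthy() {
  local new_id=$1 orig_id=$2 url=$3 attempts=${4:-5} delay=${5:-30}
  if health_ok "$url" "$attempts" "$delay"; then
    echo "Health check passed; keeping $new_id"
    return 0
  fi
  echo "Health check failed; rolling back to $orig_id" >&2
  aws ec2 start-instances --instance-ids "$orig_id" &&
    aws ec2 stop-instances --instance-ids "$new_id"
  return 1
}
```

Inside the script this might be called as rollback_if_unhealthy "$NEW_ID" "$ORIGINAL_ID" "http://localhost/health", where both variable names and the URL are placeholders. Because the function returns nonzero on rollback, the Run Command invocation is reported as failed, which the Automation step's onFailure setting can act on.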

'peer' command not found hyperledger

I'm working on this tutorial:
http://hyperledger-fabric.readthedocs.io/en/latest/getting_started.html
In the "Create & Join Channel" section, when running the command:
peer channel create -o orderer.example.com:7050 -c $CHANNEL_NAME -f ./channel-artifacts/channel.tx --tls $CORE_PEER_TLS_ENABLED --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/example.com/orderers/orderer.example.com/msp/cacerts/ca.example.com-cert.pem
I received this error:
No command 'peer' found, did you mean:
Command 'pee' from package 'moreutils' (universe)
Command 'beer' from package 'gerstensaft' (universe)
Command 'peel' from package 'ears' (universe)
Command 'pear' from package 'php-pear' (main)
peer: command not found
Since you are following the guide, I assume you are using Docker. It seems that you are not connected to the cli container; otherwise, the shell would have found the peer command (I might be mistaken).
To connect to the cli container:
docker exec -it cli bash
If this is not the problem, you can try running the command from the bin folder:
/usr/local/bin
This folder should be in the PATH environment variable, for example:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
This error means that your shell cannot find the peer binary, so the directory containing the peer binaries must be included in your PATH. If you are in a directory whose parent holds the Hyperledger Fabric bin folder (e.g. a sample directory inside fabric or fabric-samples), run:
export PATH=${PWD}/../bin:$PATH
If you are in the ../test-network folder, as I am, first try these two commands from the "Interacting with the network" section:
export PATH=${PWD}/../bin:$PATH
export FABRIC_CFG_PATH=$PWD/../config/
Then you will be able to set the environment variables that allow you to operate the peer CLI as Org1 or Org2.
This assumes that your network is up and running.
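A sketch that makes the PATH fix conditional: it only prepends the bin directory when peer is not already found. ensure_on_path is a hypothetical helper, not part of Hyperledger Fabric, and it must be defined in (or sourced into) your interactive shell, because a PATH change made inside a separate script does not carry back to the shell that ran it.

```shell
#!/bin/bash
# ensure_on_path DIR CMD: hypothetical helper.
# If CMD is not already on PATH but DIR contains an executable named CMD,
# prepend DIR to PATH and export it so the current shell can find CMD.
ensure_on_path() {
  local dir=$1 cmd=$2
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0
  fi
  if [ -x "$dir/$cmd" ]; then
    PATH="$dir:$PATH"
    export PATH
    return 0
  fi
  echo "ensure_on_path: $cmd not found on PATH or in $dir" >&2
  return 1
}

# Example, from fabric-samples/test-network as in the tutorial:
# ensure_on_path "${PWD}/../bin" peer && peer version
```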
Please check which Docker container you're using to run peer commands.
Run docker ps
and check the container names.
Chaincode is built and started in the chaincode container:
docker exec -it chaincode bash
To interact with the network and run peer commands, use the cli container:
docker exec -it cli bash

Cron job openshift error

I have a Rails 4 OpenShift application, and I am trying to run a cron job. The script runs fine when I run it by itself. The script is:
#!/bin/bash
/bin/bash -l -c 'cd $OPENSHIFT_REPO_DIR && bundle exec bin/rails runner -e production "Payment.charge_customers_pay_experts"'
The problem is that the log file shows the following error:
Wed Feb 3 22:57:05 EST 2016: START minutely cron run
__________________________________________________________________________
/var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo//.openshift/cron/minutely/charge_customers_pay_experts:
Warning: You're using Rubygems 2.0.14 with Spring. Upgrade to at least Rubygems 2.1.0 and run `gem pristine --all` for better startup performance.
/var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/sid.rb:39:in `getpgid': Permission denied (Errno::EACCES)
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/sid.rb:39:in `pgid'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:78:in `set_pgid'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:34:in `boot'
from /var/lib/openshift/56a438107628e18b30000111/app-root/runtime/repo/vendor/bundle/ruby/gems/spring-1.6.2/lib/spring/server.rb:14:in `boot'
from -e:1:in `<main>'
__________________________________________________________________________
Wed Feb 3 22:57:06 EST 2016: END minutely cron run - status=0
__________________________________________________________________________
I have made sure the script was executable. I'm not sure if I am missing something. Does anyone have any thoughts?
I don't think the script being executable necessarily has anything to do with this; it looks more like a permissions error. Does the system user that runs the cron job have the required permissions? You can test this by logging in as that user (or with sudo su - <user>) and running the command from the script manually:
/bin/bash -l -c 'cd $OPENSHIFT_REPO_DIR && bundle exec bin/rails runner -e production "Payment.charge_customers_pay_experts"'
Be sure to replace your $OPENSHIFT_REPO_DIR variable with the correct path to your OpenShift repo directory.
You may just need to either add the user your cronjob runs as to the group that has permissions over the files, or perhaps run the cronjob as a more privileged user (privileged in that it has permissions over the required files).
I fixed this by commenting out the 'spring' gem in my Gemfile. Apparently this is a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1305544.
There is a workaround until this issue is resolved: edit /usr/libexec/openshift/cartridges/cron/bin/cron_runjobs.sh and add setsid in front of timeout, so that it runs setsid timeout ... instead. setsid starts timeout in a new session, which allows it to actually change its session ID (sid).

cloudformation composer install

I am using CloudFormation for my AWS setup and trying to run Composer, but no matter what command I put in my userdata section, I always get an error. This is the error:
php /usr/local/bin/composer.phar create-project composer/satis /var/www/satis --stability=dev
[RuntimeException]
The HOME or COMPOSER_HOME environment variable must be set for composer to run correctly
This is my code within the userdata section:
"#composer\n",
"curl -sS https://getcomposer.org/installer | php\n",
"mv composer.phar /usr/local/bin/composer.phar\n",
"#satis\n",
"php /usr/local/bin/composer.phar create-project composer/satis /var/www/satis --stability=dev\n",
Does anyone have any ideas why this might not work and what I should be doing?
Composer is looking for the location of its .composer directory. Set the HOME or COMPOSER_HOME environment variable, e.g.: HOME=/root php /usr/local/bin/composer.phar create-project composer/satis /var/www/satis --stability=dev, and it will work fine.
I had a similar issue with the Amazon Linux 2 AMI. The log showed "All settings correct for using Composer. The HOME or COMPOSER_HOME environment variable must be set for composer to run correctly", but Composer was not installed at all. Below is the way to fix it. It might save somebody from wasting two or three hours!
export COMPOSER_HOME=/root
curl -sS https://getcomposer.org/installer | php
mv composer.phar /usr/bin/composer
chmod +x /usr/bin/composer
Agree with Ntwobike's answer.
When launching AWS EC2 instances, I was installing Composer by running an Ansible playbook during the user data script run. (The user data script is called by cloud-init during the instance build process.)
For some reason, the $HOME environment variable is not set at this point in the build, so I needed to add 'export HOME=/root', e.g.
# These need to be set to enable the composer installer to run. It is probably due to an issue
# with the $HOME variable not yet being set at this point in the instance creation.
export HOME=/root
ansible-playbook --extra-vars "target=localhost" playbooks/debian-9/drush.yml