AWS Ubuntu 18.04 AMI package installation failed - amazon-web-services

Whenever an AWS autoscaling group launches new ubuntu instance and I try to install any package on that instance it gives me the following error:
[stderr]E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[stderr]E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend),
Is there another process using it?
I tried to find a solution and manually fixed it but I don't know why whenever the autoscaling group launches a new ubuntu instance it gives the following error.

When any command updates the Ubuntu or installs a new application, it locks the dpkg(Debian Package Manager).
To identify the problem, please look at the logs
If your system is installing some updates you may find journalctl logs journalctl -u apt-daily.service. This usually happend when the system is set to update itslef and you will notice such activity with this ps -ef | grep apt.systemd.daily and you can check these setting in the file /etc/apt/apt.conf.d/20auto-upgrades
/var/log/dpkg.log*(as it may get rotated) check these logs to find which all services were trying to get installed
Once you have identified the problem, you can solve with these methods:
If system is updating, then try to wait by executing sleep command in the --user-dataof your bootstrapping script
If your 1st installation of an service/application is blocking other one, then put a condition to wait/sleep until the first service is up and so on with rest of the services you are installing.
This was a common problem in Ubuntu 16.04 LTS as per, and you can find the same with the solution code https://forums.aws.amazon.com/thread.jspa?threadID=251663
A snippet of code from the referenced link:
until service codedeploy-agent status >/dev/null 2>&1; do
sleep 60
rm -f install
wget https://aws-codedeploy-us-west-2.s3.amazonaws.com/latest/install
chmod +x ./install
sudo ./install auto
service codedeploy-agent restart
done

SSH into the instance before/while the UserData is running and check which process has acquired the lock:
$ lsof /var/lib/dpkg/lock-frontend
Also, try to enable CodeDeploy agent at the last step after performing all other steps in UserData, like:
https://gist.github.com/say8425/8344d19911dba20fab5538b85006bd31

Related

AWS EC2 User Data not working (Tried Installing and starting httpd via User Data)

The Following is my EC2 User Data:
#!/bin/bash
sudo yum update -y
sudo yum install -y httpd
sudo systemctl start httpd
sudo systemctl enable httpd
In Security Group SSH 22 Port and HTTP 80 Port is Open.
Yet when I try accessing http://public_ip_of_instance the HTTP Apache page doesn't load.
Also, on the Instance Apache is not installed when I checked sudo systemctl status httpd.
I then manually tried it on the EC2 Server and it worked. Then I removed it through yum remove as I wanted to see whether User Data works.
I stopped the Instance and started again but I observed that the User Data Script doesn't work as I am unable to access http page through browser and also on Instance http is not installed.
Where is the actual issue? Some months back this same thing worked on another instance I remember.
Your user data is correct. Whatever is happening with your website is not due to the user data code that you provided.
There could be many reasons it does not work. Public IP of the instance has changed, as always happens when you stop/start the instance. Instance may have per-existing software that clashes with httpd.
Here's some general advice on running UserData once or each startup.
Short answer as John mentioned in the comments EC2's only run the UserData (aka Bootstrap) script once on initalization.
The user data Bash/Powershell is Infrastructure-As-Code. You deploy the script and it installs and configures the machine.
This causes confusion with everyone starting AWS. When you think about it though it doesn't make sense to run the UserData script each time when the PCs already been configured.
What people do often instead is make "Golden Images" (aka Amazon Machine Images - AMI's) of pre-setup EC2s, typically for PCs that take long time to install/configure. The beauty of this is you can setup AutoScaleGroups to use the images which saves any long installation during a scale up event.
Pro Tip: When developing an UserData script run through and test it manually on the EC2. Trust me its far quicker than troubleshooting unattended EC2 UserData errors.
Long answer: you can run the UserData on each boot of the machine using Mime multi-part file. A mime multi-part file allows your script to override how frequently user data is run in the cloud-init package.
https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/
For all those who will run into this problem, first of all check the log with the command:
sudo cat /var/log/cloud-init-output.log
then if you notice connection errors to the various repositories, the reason is because you don't have an internet connection. However, if once inside your EC2 you manage to launch the update and install commands, then the reason why they fail in the UserData is because your EC2 takes a few seconds to get the Internet connection and executes the commands before having it. So to solve this problem, just add this command after #!/bin/bash
#!/bin/bash
until ping -c1 8.8.8.8 &>/dev/null; do :; done
sudo yum update -y
...
This will prevent your EC2 from executing commands before an internet connection is established

Amazon EC2 | CodeDeploy [React] - Deployment succeeds but build folder not populated

TL;DR The command npm run build is taking forever to run on the Amazon EC2 [Ubuntu] instance when I tried running it explicitly by making an SSH. Meanwhile, when I try to create a deployment using CodeDeploy, the deployment takes a good 1 hour time and succeeds but the build folder doesn't get populated, hence I am unable to view my website on the public URL of the EC2 instance. Also, the instance reachability check fails every time after I try to run the command explicitly, and then I have to start and then stop the ec2 instance again! Woof!
Hello everyone, I am trying to deploy my MERN Stack application to AWS but I am stuck now!
Current Progress:
Added both Nginx configs.[Attaching image below]
Nginx is running and there is no problem there!
Added build-app.sh in appspec.yml in the root directory. [View code below]
#!/bin/bash
#clear build directory
cd /home/ubuntu/badlav-app/badlav-client
sudo rm -rf build
sudo mkdir build
#client (Generates a new `build` directory)
cd /home/ubuntu/badlav-app/badlav-client
sudo sh set-prod-env-aws.sh
sudo rm -rf node_modules
sudo npm i
sudo npm run build
#server
cd /home/ubuntu/badlav-app/badlav-server
sudo sh set-prod-env.sh
#back to root
cd /home/ubuntu
appspec.yml
version: 0.0
os: linux
files:
- source: /
destination: /home/ubuntu/badlav-app
hooks:
BeforeInstall:
- location: scripts/build-app.sh
runas: root
Using the above appspec.yml file, the deployment using CodeDeploy succeeds but didn't populate the build folder within /home/badlav-app/badlav-client/build.
So I tried to debug on my own and started running the commands one by one by myself after SSH(ing :P) into the EC2 instance. But when I reach npm run build, the instance just hangs forever. After being exhausted, I have no option left, I terminate the task. Now, when I view my instance on the AWS Console, it has gone berserk! The instance reachability check fails! The only way, I get my instance back is by stopping it and starting it again.
Since I am new to CI/CD, please don't judge my appspec.yml. It'd be great if anyone of you could suggest a better way, thanks for that! :)
To sum up, I want to be able to create a deployment using AWS CodeDeploy, but due to this npm run build taking so much time and hanging my server(instance reach check fails!), I am unable to do so. Moreover, I am not even sure whether npm run build is a problem at all!
I would be more than happy to share any further details/screenshots in order to support my question. Please ask over.
Thanks in advance!
/etc/nginx/nginx.conf/etc/nginx/conf.d/default.conf
If you're using EC2 free tier, the chance is that the instance may have low spec and memory (t2.nano has 0.5G and t2.micro has 1G of memory).
Maybe npm run build consumes all of the memory space.
I often face the same problem with my vue project.
Solution: Do NOT use free tier for medium and large projects. Upgrade your plan and use better instances e.g. t2.medium

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on google cloud compute (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last line to shutdown causes the GCE instance to start, but the Rscript to be ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance:
I got in touch with Google cloud support, who gave the following response:
script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with : $ sudo google_metadata_script_runner --script-type startup # for Debian, or # sudo /usr/share/google/run-startup-scripts # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database. The startup-script was running before autossh. Building a 40 second delay into the script and running the script as a user (not sudo-type root) seems to have solved this problem for now. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.

chef zero via AWS userdata fails

When running chef zero via AWS userdata, the run always fails. However, if I ssh onto the machine and manually execute the same commands, it works as expected. This is the output that I get:
Chef: 11.12.8
[2014-06-11T12:40:34+00:00] INFO: Auto-discovered chef repository at /opt/chef-zero
[2014-06-11T12:40:34+00:00] INFO: Starting chef-zero on port 8889 with repository at repository at /opt/chef-zero
One version per cookbook
[2014-06-11T12:40:34+00:00] INFO: Forking chef instance to converge...
[2014-06-11T12:40:35+00:00] DEBUG: Fork successful. Waiting for new chef pid: 1530
[2014-06-11T12:40:35+00:00] DEBUG: Forked instance now converging
[2014-06-11T12:40:35+00:00] ERROR: undefined method `[]' for nil:NilClass
[2014-06-11T12:40:35+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
The userdata that I set when launching the EC2 instance in AWS includes the following:
curl -L https://www.opscode.com/chef/install.sh | bash
mkdir /opt/chef-zero
cd /opt/chef-zero
wget http://myserver/chef-repo.tar.gz
tar zxf chef-repo
INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
cat <<EOF > /opt/chef-zero/solo.rb
ssl_verify_mode :verify_peer
node_name "$INSTANCE_ID"
EOF
/opt/chef/bin/chef-client -v >chef-zero.log 2>&1
/opt/chef/bin/chef-client -z -l debug -c solo.rb -o 'role[someRole]' -E BUILD >> chef-zero.log 2>&1
The AMI that I'm using is a custom one that was initially provisioned using knife + knife-ec2 (that bootstrapped chef 11.6.0 from an ubuntu 13.04 public ami). The omnibus installer from userdata (curl ... | bash) is upgrading chef to 11.12.8. The original knife run included chef-client::service in it's run, and the host is initially configured for use with chef-client + chef-server (i.e. there's a "validation.pem" and "client.rb" in /etc/chef - not sure if that makes a difference).
I am able to log onto the machine and execute chef-client -z -c solo.rb -o 'role[someRole]' -E BUILD as soon as the machine comes up (after waiting for files to be retrieved and the user-data chef-client to fail) and the chef run executes normally.
I have no idea why the userdata chef-client run fails with undefined method, any ideas what's causing it?
After some further investigation, and thanks to bit of chatting with the #chef guys on freenode, the problem was narrowed down to the environment.
When executing the script with userdata, the "HOME" variable is not set. shell.rb from the chef gem is littered with references to ENV["HOME"].
SSH:
# unset HOME
# chef-client -z -o 'role[test]'
ERROR: undefined method `[]' for nil:NilClass
# export HOME=/root
# chef-client -z -o 'role[test]'
Starting Chef Client, version ....
...
Chef Client finished, ...
If you need to execute chef-client via user data, you should manually export HOME before trying to execute chef.
Bug has been reported at https://tickets.opscode.com/browse/CHEF-5365
edit
Submitted a pull request which has since been merged into master. https://github.com/opscode/chef/pull/1494
This likely has nothing to do with chef-zero but indicates a problem in your recipe code (whatever's inside that chef-repo.tar.gz, or is driven by role[someRole]). It indicates an attempt to access a sub-element of a hash like
node['foo']['bar']
but when node['foo'] is nil (undefined)
Check the stacktrace that's generated by the chef client run to narrow it down.

Vagrant Rsync Error before provisioning

So I'm having some adventures with the vagrant-aws plugin, and I'm now stuck on the issue of syncing folders. This is necessary to provision the machines, which is the ultimate goal. However, running vagrant provision on my machine yields
[root#vagrant-puppet-minimal vagrant]# vagrant provision
[default] Rsyncing folder: /home/vagrant/ => /vagrant
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
mkdir -p '/vagrant'
I'm almost positive the error is caused because ssh-ing manually and running that command yields 'permission denied' (obviously, a non-root user is trying to make a directory in the root directory). I tried ssh-ing as root but it seems like bad practice. (and amazon doesn't like it) How can I change the folder to be rsynced with vagrant-aws? I can't seem to find the setting for that. Thanks!
Most likely you are running into the known vagrant-aws issue #72: Failing with EC2 Amazon Linux Images.
Edit 3 (Feb 2014): Vagrant 1.4.0 (released Dec 2013) and later versions now support the boolean configuration parameter config.ssh.pty. Set the parameter to true to force Vagrant to use a PTY for provisioning. Vagrant creator Mitchell Hashimoto points out that you must not set config.ssh.pty on the global config, you must set it on the node config directly.
This new setting should fix the problem, and you shouldn't need the workarounds listed below anymore. (But note that I haven't tested it myself yet.) See Vagrant's CHANGELOG for details -- unfortunately the config.ssh.pty option is not yet documented under SSH Settings in the Vagrant docs.
Edit 2: Bad news. It looks as if even a boothook will not be "faster" to run (to update /etc/sudoers.d/ for !requiretty) than Vagrant is trying to rsync. During my testing today I started seeing sporadic "mkdir -p /vagrant" errors again when running vagrant up --no-provision. So we're back to the previous point where the most reliable fix seems to be a custom AMI image that already includes the applied patch to /etc/sudoers.d.
Edit: Looks like I found a more reliable way to fix the problem. Use a boothook to perform the fix. I manually confirmed that a script passed as a boothook is executed before Vagrant's rsync phase starts. So far it has been working reliably for me, and I don't need to create a custom AMI image.
Extra tip: And if you are relying on cloud-config, too, you can create a Mime Multi Part Archive to combine the boothook and the cloud-config. You can get the latest version of the write-mime-multipart helper script from GitHub.
Usage sketch:
$ cd /tmp
$ wget https://raw.github.com/lovelysystems/cloud-init/master/tools/write-mime-multipart
$ chmod +x write-mime-multipart
$ cat boothook.sh
#!/bin/bash
SUDOERS_FILE=/etc/sudoers.d/999-vagrant-cloud-init-requiretty
echo "Defaults:ec2-user !requiretty" > $SUDOERS_FILE
echo "Defaults:root !requiretty" >> $SUDOERS_FILE
chmod 440 $SUDOERS_FILE
$ cat cloud-config
#cloud-config
packages:
- puppet
- git
- python-boto
$ ./write-mime-multipart boothook.sh cloud-config > combined.txt
You can then pass the contents of 'combined.txt' to aws.user_data, for instance via:
aws.user_data = File.read("/tmp/combined.txt")
Sorry for not mentioning this earlier, but I am literally troubleshooting this right now myself. :)
Original answer (see above for a better approach)
TL;DR: The most reliable fix is to "patch" a stock Amazon Linux AMI image, save it and then use the customized AMI image in your Vagrantfile. See below for details.
Background
A potential workaround is described (and linked in the bug report above) at https://github.com/mitchellh/vagrant-aws/pull/70/files. In a nutshell, add the following to your Vagrantfile:
aws.user_data = "#!/bin/bash\necho 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty && chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty\nyum install -y puppet\n"
Most importantly this will configure the OS to not require a tty for user ec2-user, which seems to be the root of the problem. I /think/ that the additional installation of the puppet package is not required for the actual fix (although Vagrant may use Puppet for provisioning the machine later, depending on how you configured Vagrant).
My experience with the described workaround
I have tried this workaround but Vagrant still occasionally fails with the same error. It might be a "race condition" where Vagrant happens to run its rsync phase faster than cloud-init (which is what aws.user_data is passing information to) can prepare the workaround for #72 on the machine for Vagrant. If Vagrant is faster you will see the same error; if cloud-init is faster it works.
What will work (but requires more effort on your side)
What definitely works is to run the command on a stock Amazon Linux AMI image, and then save the modified image (= create an image snapshot) as a custom AMI image of yours.
# Start an EC2 instance with a stock Amazon Linux AMI image and ssh-connect to it
$ sudo su - root
$ echo 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty
$ chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty
# Note: Installing puppet is mentioned in the #72 bug report but I /think/ you do not need it
# to fix the described Vagrant problem.
$ yum install -y puppet
You must then use this custom AMI image in your Vagrantfile instead of the stock Amazon one. The obvious drawback is that you are not using a stock Amazon AMI image anymore -- whether this is a concern for you or not depends on your requirements.
What I tried but didn't work out
For the record: I also tried to pass a cloud-config to aws.user_data that included a bootcmd to set !requiretty in the same way as the embedded shell script above. According to the cloud-init docs bootcmd is run "very early" in the startup cycle for an EC2 instance -- the idea being that bootcmd instructions would be run earlier than Vagrant would try to run its rsync phase. But unfortunately I discovered that the bootcmd feature is not implemented in the outdated cloud-init version of current Amazon's Linux AMIs (e.g. ami-05355a6c has cloud-init 0.5.15-69.amzn1 but bootcmd was only introduced in 0.6.1).