How can I get cadvisor (Docker) working with AWS/Debian? - amazon-web-services

I have an AWS instance set up (Debian) onto which I've installed Docker.
I can successfully run the hello-world container, as well as running ubuntu as recommended in the Docker install validation.
I want to run cadvisor. So I ran the recommended quick-start script:
sudo docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
That gave me no error but when I do a 'sudo docker ps' nothing's there; like it fired up and died or otherwise shut itself down.
I tried adding "--logtostderr" to the end to see what I could see--and saw:
I0108 19:19:55.308016 00001 storagedriver.go:89] Caching 60 recent stats in memory; using "" storage driver
I0108 19:19:55.308353 00001 manager.go:78] cAdvisor running in container: "/docker/e3b5ede6f6def6b36d7682814aefc2b414defaea065ccf977a1a2542a80c310c"
F0108 19:19:55.337891 00001 cadvisor.go:76] Failed to create a Container Manager: failed to get cache information for node 0: open /sys/devices/system/cpu/cpu1/cache: no such file or directory
Do I need to do something different for a Debian system?

If you notice the docker command and the error we are explicitly mounting in the sys directory from the host system. --volume=/sys:/sys:ro and the error is complaining about file in a sub-directory /sys/devices/system/cpu/cpu1/cache. So if that file/folder does not exist in your host vm it won't work inside docker.
I have tested both ubuntu and amazon standard AMI and they seem to have the file mentioned. I don't see debian in the standard AMIs so I have no easy way to test debian but I suspect the image you are using has the required kernel modules or settings missing. Why not use one of the standard Amazon AMIs?

There was a bug we fixed in cAdvisor. The latest version of cAdvisor should work just fine with Debian on AWS or anywhere.

Related

AWS Ubuntu 18.04 AMI package installation failed

Whenever an AWS autoscaling group launches new ubuntu instance and I try to install any package on that instance it gives me the following error:
[stderr]E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[stderr]E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend),
Is there another process using it?
I tried to find a solution and manually fixed it but I don't know why whenever the autoscaling group launches a new ubuntu instance it gives the following error.
When any command updates the Ubuntu or installs a new application, it locks the dpkg(Debian Package Manager).
To identify the problem, please look at the logs
If your system is installing some updates you may find journalctl logs journalctl -u apt-daily.service. This usually happend when the system is set to update itslef and you will notice such activity with this ps -ef | grep apt.systemd.daily and you can check these setting in the file /etc/apt/apt.conf.d/20auto-upgrades
/var/log/dpkg.log*(as it may get rotated) check these logs to find which all services were trying to get installed
Once you have identified the problem, you can solve with these methods:
If system is updating, then try to wait by executing sleep command in the --user-dataof your bootstrapping script
If your 1st installation of an service/application is blocking other one, then put a condition to wait/sleep until the first service is up and so on with rest of the services you are installing.
This was a common problem in Ubuntu 16.04 LTS as per, and you can find the same with the solution code https://forums.aws.amazon.com/thread.jspa?threadID=251663
A snippet of code from the referenced link:
until service codedeploy-agent status >/dev/null 2>&1; do
sleep 60
rm -f install
wget https://aws-codedeploy-us-west-2.s3.amazonaws.com/latest/install
chmod +x ./install
sudo ./install auto
service codedeploy-agent restart
done
SSH into the instance before/while the UserData is running and check which process has acquired the lock:
$ lsof /var/lib/dpkg/lock-frontend
Also, try to enable CodeDeploy agent at the last step after performing all other steps in UserData, like:
https://gist.github.com/say8425/8344d19911dba20fab5538b85006bd31

How do I get a podman/buildah container to run under CentOS on GCE?

1. Summarize the problem
I am following this simple tutorial from Developers RedHat to get a simple node/express container working.
I cannot get a container to run under a CentOS 7 VM on GCE.
I have a CentOS 7 GCE virtual machine, where I have Docker installed.
I am able to successfully build and run Docker containers and push them to Google's container registry with no problem.
Now I am trying to build podman/buildah containers, and do the same.
I have buildman/podman installed. When I run this:
podman build -t hello-world-nodejs .
I get the following error message:
cannot clone: Invalid argument user namespaces are not enabled in /proc/sys/user/max_user_namespaces Error: could not get runtime: cannot re-exec process
any ideas?
Additionally, if there are any guides into getting this image into Google's container registry, and running under Cloud Run, it would be greatly appreciated.
Ultimately the destination for some containers is a cloud service.
2. Provide background including what you've already tried
I have tried doing a web search for a solution, nothing found that has solved the problem so far.
3. Show some code
podman build -t hello-world-nodejs .
4. Describe expected and actual results including any error messages
I can create and run docker images/containers on this GCE VM, I am trying to do the same with buildah/podman.
The following solved this issue for me:
sudo bash -c 'echo 10000 > /proc/sys/user/max_user_namespaces'
sudo bash -c "echo $(whoami):110000:65536 > /etc/subuid"
sudo bash -c "echo $(whoami):110000:65536 > /etc/subgid"
And then if you encounter an errors related to lchown run the following:
sudo rm -rf ~/.{config,local/share}/containers /run/user/$(id -u)/{libpod,runc,vfs-*}
I have spun up a CentOS 7 VM on GCE and got same issue. The issue is caused because User Namespaces is not enabled on the kernel by default. You have 2 options, either running podman as root (or using sudo) or enabling User Namespaces in your CentOS VM (the hard way).
According to the post here, the use of user namespace and the allocations of uid and gid’s that are required to make rootless containers work securely in your environment.
Probably StackOverflow is not the best place to ask this question. It's better to ask in the ServerFault site since it's a server and not coding problem.

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on google cloud compute (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last line to shutdown causes the GCE instance to start, but the Rscript to be ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance:
I got in touch with Google cloud support, who gave the following response:
script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with : $ sudo google_metadata_script_runner --script-type startup # for Debian, or # sudo /usr/share/google/run-startup-scripts # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database. The startup-script was running before autossh. Building a 40 second delay into the script and running the script as a user (not sudo-type root) seems to have solved this problem for now. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.

How do I provide credentials to the docker awslogs driver using Docker for Mac?

I'm trying to use the docker awslogs driver and getting the following error:
"docker: Error response from daemon: Failed to initialize logging
driver: NoCredentialProviders: no valid providers in chain.
Deprecated."
According to this GitHub comment, I need to set the AWS_SHARED_CREDENTIALS_FILE environment variable for the docker daemon, but I'm not sure how to do that when using Docker for Mac.
The command I'm using to start the container is:
docker run -d \
--log-driver=awslogs \
--log-opt awslogs-region=us-east-1 \
--log-opt awslogs-group=my-log-group \
my-image
Version information:
Docker for Mac 1.12.1-rc1-beta23 build 11375
OS X El Capitan 10.11.6
but I'm not sure how to do that when using Docker for Mac.
With boot2docker, you would need to modify /var/lib/boot2docker/profile in order to add this variable.
See "Docker daemon config file on boot2docker".
It does persists across the TinyCore-based VM reboot, and the docker daemon would then take it into account.
With the new docker for Mac xhyve-based, the idea should be the same.
/var/lib/boot2docker/profile does exist as well, as shown in this answer.
The official docker dameon doc points to:
--config-file=/etc/docker/daemon.json Daemon configuration file
So try and modify this file.
By default, the comments mention:
~/Library/Containers/com.docker.docker/Data/database/com.doc‌​ker.driver.amd64-lin‌​ux/etc/docker/daemon‌​.json
Using information taken from this answer: Docker deamon config path under mac os
You can connect to the VM layer that runs the docker daemon using:
screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
And you can modify /etc/docker/daemon.json to add the needed variables there.
Once you make your changes, you can just run:
service docker restart
from within the moby terminal to restart the docker daemon.
Do notice that if you restart docker from your mac, the changes will not persist.
On a side note, if you encounter a login screen when connecting with the screen command, try username: root to access the system.

Vagrant Rsync Error before provisioning

So I'm having some adventures with the vagrant-aws plugin, and I'm now stuck on the issue of syncing folders. This is necessary to provision the machines, which is the ultimate goal. However, running vagrant provision on my machine yields
[root#vagrant-puppet-minimal vagrant]# vagrant provision
[default] Rsyncing folder: /home/vagrant/ => /vagrant
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
mkdir -p '/vagrant'
I'm almost positive the error is caused because ssh-ing manually and running that command yields 'permission denied' (obviously, a non-root user is trying to make a directory in the root directory). I tried ssh-ing as root but it seems like bad practice. (and amazon doesn't like it) How can I change the folder to be rsynced with vagrant-aws? I can't seem to find the setting for that. Thanks!
Most likely you are running into the known vagrant-aws issue #72: Failing with EC2 Amazon Linux Images.
Edit 3 (Feb 2014): Vagrant 1.4.0 (released Dec 2013) and later versions now support the boolean configuration parameter config.ssh.pty. Set the parameter to true to force Vagrant to use a PTY for provisioning. Vagrant creator Mitchell Hashimoto points out that you must not set config.ssh.pty on the global config, you must set it on the node config directly.
This new setting should fix the problem, and you shouldn't need the workarounds listed below anymore. (But note that I haven't tested it myself yet.) See Vagrant's CHANGELOG for details -- unfortunately the config.ssh.pty option is not yet documented under SSH Settings in the Vagrant docs.
Edit 2: Bad news. It looks as if even a boothook will not be "faster" to run (to update /etc/sudoers.d/ for !requiretty) than Vagrant is trying to rsync. During my testing today I started seeing sporadic "mkdir -p /vagrant" errors again when running vagrant up --no-provision. So we're back to the previous point where the most reliable fix seems to be a custom AMI image that already includes the applied patch to /etc/sudoers.d.
Edit: Looks like I found a more reliable way to fix the problem. Use a boothook to perform the fix. I manually confirmed that a script passed as a boothook is executed before Vagrant's rsync phase starts. So far it has been working reliably for me, and I don't need to create a custom AMI image.
Extra tip: And if you are relying on cloud-config, too, you can create a Mime Multi Part Archive to combine the boothook and the cloud-config. You can get the latest version of the write-mime-multipart helper script from GitHub.
Usage sketch:
$ cd /tmp
$ wget https://raw.github.com/lovelysystems/cloud-init/master/tools/write-mime-multipart
$ chmod +x write-mime-multipart
$ cat boothook.sh
#!/bin/bash
SUDOERS_FILE=/etc/sudoers.d/999-vagrant-cloud-init-requiretty
echo "Defaults:ec2-user !requiretty" > $SUDOERS_FILE
echo "Defaults:root !requiretty" >> $SUDOERS_FILE
chmod 440 $SUDOERS_FILE
$ cat cloud-config
#cloud-config
packages:
- puppet
- git
- python-boto
$ ./write-mime-multipart boothook.sh cloud-config > combined.txt
You can then pass the contents of 'combined.txt' to aws.user_data, for instance via:
aws.user_data = File.read("/tmp/combined.txt")
Sorry for not mentioning this earlier, but I am literally troubleshooting this right now myself. :)
Original answer (see above for a better approach)
TL;DR: The most reliable fix is to "patch" a stock Amazon Linux AMI image, save it and then use the customized AMI image in your Vagrantfile. See below for details.
Background
A potential workaround is described (and linked in the bug report above) at https://github.com/mitchellh/vagrant-aws/pull/70/files. In a nutshell, add the following to your Vagrantfile:
aws.user_data = "#!/bin/bash\necho 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty && chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty\nyum install -y puppet\n"
Most importantly this will configure the OS to not require a tty for user ec2-user, which seems to be the root of the problem. I /think/ that the additional installation of the puppet package is not required for the actual fix (although Vagrant may use Puppet for provisioning the machine later, depending on how you configured Vagrant).
My experience with the described workaround
I have tried this workaround but Vagrant still occasionally fails with the same error. It might be a "race condition" where Vagrant happens to run its rsync phase faster than cloud-init (which is what aws.user_data is passing information to) can prepare the workaround for #72 on the machine for Vagrant. If Vagrant is faster you will see the same error; if cloud-init is faster it works.
What will work (but requires more effort on your side)
What definitely works is to run the command on a stock Amazon Linux AMI image, and then save the modified image (= create an image snapshot) as a custom AMI image of yours.
# Start an EC2 instance with a stock Amazon Linux AMI image and ssh-connect to it
$ sudo su - root
$ echo 'Defaults:ec2-user !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty
$ chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty
# Note: Installing puppet is mentioned in the #72 bug report but I /think/ you do not need it
# to fix the described Vagrant problem.
$ yum install -y puppet
You must then use this custom AMI image in your Vagrantfile instead of the stock Amazon one. The obvious drawback is that you are not using a stock Amazon AMI image anymore -- whether this is a concern for you or not depends on your requirements.
What I tried but didn't work out
For the record: I also tried to pass a cloud-config to aws.user_data that included a bootcmd to set !requiretty in the same way as the embedded shell script above. According to the cloud-init docs bootcmd is run "very early" in the startup cycle for an EC2 instance -- the idea being that bootcmd instructions would be run earlier than Vagrant would try to run its rsync phase. But unfortunately I discovered that the bootcmd feature is not implemented in the outdated cloud-init version of current Amazon's Linux AMIs (e.g. ami-05355a6c has cloud-init 0.5.15-69.amzn1 but bootcmd was only introduced in 0.6.1).