Updating linux-headers-aws breaks ixgbevf

When I update linux-headers-aws on my instance, it becomes unreachable after a restart. Diffing the system log from the AWS console, I found:
ixgbevf: disagrees about version of symbol module_layout
Do I need to reinstall ixgbevf? Should I avoid updating in this manner?
Pre-update:
uname -a
Linux master 4.4.0-1022-aws #31-Ubuntu SMP Tue Jun 27 11:27:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
modinfo ixgbevf
filename:       /lib/modules/4.4.0-1022-aws/updates/dkms/ixgbevf.ko
version:        3.1.2
license:        GPL
description:    Intel(R) 10 Gigabit Virtual Function Network Driver
author:         Intel Corporation,
srcversion:     BA90EAFD4DC7D0F8F47AB8D
alias:          pci:v00008086d000015A8sv*sd*bc*sc*i*
alias:          pci:v00008086d00001565sv*sd*bc*sc*i*
alias:          pci:v00008086d00001515sv*sd*bc*sc*i*
alias:          pci:v00008086d000010EDsv*sd*bc*sc*i*
depends:
vermagic:       4.4.0-1022-aws SMP mod_unload modversions
parm:           InterruptThrottleRate:Maximum interrupts per second, per vector, (956-488281, 0=off, 1=dynamic), default 1 (array of int)
ethtool -i ens3
driver: ixgbevf
version: 3.1.2
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
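The "disagrees about version of symbol module_layout" message means the kernel refused a module whose symbol versions were generated against a different kernel build; here the DKMS-built ixgbevf.ko sits under the 4.4.0-1022-aws tree. A minimal sketch for a coarse pre-reboot check, assuming the standard modinfo layout where the first token after "vermagic:" is the kernel release the module was built for. The helper names are hypothetical, and a vermagic match is weaker than the symbol CRC check that produced the error, so it cannot guarantee the module will load.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the kernel release recorded in a module's
# vermagic field with a target kernel release.

# Read `modinfo` output on stdin; print the release from the vermagic field.
module_built_for() {
  awk '$1 == "vermagic:" { print $2; exit }'
}

# Succeed (exit 0) if the modinfo text in $1 says it was built for release $2.
vermagic_matches() {
  local modinfo_text=$1 kernel_release=$2
  local built_for
  built_for=$(printf '%s\n' "$modinfo_text" | module_built_for)
  [ -n "$built_for" ] && [ "$built_for" = "$kernel_release" ]
}

# Example usage (not run here; NEW_RELEASE is a placeholder for the kernel
# release installed by the new linux-image-aws package):
#   vermagic_matches "$(modinfo -k "$NEW_RELEASE" ixgbevf)" "$NEW_RELEASE" \
#     || echo "ixgbevf is not built for $NEW_RELEASE"
```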

See this gist
First I created a backup AMI and unheld these packages:
sudo apt-mark unhold linux-aws
sudo apt-mark unhold linux-headers-aws
sudo apt-mark unhold linux-image-aws
sudo apt-mark unhold lxd
sudo apt-mark unhold lxd-client
Then updated:
sudo apt-get install linux-headers-aws linux-image-aws
This ran successfully:
Setting up linux-headers-aws (4.4.0.1026.29) ...
Setting up linux-aws (4.4.0.1026.29) ...
Several reboot attempts at this point left my instance unreachable, with the error message from the system log documented in my question. So I re-downloaded ixgbevf following the AWS documentation and reinstalled it, commenting out the version check line (#if UTS_UBUNTU_RELEASE_ABI > 255) so that it would compile. This required first removing the existing ixgbevf-3.1.2 module:
sudo dkms remove ixgbevf/3.1.2 --all
sudo dkms add -m ixgbevf -v 3.1.2
sudo dkms build -m ixgbevf -v 3.1.2
sudo dkms install -m ixgbevf -v 3.1.2 --all
sudo update-initramfs -c -k all
sudo reboot
And I was then able to successfully connect.
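The rebuild from this answer can be wrapped in a function so it is easy to repeat after the next kernel update. A sketch that mirrors the commands above, with a hypothetical DRY_RUN switch that prints each command instead of running it; it assumes the ixgbevf source is in /usr/src/ixgbevf-3.1.2, where dkms add -m/-v expects it. Note that dkms build without -k targets the running kernel; pass -k with a release to build for a specific one.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the rebuild steps from the answer.
# $1 = DKMS module name, $2 = module version (source in /usr/src/$1-$2).
dkms_reinstall() {
  local name=$1 version=$2
  run sudo dkms remove "$name/$version" --all
  run sudo dkms add -m "$name" -v "$version"
  run sudo dkms build -m "$name" -v "$version"
  run sudo dkms install -m "$name" -v "$version" --all
  run sudo update-initramfs -c -k all
}

# Example: review the sequence first, then run it for real.
#   (DRY_RUN=1; dkms_reinstall ixgbevf 3.1.2)
#   dkms_reinstall ixgbevf 3.1.2 && sudo reboot
```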

Related

Unable to upgrade Amazon Linux 2 kernel from 4.14 to 5.4

I have an older Amazon Linux 2 EC2 instance running the following:
uname -sar:
Linux ip-172-31-8-8.eu-west-1.compute.internal 4.14.62-70.117.amzn2.x86_64 #1 SMP Fri Aug 10 20:14:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Trying to upgrade the kernel to 5.4 with sudo amazon-linux-extras install kernel-5.4 -y fails with the following error:
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:5.4.228-132.418.amzn2 will be installed
--> Processing Dependency: microcode_ctl >= 2:2.1-47.amzn2.0.4 for package: kernel-5.4.228-132.418.amzn2.x86_64
--> Finished Dependency Resolution
Error: Package: kernel-5.4.228-132.418.amzn2.x86_64 (amzn2extra-kernel-5.4)
Requires: microcode_ctl >= 2:2.1-47.amzn2.0.4
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Installation failed. Check that you have permissions to install.
Any ideas on how to resolve this dependency, and why it occurs?
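The resolver output says the kernel package from the amzn2extra-kernel-5.4 topic needs microcode_ctl at epoch 2, version 2.1-47.amzn2.0.4 or newer, and no installed or available package satisfied that. A small diagnostic sketch, assuming rpm is available, compares the installed microcode_ctl epoch:version-release with the requirement. version_at_least is a hypothetical helper, and GNU sort -V only approximates rpm's own version comparison, so treat borderline results with care.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is >= version $2.
# GNU `sort -V` approximates, but is not identical to, rpm's EVR comparison.
version_at_least() {
  local have=$1 need=$2
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example usage (not run here):
#   have=$(rpm -q --qf '%{EPOCH}:%{VERSION}-%{RELEASE}\n' microcode_ctl)
#   version_at_least "$have" "2:2.1-47.amzn2.0.4" \
#     || echo "installed microcode_ctl ($have) is older than kernel-5.4 requires"
```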

Unable to install syslog-ng on amazon linux 2

I started an EC2 instance from the Amazon Linux 2 AMI.
I am trying to install syslog-ng with yum, but I get an error.
Commands used:
$ sudo amazon-linux-extras install epel -y
$ sudo yum install syslog-ng
AND
$ sudo yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/czanik/syslog-ng321/repo/epel-7/czanik-syslog-ng321-epel-7.repo
$ sudo yum install --enablerepo=epel --assumeyes syslog-ng
But I get the following error in both cases:
Loaded plugins: dkms-build-requires, extras_suggestions, langpacks, priorities, update-motd
215 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package syslog-ng.x86_64 0:3.23.1-1.el6 will be installed
--> Processing Dependency: libmaxminddb.so.0()(64bit) for package: syslog-ng-3.23.1-1.el6.x86_64
--> Processing Dependency: libpcre.so.0()(64bit) for package: syslog-ng-3.23.1-1.el6.x86_64
--> Running transaction check
---> Package libmaxminddb.x86_64 0:1.2.0-1.el7 will be installed
---> Package syslog-ng.x86_64 0:3.23.1-1.el6 will be installed
--> Processing Dependency: libpcre.so.0()(64bit) for package: syslog-ng-3.23.1-1.el6.x86_64
--> Finished Dependency Resolution
Error: Package: syslog-ng-3.23.1-1.el6.x86_64 (copr:copr.fedorainfracloud.org:czanik:syslog-ng323epel6)
Requires: libpcre.so.0()(64bit)
You could try using --skip-broken to work around the problem
** Found 1 pre-existing rpmdb problem(s), 'yum check' output follows:
cloud-init-19.3-44.amzn2.noarch has missing requires of rsyslog
I wrote these instructions a year ago: https://www.syslog-ng.com/community/b/blog/posts/installing-syslog-ng-in-amazon-linux-2-including-graviton2
I do not have any AWS accounts right now, but if they do not work, then I'll try to get one...
**Update:**
First of all: I double-checked your report. You mention adding a repo for EPEL 7 syslog-ng 3.21; however, the error below it is about syslog-ng 3.23 for EPEL 6.
OK, I got access. I followed my own instructions and it works:
[ec2-user@ip-xxx ~]$ syslog-ng -V
syslog-ng 3 (3.29.1)
Config version: 3.29
Installer-Version: 3.29.1
Revision:
Compile-Date: Aug 29 2020 08:27:16
Module-Directory: /usr/lib64/syslog-ng
Module-Path: /usr/lib64/syslog-ng
Include-Path: /usr/share/syslog-ng/include
Available-Modules: add-contextual-data,affile,afprog,afsocket,afstomp,afuser,appmodel,basicfuncs,cef,confgen,cryptofuncs,csvparser,dbparser,disk-buffer,examples,graphite,hook-commands,json-plugin,kvformat,linux-kmsg-format,map-value-pairs,pseudofile,sdjournal,stardate,syslogformat,system-source,tags-parser,tfgetent,timestamp,xml,azure-auth-header,http
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-IPv6: on
Enable-Spoof-Source: on
Enable-TCP-Wrapper: on
Enable-Linux-Caps: on
Enable-Systemd: on
[ec2-user@ip-xxx ~]$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
I also tried the latest version (3.35), and that works as well.
Installing a specific package version also works:
$ sudo yum --enablerepo=epel -y install syslog-ng-3.5.6-3.el7.x86_64
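The package that fails in the question is an EL6 build (syslog-ng-3.23.1-1.el6, from the czanik syslog-ng323epel6 Copr repo), and it is that build which requires the unresolvable libpcre.so.0, while the repository the question meant to add is the EPEL 7 one. A small sketch, assuming package names carry the usual .elN dist tag, that extracts the tag so a mismatched build can be spotted before installing; dist_tag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first ".elN" dist tag (e.g. el6, el7) found
# in an RPM package string, or nothing if there is none.
dist_tag() {
  printf '%s\n' "$1" | grep -o '\.el[0-9][0-9]*' | head -n 1 | cut -c 2-
}

# Example usage (not run here): flag anything that is not an EL7 build.
#   for pkg in syslog-ng-3.23.1-1.el6.x86_64 libmaxminddb-1.2.0-1.el7.x86_64; do
#     [ "$(dist_tag "$pkg")" = el7 ] || echo "warning: $pkg is not an EL7 build"
#   done
```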

Docker image contents differ when running on different hosts

While building a third-party library (libtorch, if it matters) in a Docker container, I hit an error about a missing include file.
The same build worked fine when run from an Ubuntu 16.04 host, but when run from an Ubuntu 18.04 host, the file was missing.
After some tracing back, I'm now just running NVIDIA's base container and looking for the file.
These are the outputs I get:
Ubuntu 16.04 host:
$ uname -a
Linux ub-carmel 4.15.0-123-generic #126~16.04.1-Ubuntu SMP Wed Oct 21 13:48:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ docker pull nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Image is up to date for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@2ecc17248fab:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
-rw-r--r-- 1 root root 7817 Dec 4 2019 ia32intrin.h
Ubuntu 18.04 host:
$ uname -a
Linux ub-carmel-18-04 5.4.0-56-generic #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.14, build 5eb3275d40
$ docker pull nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Downloaded newer image for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@89f771e82a51:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
root@89f771e82a51:/#
As you can see, the sha256 digest of the images is the same (and matches the digest listed on NVIDIA's NGC).
At first I thought the includes might somehow come from the host, but ia32intrin.h exists on both hosts.
What can cause such issue?
EDIT
Added the docker --version output for each host. The versions differ, but I doubt that would cause such an issue.
EDIT 2
Added the output for uname -a
EDIT 3
Output of docker version:
Ubuntu 16:
$ docker version
Client: Docker Engine - Community
Version: 19.03.13
API version: 1.40
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:02:59 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.13
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:01:30 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.7
GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
Ubuntu 18:
$ docker version
Client: Docker Engine - Community
Version: 19.03.14
API version: 1.40
Go version: go1.13.15
Git commit: 5eb3275d40
Built: Tue Dec 1 19:20:17 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.14
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 5eb3275d40
Built: Tue Dec 1 19:18:45 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.9
GitCommit: ea765aba0d05254012b0b9e595e995c09186427f
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
So I tested it on different Ubuntu machines (EC2 instances), and there the file exists on both 18.04 and 16.04, so it looks like the problem is specific to my machine.
Any thoughts of what can cause this?
My best guess is that the pulled layers on the Ubuntu 18.04 host are somehow corrupt. The nuclear option to clean that up is to reset Docker. This deletes all images, volumes, containers, logs, networks, everything, so back up anything you want to keep before running this:
sudo -s # these commands need root
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
exit # exit sudo
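Before resorting to the reset above, the symptom can be confirmed from a script on each host. A minimal sketch that runs a throwaway container (docker run --rm) and reports whether a path exists inside it; image_has_path is a hypothetical helper, and it assumes the image's entrypoint, if any, passes the command through. Printing the local image ID with docker image inspect alongside it makes the two hosts easy to compare.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "present" or "missing" for a path inside an image.
# "missing" is also printed if docker itself fails (e.g. image not found).
image_has_path() {
  local image=$1 path=$2
  if docker run --rm "$image" test -e "$path"; then
    echo present
  else
    echo missing
  fi
}

# Example usage (not run here): run on both hosts and compare the output.
#   img=nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
#   docker image inspect --format '{{.Id}}' "$img"
#   image_has_path "$img" /usr/lib/gcc/x86_64-linux-gnu/7/include/ia32intrin.h
```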

How to install docker on Amazon Linux2

I want to create a Docker image for Amazon ECR, but yum can't find the docker package on my Amazon Linux 2 instance.
[root@*** ~]# yum install -y docker
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
No package docker available.
Error: Nothing to do
Next, I tried amazon-linux-extras, but that command is not found either:
[root@*** ~]# amazon-linux-extras install docker -y
-bash: amazon-linux-extras: command not found
[root@*** ~]# find / -name 'amazon-linux-extras'
[root@*** ~]$ cat /proc/version
Linux version 4.14.77-81.59.amzn2.x86_64 (mockbuild@ip-10-0-1-59) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Mon Nov 12 21:32:48 UTC 2018
How can I install amazon-linux-extras, or otherwise create a Docker image?
Install Docker
sudo yum update -y
sudo yum -y install docker
Start Docker
sudo service docker start
Allow the ec2-user user to run Docker commands
sudo usermod -a -G docker ec2-user
sudo chmod 666 /var/run/docker.sock
docker version
Sorry, it was my misunderstanding.
My OS is Red Hat Linux.
I was able to install Docker with:
yum-config-manager --enable rhui-REGION-rhel-server-extras
yum -y install docker
systemctl start docker
systemctl enable docker
docker version
Make sure you have amazon-linux-extras installed
[root@ip-20-0-0-112 ~]# which amazon-linux-extras
/usr/bin/amazon-linux-extras
If not, install amazon-linux-extras using yum:
yum -y install amazon-linux-extras
Then install docker using
amazon-linux-extras install docker
I'm on Amazon Linux (RHEL 7.2), ami-035b3c7efe6d061d5:
cat /proc/version
Linux version 4.14.123-86.109.amzn1.x86_64 (mockbuild@koji-pdx-corp-builder-64004) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Mon Jun 10 19:44:53 UTC 2019
The following script works without having to install amazon-linux-extras:
sudo yum -y install docker
sudo service docker start
sudo chmod 666 /var/run/docker.sock
I had to fix a permission issue, also described in "How to fix docker: Got permission denied issue".
Then I can list containers:
[ec2-user@ip-30-0-0-196 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
For the Amazon Linux AMI, access to the Extra Packages for Enterprise Linux (EPEL) repository is configured, but it is not enabled by default.
To install amazon-linux-extras, verify internet connectivity from within the instance, then check the instance's OS:
cat /etc/os-release
If the OS is Amazon Linux 2, run:
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
or run sudo yum-config-manager --enable epel
to use the EPEL repository. You can now install available packages, e.g. sudo amazon-linux-extras install docker
See the AWS documentation for more details.
You can use the script below to install Docker on an Amazon Linux 2 instance. You can also put it in the EC2 user data section, so that Docker is installed automatically when the server bootstraps.
#!/bin/bash
sudo yum update -y
sudo yum -y install docker
sudo service docker start
sudo usermod -a -G docker ec2-user
sudo chmod 666 /var/run/docker.sock
Amazon Linux 2 comes with amazon-linux-extras installed. If you think that you are running Amazon Linux 2, and amazon-linux-extras is not on the path of the ec2-user, you might be running an older version of Amazon Linux. Run this command:
grep PRETTY_NAME /etc/os-release
It should output:
PRETTY_NAME="Amazon Linux 2"
If you don't see that, go back to the EC2 console and drill down into the details of the instance. Clicking on the AMI should reveal that it corresponds to an older version of Amazon Linux. Some AWS facilities, notably CDK, currently default to Amazon Linux instead of Amazon Linux 2 when creating new instances.
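The answers above differ mainly by distribution: Amazon Linux 2 ships amazon-linux-extras, the older Amazon Linux AMI installs docker straight from yum, and the Red Hat case needed the rhui extras repository. A sketch of choosing between them from /etc/os-release, assuming the usual ID and VERSION_ID values (amzn with VERSION_ID 2 for Amazon Linux 2, rhel for Red Hat); the function name and the labels it prints are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an os-release file and print which of the
# install routes discussed in the answers applies.
docker_install_route() {
  local os_release=${1:-/etc/os-release}
  (
    # os-release is a shell-compatible KEY=value file, so it can be sourced
    # in a subshell without leaking its variables.
    . "$os_release"
    case "${ID:-}:${VERSION_ID:-}" in
      amzn:2) echo "amazon-linux-extras install docker" ;;
      amzn:*) echo "yum install docker" ;;
      rhel:*) echo "enable rhui extras repo, then yum install docker" ;;
      *)      echo "unknown" ;;
    esac
  )
}

# Example usage (not run here):
#   docker_install_route            # reads /etc/os-release
```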
I came across this question when trying to set up a Docker image based on Amazon Linux 2.
What I didn't find in the current answers is that the docker topic needs to be enabled in amazon-linux-extras before installing.
Dockerfile commands that worked for me:
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras enable docker
RUN yum install -y docker

Installing VirtualBox 5.1 on CentOS 7.3

Has anyone been able to install VirtualBox 5.1 successfully on a CentOS 7.3 x64 box? Installing it via YUM succeeds, but calling "vagrant -v" shows the following:
This system is not currently set up to build kernel modules (system extensions).
Running the following commands should set the system up correctly:
yum install kernel-devel-3.10.0-327.36.3.el7.x86_64
(The last command may fail if your system is not fully updated.)
yum install kernel-devel
kernel-devel is already installed as a dependency, so it seems VirtualBox expects the 7.2 kernel modules. Has anyone been able to install VirtualBox 5.1 on kernel 3.10.0-514.2.2.el7.x86_64?
I found a solution to this, in case it might be useful for someone else.
(1) Visit https://www.rpmfind.net/linux/RPM/centos/updates/7.2.1511/x86_64/Packages/kernel-devel-3.10.0-327.36.3.el7.x86_64.html
(2) Download the RPM (kernel-devel-3.10.0-327.36.3.el7.x86_64.rpm)
(3) Run yum localinstall -y /path/to/kernel-devel-3.10.0-327.36.3.el7.x86_64.rpm to install "kernel-devel".
(4) Run /sbin/vboxconfig to further configure VirtualBox.
If you are using Ansible, you need something like the following before installing it via YUM (only on systems that do not have the correct kernel sources).
# Required kernel module
- name: Copy required kernel modules
  copy:
    src: "{{ role_path }}/files/{{ vbox_kernel_devel_rpm }}"
    dest: "/tmp/{{ vbox_kernel_devel_rpm }}"
- name: Install kernel-devel module
  shell: "yum localinstall -y /tmp/{{ vbox_kernel_devel_rpm }}"
  args:
    warn: false
- name: Delete uploaded RPM
  file:
    path: "/tmp/{{ vbox_kernel_devel_rpm }}"
    state: absent
Some of my machines had different kernel versions, so I found it easier to do this in scripts with:
curl -fsS ftp://fr2.rpmfind.net/linux/centos/7.2.1511/updates/x86_64/Packages/kernel-devel-$(uname -r).rpm -o kernel-devel-$(uname -r).rpm
sudo yum localinstall -y kernel-devel-$(uname -r).rpm
sudo yum install -y docker-engine VirtualBox-5.1 kernel-headers gcc
sudo /sbin/vboxconfig
This could be customised further by making the CentOS version in the URL a parameter.
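Following that suggestion, a sketch that builds the mirror URL from the kernel release and the CentOS version, assuming the rpmfind mirror keeps the same directory layout for other CentOS releases as for the 7.2.1511 one used above; kernel_devel_url is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the rpmfind mirror URL used in the answer for a
# given kernel release ($1) and CentOS version ($2, default 7.2.1511).
kernel_devel_url() {
  local release=$1 centos_version=${2:-7.2.1511}
  printf 'ftp://fr2.rpmfind.net/linux/centos/%s/updates/x86_64/Packages/kernel-devel-%s.rpm\n' \
    "$centos_version" "$release"
}

# Example usage (not run here):
#   url=$(kernel_devel_url "$(uname -r)")
#   curl -fsS "$url" -o "kernel-devel-$(uname -r).rpm"
#   sudo yum localinstall -y "kernel-devel-$(uname -r).rpm"
```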