While building a third-party library (libtorch, if it matters) in a Docker container, I hit an error about a missing include file.
The same build worked fine when run on an Ubuntu 16.04 host, but on an Ubuntu 18.04 host the file was missing.
After tracing it back, I am now just running the base container from NVIDIA and looking for the file.
These are the outputs I get:
Ubuntu 16.04 host:
$ uname -a
Linux ub-carmel 4.15.0-123-generic #126~16.04.1-Ubuntu SMP Wed Oct 21 13:48:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ docker pull nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Image is up to date for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@2ecc17248fab:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
-rw-r--r-- 1 root root 7817 Dec 4 2019 ia32intrin.h
Ubuntu 18.04 host:
$ uname -a
Linux ub-carmel-18-04 5.4.0-56-generic #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.14, build 5eb3275d40
$ docker pull nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Downloaded newer image for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@89f771e82a51:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
root@89f771e82a51:/#
As you can see, the sha256 digest of the image is the same on both hosts (and matches the digest listed on NVIDIA's NGC).
At first I thought the includes might somehow come from the host, but ia32intrin.h exists on both hosts.
What could cause this?
EDIT
Added the docker --version output for each host. The versions differ, but I doubt that would cause this.
EDIT 2
Added the output for uname -a
EDIT 3
Output of docker version:
Ubuntu 16:
$ docker version
Client: Docker Engine - Community
Version: 19.03.13
API version: 1.40
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:02:59 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.13
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:01:30 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.7
GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
Ubuntu 18:
$ docker version
Client: Docker Engine - Community
Version: 19.03.14
API version: 1.40
Go version: go1.13.15
Git commit: 5eb3275d40
Built: Tue Dec 1 19:20:17 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.14
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 5eb3275d40
Built: Tue Dec 1 19:18:45 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.9
GitCommit: ea765aba0d05254012b0b9e595e995c09186427f
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
I then tested this on different Ubuntu machines (EC2 instances), and there the file exists on both 18.04 and 16.04, so it looks like the problem is specific to my machine.
Any thoughts on what could cause this?
Best guess is that the pulled layers on the Ubuntu 18.04 host are somehow corrupt. The nuclear option to clean that up is to reset Docker. This will delete all images, volumes, containers, logs, networks, everything, so back up anything you want to keep before running this:
sudo -s # these commands need root
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
exit # exit sudo
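Before wiping everything, it may be worth confirming that the container filesystem really differs between the two hosts. A rough sketch, assuming you dump a file listing on each host and copy both files to one machine; the function name missing_in_second and the listing file names are made up for illustration:

```shell
# Print paths that appear in the first listing but not in the second.
# Both arguments are plain-text files with one path per line.
missing_in_second() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Hypothetical usage (file names are placeholders):
#   on each host:
#     docker run --rm nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 \
#         find /usr/lib/gcc/x86_64-linux-gnu/7/include -type f > listing.txt
#   then, with both listings on one machine:
#     missing_in_second listing-16.04.txt listing-18.04.txt
```

If the second host's listing is missing files even though the digests match, that points at the local layer storage rather than at the image itself.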
Docker-compose seems to have stopped working on Sagemaker Notebook instances. When running docker-compose up I encounter the following error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/bin/docker-compose", line 8, in <module>
sys.exit(main())
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/main.py", line 81, in main
command_func()
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/main.py", line 200, in perform_command
project = project_from_options('.', options)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/command.py", line 70, in project_from_options
enabled_profiles=get_profiles_from_options(options, environment)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/command.py", line 153, in get_project
verbose=verbose, version=api_version, context=context, environment=environment
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/docker_client.py", line 43, in get_client
environment=environment, tls_version=get_tls_version(environment)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/compose/cli/docker_client.py", line 170, in docker_client
client = APIClient(use_ssh_client=not use_paramiko_ssh, **kwargs)
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/docker/api/client.py", line 197, in __init__
self._version = self._retrieve_server_version()
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/docker/api/client.py", line 222, in _retrieve_server_version
'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: Timeout value connect was Timeout(connect=60, read=60, total=None), but it must be an int, float or None
I can start Docker containers as usual.
sh-4.2$ docker version
Client:
Version: 20.10.7
API version: 1.41
Go version: go1.15.14
Git commit: f0df350
Built: Tue Sep 28 19:55:40 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.15.14
Git commit: b0f5bc3
Built: Tue Sep 28 19:57:35 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.0
GitCommit: %runc_commit
docker-init:
Version: 0.19.0
GitCommit: de40ad0
But docker-compose wouldn't work...
sh-4.2$ docker-compose version
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.0
CPython version: 3.6.13
OpenSSL version: OpenSSL 1.1.1l 24 Aug 2021
For those of you who (might) have encountered the same issue, here's the fix:
1). Install the newest version of docker-compose:
sh-4.2$ sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sh-4.2$ sudo chmod +x /usr/local/bin/docker-compose
2). Change your PATH accordingly (the conda-installed docker-compose is otherwise picked up first), or call /usr/local/bin/docker-compose explicitly from then on:
sh-4.2$ PATH=/usr/local/bin:$PATH
sh-4.2$ docker-compose version
docker-compose version 1.29.2, build 5becea4c
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
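To see which docker-compose the shell picks up first, something like the following sketch can help. It walks PATH in lookup order; list_on_path is a hypothetical helper name, not a standard command (in bash, type -a docker-compose gives similar information):

```shell
# List every executable named $1 found on a PATH-style list, in lookup order.
# The second argument is optional and defaults to the current PATH.
# The first line printed is the binary the shell would actually run.
list_on_path() {
    name=$1
    search=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Example: list_on_path docker-compose
```

If the conda copy is printed before /usr/local/bin/docker-compose, the PATH change has not taken effect in that shell.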
Perhaps, the issue is related to this:
On August 9, 2021 the Jupyter Notebook and Jupyter Lab open source software projects announced 2 security concerns that could impact Amazon Sagemaker Notebook Instance customers.
Sagemaker has deployed updates to address these concerns, and we recommend customers with existing notebook sessions to stop and restart their notebook instance(s) to benefit from these updates. Notebook instances launched after August 10, 2021, when updates were deployed, are not impacted by this issue and do not need to be restarted.
I'm trying to follow this tutorial on AWS ECS integration, which mentions the Docker command docker compose convert that is supposed to generate an AWS CloudFormation template.
However, when I run this command, it doesn't appear to exist.
$ docker-compose convert
No such command: convert
#...
$ docker compose convert
docker: 'compose' is not a docker command.
See 'docker --help'
$ docker context create ecs myecscontext
"docker context create" requires exactly 1 argument.
See 'docker context create --help'.
Usage: docker context create [OPTIONS] CONTEXT
Create a context
$ docker --version
Docker version 19.03.13, build 4484c46
$ docker-compose --version
docker-compose version 1.25.5, build unknown
$ docker version
Client:
Version: 19.03.13
API version: 1.40
Go version: go1.13.8
Git commit: 4484c46
Built: Thu Oct 15 18:34:11 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.11
API version: 1.40 (minimum version 1.12)
Go version: go1.13.12
Git commit: 77e06fd
Built: Mon Jun 8 20:24:59 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit:
docker-init:
Version: 0.18.0
GitCommit: fec3683
$ docker info
Client:
Debug Mode: false
Server:
Containers: 12
Running: 3
Paused: 0
Stopped: 9
Images: 149
Server Version: 19.03.11
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version:
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.8.0-29-generic
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 7.202GiB
Name: HongLee
ID: GZ5R:KQDD:JHOJ:KCUF:73AE:N3NY:MWXS:ABQ2:2EVY:4ABJ:H375:J64V
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Any ideas?
To get the ECS integration, you need to be using an ECS docker context. First, enable the experimental flag in /etc/docker/daemon.json:
{
"experimental": true
}
Then create the context:
docker context create ecs myecscontext
docker context use myecscontext
$ docker context ls
NAME TYPE DESCRIPTION DOCKER ENDPOINT KUBERNETES ENDPOINT ORCHESTRATOR
default moby Current DOCKER_HOST based configuration unix:///var/run/docker.sock [redacted] (default) swarm
myecscontext * ecs
Now run convert:
$ docker compose convert
WARN[0000] services.build: unsupported attribute
AWSTemplateFormatVersion: 2010-09-09
Resources:
AdminwebService:
DependsOn:
- AdminwebTCP80Listener
Properties:
Cluster:
...
You're running on Ubuntu. The /usr/bin/docker binary installed there (even with the latest docker-ce, 20.10.6) does not enable the docker compose subcommand. It is enabled by default in Docker Desktop for Windows or Mac.
See the Linux installation instructions at https://github.com/docker/compose-cli to download and configure so that docker compose works.
There's a curl|bash script for Ubuntu or just download the latest release, put that docker executable into a PATH directory before /usr/bin/ and make sure the original docker is available as com.docker.cli e.g. ln -s /usr/bin/docker ~/bin/com.docker.cli.
(I'm using CentOS 7.)
I'm following this tutorial and I'm at the "Download ShinyProxy" part. I downloaded the rpm file and installed it with this command:
$ sudo yum localinstall shinyproxy_2.3.0_x86_64.rpm
The folder /etc/shinyproxy is empty.
Where are the files shinyproxy-2.3.0.jar and application.yml?
The application.yml that shinyproxy runs on will be located in /etc/shinyproxy/application.yml.
System info:
Linux my_vm 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
After editing the file, run systemctl stop shinyproxy followed by systemctl start shinyproxy to apply the changes.
I'm getting this error when I tried to docker-compose build my docker-compose.yml file:
In file './docker-compose.yml' service 'version' doesn't have any configuration options. All top level keys in your docker-compose.yml must map to a dictionary of configuration options.
docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: 78d1802
Built: Tue Jan 31 23:47:34 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.3
Git commit: 78d1802
Built: Tue Jan 31 23:47:34 2017
OS/Arch: linux/amd64
docker --version
Docker version 1.12.6, build 78d1802
docker-compose --version
docker-compose version 1.5.2, build unknown
Is this because of the "build unknown"?
docker-compose.yml
version: "2"
services:
  postgres:
    image: postgres:9.6
    volumes:
      - pgdata:/var/lib/data/postgres
  backend:
    build: .
    command: gosu app bash
    volumes:
      - .:/app
      - pyenv:/python
    links:
      - postgres:postgres
    ports:
      - 8000:8000
volumes:
  pyenv:
  pgdata:
Try upgrading the docker-compose version. Version 2 files are supported by Compose 1.6.0+ and require a Docker Engine of version 1.10.0+.
Install a newer docker-compose:
$ sudo curl -o /usr/local/bin/docker-compose -L "https://github.com/docker/compose/releases/download/1.15.0/docker-compose-$(uname -s)-$(uname -m)"
$ sudo chmod +x /usr/local/bin/docker-compose
Reference:
https://docs.docker.com/compose/compose-file/compose-versioning/#version-2
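To check whether an installed Compose meets the 1.6.0 minimum quoted above, a small sketch along these lines could be used. It assumes GNU sort with -V (version sort) is available; parse_compose_version and version_ge are illustrative names, not part of any tool:

```shell
# Extract "1.5.2" from a line such as
# "docker-compose version 1.5.2, build unknown".
parse_compose_version() {
    printf '%s\n' "$1" | sed -n 's/^docker-compose version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit 0) when version $1 is greater than or equal to version $2.
# The smaller of the two versions sorts first, so $1 >= $2 exactly when
# the first sorted line equals $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Demo using the version string from the question; in practice the
# argument would be "$(docker-compose --version)".
installed=$(parse_compose_version 'docker-compose version 1.5.2, build unknown')
if version_ge "$installed" 1.6.0; then
    echo "Compose $installed can read version 2 files"
else
    echo "Compose $installed is too old for version 2 files"
fi
```

A plain string comparison would get cases like 1.15.0 versus 1.6.0 wrong, which is why the sketch leans on version sort.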
You should install docker-compose using the official documentation https://docs.docker.com/compose/install/
If you are on Linux, I have found that the apt package for docker-compose behaves oddly. Uninstall it and reinstall docker-compose following the official documentation above.
sudo apt-get purge docker-compose
When I attempt to update linux-headers-aws on my instance, it becomes unreachable after a restart. Diffing the AWS system logs from the console, I found:
ixgbevf: disagrees about version of symbol module_layout
Do I need to reinstall ixgbevf? Should I avoid updating in this manner?
Pre-update:
uname -a
Linux master 4.4.0-1022-aws #31-Ubuntu SMP Tue Jun 27 11:27:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
modinfo ixgbevf
filename: /lib/modules/4.4.0-1022-aws/updates/dkms/ixgbevf.ko
version: 3.1.2
license: GPL
description: Intel(R) 10 Gigabit Virtual Function Network Driver
author: Intel Corporation,
srcversion: BA90EAFD4DC7D0F8F47AB8D
alias: pci:v00008086d000015A8svsdbcsci*
alias: pci:v00008086d00001565svsdbcsci*
alias: pci:v00008086d00001515svsdbcsci*
alias: pci:v00008086d000010EDsvsdbcsci*
depends:
vermagic: 4.4.0-1022-aws SMP mod_unload modversions
parm: InterruptThrottleRate:Maximum interrupts per second, per vector, (956-488281, 0=off, 1=dynamic), default 1 (array of int)
ethtool -i ens3
driver: ixgbevf
version: 3.1.2
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
See this gist
First I created a backup AMI and unheld these packages:
sudo apt-mark unhold linux-aws
sudo apt-mark unhold linux-headers-aws
sudo apt-mark unhold linux-image-aws
sudo apt-mark unhold lxd
sudo apt-mark unhold lxd-client
Then updated:
sudo apt-get install linux-headers-aws linux-image-aws
This ran successfully:
Setting up linux-headers-aws (4.4.0.1026.29) ...
Setting up linux-aws (4.4.0.1026.29) ...
Several attempts to reboot at this point left the instance unreachable, with the error message from my question in the system log. So I re-downloaded ixgbevf following the AWS documentation and reinstalled it, commenting out the version check line (#if UTS_UBUNTU_RELEASE_ABI > 255) so that it would compile. This required first removing the existing ixgbevf 3.1.2 module:
sudo dkms remove ixgbevf/3.1.2 --all
sudo dkms add -m ixgbevf -v 3.1.2
sudo dkms build -m ixgbevf -v 3.1.2
sudo dkms install -m ixgbevf -v 3.1.2 --all
sudo update-initramfs -c -k all
sudo reboot
And I was then able to successfully connect.
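As far as I understand it, the module_layout message usually means the module was built against a different kernel than the one booting, which would be consistent with the DKMS rebuild fixing it. A sketch for checking this before rebooting, assuming modinfo's -F and -k options are available on the instance; vermagic_matches is an illustrative helper, and 4.4.0-1026-aws is my guess at the new kernel release name:

```shell
# Succeed (exit 0) when the kernel release at the start of a module's
# vermagic string equals the given kernel release.
# $1: vermagic string, e.g. "4.4.0-1022-aws SMP mod_unload modversions"
# $2: kernel release,  e.g. the output of `uname -r`
vermagic_matches() {
    # Strip everything from the first space onward to keep only the release.
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}

# Hypothetical usage (the kernel release below is an assumption):
#   target=4.4.0-1026-aws
#   if vermagic_matches "$(modinfo -k "$target" -F vermagic ixgbevf)" "$target"; then
#       echo "ixgbevf is built for $target"
#   else
#       echo "ixgbevf needs rebuilding for $target before rebooting"
#   fi
```

Running dkms status as well should show which kernels the module has been built and installed for.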