While deploying a new app version on a new EC2 instance, launched from a template based on my AMI, I got the following error:
[ 1.670047] No filesystem could mount root, tried:
[ 1.670048]
[ 1.677170] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 1.685026] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.14.181-140.257.amzn2.x86_64 #1
[ 1.692532] Hardware name: Amazon EC2 t3a.nano/, BIOS 1.0 10/16/2017
Previously it worked like a charm. I haven't changed any AMI or template settings; I only added new SSL certs to nginx.
I found this troubleshooting article but cannot get anywhere with it, since I cannot access the instance terminal.
I've tried to boot a new instance with a different kernel ID, but got this error:
You cannot launch multiple AMIs with different virtualization styles at the same time.
I believe the kernel somehow changed during AMI creation and that this is causing the problem. Is that possible?
What can cause this issue, and how can I solve it?
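For reference, a query like the following (the AMI and instance IDs are placeholders) should show which kernel ID and virtualization type an AMI and an instance carry, which is what I'd like to compare between the working and the broken launches:

# Sketch only: compare kernel/virtualization settings of the AMI and of an instance
aws ec2 describe-images \
  --image-ids ami-0123456789abcdef0 \
  --query 'Images[0].{Kernel:KernelId,Virt:VirtualizationType,RootDev:RootDeviceName}'
aws ec2 describe-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute kernel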
Related
I am running an OKD 4.5 cluster with 3 master nodes on AWS, installed using openshift-install.
In attempting to update the cluster to 4.5.0-0.okd-2020-09-04-180756 I have run into numerous issues.
The current issue is that the console and apiserver pods on one master node are in CrashLoopBackOff due to what appears to be an internal networking issue.
The logs of the apiserver pod are as follows:
Copying system trust bundle
I0911 15:59:15.763716 1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
F0911 15:59:19.556715 1 cmd.go:72] unable to load configmap based request-header-client-ca-file: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: connect: no route to host
I have tried deleting the pods, and the new ones crash-loop as well.
Update
I removed the troubled master and added a new machine to build a new master. The apiserver and console are no longer failing, but now etcd is.
#### attempt 9
member={name="ip-172-99-6-251.ec2.internal", peerURLs=[https://172.99.6.251:2380}, clientURLs=[https://172.99.6.251:2379]
member={name="ip-172-99-6-200.ec2.internal", peerURLs=[https://172.99.6.200:2380}, clientURLs=[https://172.99.6.200:2379]
member={name="ip-172-99-6-249.ec2.internal", peerURLs=[https://172.99.6.249:2380}, clientURLs=[https://172.99.6.249:2379]
target=nil, err=<nil>
#### sleeping...
*Note: 172.99.6.251 is the IP of the master node this one replaced.
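A sketch of how the stale member for the replaced node could be removed manually with etcdctl; the namespace, pod and container names below are assumptions based on a typical OKD 4.x cluster and may need adjusting:

# List the etcd pods (assumed to be named etcd-<node> in openshift-etcd)
oc get pods -n openshift-etcd -o wide
# Open a shell in an etcd pod on a healthy master
# (use -c etcd if rsh does not pick the right container)
oc rsh -n openshift-etcd etcd-ip-172-99-6-200.ec2.internal
# Inside the pod, where etcdctl is already configured with the cluster certs:
etcdctl member list -w table
etcdctl member remove <ID-of-the-old-172.99.6.251-member>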
I have tried two methods to upload an OpenWrt x86_64 image to AWS as an AMI and run it on EC2, but both failed.
The image I built runs fine on VirtualBox and VMware.
The first method: VM Import/Export.
I followed the instructions at https://amazonaws-china.com/cn/ec2/vm-import/, but the VM import tool eventually failed with "Not found initrd in Grub".
OpenWrt doesn't use an initrd at boot. This is the default boot entry in grub.cfg:
menuentry "OpenWrt" {
linux /boot/vmlinuz root=PARTUUID=fbdad417-02 rootfstype=ext4 rootwait console=tty0 console=ttyS0,115200n8 noinitrd
}
The second method: ec2-bundle-image / ec2-upload-bundle.
This way I could upload the image and manifest files to S3, register a new AMI, and launch an EC2 instance, but the instance did not boot correctly; it stops at the grubdom> prompt.
I followed the instructions at https://forum.archive.openwrt.org/viewtopic.php?id=41588, which seem a little old; I couldn't find the AKI they mention, so I used an alternative one (aki-7077ab11, pv-grub-hd0_1.05-x86_64.gz).
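Roughly, the commands for this method look like the following (bucket name, paths and account ID are placeholders; the AKI is the alternative one mentioned above):

# Bundle and upload the raw image, then register a paravirtual AMI
ec2-bundle-image -c cert.pem -k pk.pem -u 111122223333 \
  -i openwrt-x86-64-combined-ext4.img -r x86_64 -d /tmp/bundle
ec2-upload-bundle -b my-openwrt-bucket \
  -m /tmp/bundle/openwrt-x86-64-combined-ext4.img.manifest.xml \
  -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY
aws ec2 register-image --name openwrt-x86-64 \
  --architecture x86_64 --virtualization-type paravirtual \
  --kernel-id aki-7077ab11 \
  --image-location my-openwrt-bucket/openwrt-x86-64-combined-ext4.img.manifest.xml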
Both the combined image (the default OpenWrt build) and a custom image (the release rootfs.tar.gz with the kernel and grub config copied into it) failed. Here is the EC2 instance system log:
Xen Minimal OS!
start_info: 0x10d4000(VA)
nr_pages: 0xe504a
shared_inf: 0xeeb28000(MA)
pt_base: 0x10d7000(VA)
nr_pt_frames: 0xd
mfn_list: 0x9ab000(VA)
mod_start: 0x0(VA)
mod_len: 0
flags: 0x300
cmd_line: root=/dev/sda1 ro console=hvc0 4
stack: 0x96a100-0x98a100
MM: Init
_text: 0x0(VA)
_etext: 0x7b824(VA)
_erodata: 0x97000(VA)
_edata: 0x9cce0(VA)
stack start: 0x96a100(VA)
_end: 0x9aa700(VA)
start_pfn: 10e7
max_pfn: e504a
Mapping memory range 0x1400000 - 0xe504a000
setting 0x0-0x97000 readonly
skipped 0x1000
MM: Initialise page allocator for 1809000(1809000)-e504a000(e504a000)
MM: done
Demand map pfns at e504b000-20e504b000.
Heap resides at 20e504c000-40e504c000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0xe504b000.
Initialising scheduler
Thread "Idle": pointer: 0x20e504c050, stack: 0x1f10000
Thread "xenstore": pointer: 0x20e504c800, stack: 0x1f20000
xenbus initialised on irq 3 mfn 0xfeffc
Thread "shutdown": pointer: 0x20e504cfb0, stack: 0x1f30000
Dummy main: start_info=0x98a200
Thread "main": pointer: 0x20e504d760, stack: 0x1f40000
"main" "root=/dev/sda1" "ro" "console=hvc0" "4"
vbd 2049 is hd0
******************* BLKFRONT for device/vbd/2049 **********
backend at /local/domain/0/backend/vbd/27482/2049
2097152 sectors of 512 bytes
**************************
vbd 2064 is hd1
******************* BLKFRONT for device/vbd/2064 **********
backend at /local/domain/0/backend/vbd/27482/2064
8377344 sectors of 512 bytes
**************************
[H[J
GNU GRUB version 0.97 (3752232K lower / 0K upper memory)
[ Minimal BASH-like line editing is supported. For
the first word, TAB lists possible command
completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grubdom>
Any ideas? Thanks.
It is an easy task which doesn't need any complicated setup.
I used VirtualBox, but any other virtualization platform can be used (e.g. VMware or Hyper-V).
In my experience, getting OpenWrt onto AWS fails with any import method other than importing a snapshot.
1) download OpenWrt
   https://downloads.openwrt.org/releases/19.07.5/targets/x86/64/
2) install OpenWrt on VirtualBox and create an OVA
   https://openwrt.org/docs/guide-user/virtualization/virtualbox-vm
   2a) convert the img to vdi
       - example: VBoxManage convertfromraw --format VDI openwrt-x86-64-combined.img openwrt.vdi
   2b) extend the vdi to 1GB
       - example: VBoxManage modifymedium openwrt.vdi --resize 1024
   2c) boot OpenWrt
   2d) change the eth0 interface to DHCP
       - example: vi /etc/config/network
   2e) shutdown
   2f) export the VM to an OVA
3) rename the .ova to .zip
4) unzip the .zip - by unzipping you get the vmdk file of the virtual disk
5) upload the vmdk to an AWS S3 bucket
6) add the vmimport role to your account
   https://www.msp360.com/resources/blog/how-to-configure-vmimport-role/
7) import the vmdk as a snapshot (see the AWS CLI sketch after this list)
   https://docs.aws.amazon.com/vm-import/latest/userguide/vmimport-import-snapshot.html
8) create a new EC2 instance
9) replace the EC2 instance volume with the imported volume
   https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-restoring-volume.html
10) boot up
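For steps 5, 7 and 9, the AWS CLI calls look roughly like this (bucket name and all IDs are placeholders; the availability zone must match the instance, and the root device name depends on the AMI the instance was created from):

# Upload the disk and import it as a snapshot
aws s3 cp openwrt.vmdk s3://my-openwrt-bucket/openwrt.vmdk
aws ec2 import-snapshot --description "openwrt 19.07.5" \
  --disk-container "Format=VMDK,UserBucket={S3Bucket=my-openwrt-bucket,S3Key=openwrt.vmdk}"
# Poll until the task reports "completed" and note the SnapshotId
aws ec2 describe-import-snapshot-tasks --import-task-ids import-snap-0123456789abcdef0

# Create a volume from the imported snapshot and swap it in as the root volume
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 detach-volume --volume-id vol-0originalroot123456
aws ec2 attach-volume --volume-id vol-0imported123456789 \
  --instance-id i-0123456789abcdef0 --device /dev/xvda
aws ec2 start-instances --instance-ids i-0123456789abcdef0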
I am trying to limit iops on a particular container in my docker-compose stack. To do this I am using the following config:
blkio_config:
device_write_iops:
- path: "/dev/xvda1"
rate: 20
device_read_iops:
- path: "/dev/xvda1"
rate: 20
I cannot provide the rest of the file for security reasons; however, the problem is isolated to this block. I confirmed that this is the correct path for my EBS volume using the df -h command.
When I then run docker-compose up -d I get the following error:
Recreating e1c25c41b612_drone ... error
ERROR: for e1c25c41b612_drone Cannot start service drone: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:396: setting cgroup config for procHooks process caused \\\"failed to write 202:1 20 to blkio.throttle.read_iops_device: write /sys/fs/cgroup/blkio/docker/a674e86d50111afa576d5fd4e16a131070c100b7db3ac22f95986904a47ae82a/blkio.throttle.read_iops_device: invalid argument\\\"\"": unknown
ERROR: for drone Cannot start service drone: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:396: setting cgroup config for procHooks process caused \\\"failed to write 202:1 20 to blkio.throttle.read_iops_device: write /sys/fs/cgroup/blkio/docker/a674e86d50111afa576d5fd4e16a131070c100b7db3ac22f95986904a47ae82a/blkio.throttle.read_iops_device: invalid argument\\\"\"": unknown
The IOPS limit on my EBS volume is 120, so I tested a variety of different values, to no avail.
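For reference, the 202:1 in the error can be cross-checked against the device like this (a generic sketch, not my exact output). If the blkio throttle only accepts whole-disk numbers, pointing blkio_config at /dev/xvda rather than /dev/xvda1 might be worth testing:

# Show the major:minor numbers of the whole disk and of the partition
# (202:1 from the error corresponds to the xvda1 partition)
lsblk -o NAME,MAJ:MIN,TYPE,MOUNTPOINT /dev/xvda
ls -l /dev/xvda /dev/xvda1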
Any help is massively appreciated.
Has anyone faced this issue with docker pull? We recently upgraded Docker to 18.03.1-ce, and since then we have been seeing the issue. We are not exactly sure whether it is related to Docker, but we want to know if anyone else has faced this problem.
We did some troubleshooting with tcpdump: the DNS queries being made were under EC2's permissible limit of 1024 packets per second. We also tried working around the issue by modifying /etc/resolv.conf to use higher retry/timeout values, but that didn't seem to help.
We went through a packet capture line by line and found that some responses were negative. In Wireshark, the filter 'udp.stream eq 12' shows one of the negative answers: the resolver sends "No such name". All of the requests that get a negative response use the following name:
354XXXXX.dkr.ecr.us-east-1.amazonaws.com.ec2.internal
Would anyone happen to know why ec2.internal is being added to the end of the DNS name? If I run dig against this name it fails, so it appears that a wrong name is being sent to the server, which responds with 'no such host'. Is Docker sending a wrong DNS name for resolution?
We see this issue happening intermittently. Looking forward to any help. Thanks in advance.
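The suffix looks like it comes from the resolver's search list; a generic sketch for checking that (output will vary per host):

# Check the search domain and options the host resolver hands to Docker
cat /etc/resolv.conf        # on EC2 this typically includes "search ec2.internal"
# Compare lookups with and without the search list applied
# (a trailing dot makes the name fully qualified and stops suffixing)
dig +search 354XXXXX.dkr.ecr.us-east-1.amazonaws.com
dig 354XXXXX.dkr.ecr.us-east-1.amazonaws.com.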
Expected behaviour
5.0.25_61: Pulling from rrg
Digest: sha256:50bbce4af6749e9a976f0533c3b50a0badb54855b73d8a3743473f1487fd223e
Status: Downloaded newer image for XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Actual behaviour
docker-compose up -d rrg-node-1
Creating rrg-node-1
ERROR: for rrg-node-1 Cannot create container for service rrg-node-1: Error response from daemon: Get https:/XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/v2/: dial tcp: lookup XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com on 10.5.0.2:53: no such host
Steps to reproduce the issue
docker pull XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Output of docker version:
Docker version 18.03.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
Output of docker info:
[ec2-user@ip-10-5-3-45 ~]$ docker info
Containers: 37
Running: 36
Paused: 0
Stopped: 1
Images: 60
Server Version: swarm/1.2.5
Role: replica
Primary: 10.5.4.172:3375
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 12
Plugins:
Volume:
Network:
Log:
Swarm:
NodeID:
Is Manager: false
Node Address:
Kernel Version: 4.14.51-60.38.amzn1.x86_64
Operating System: linux
Architecture: amd64
CPUs: 22
Total Memory: 80.85GiB
Name: mgr1
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Live Restore Enabled: false
WARNING: No kernel memory limit support
The command vagrant up is failing and I don't know why.
$ egrep -v '^ *(#|$)' Vagrantfile
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "precise32"
end
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
[default] Importing base box 'precise32'...
[default] Matching MAC address for NAT networking...
[default] Setting the name of the VM...
[default] Clearing any previously set forwarded ports...
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Preparing network interfaces based on configuration...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
The VM failed to remain in the "running" state while attempting to boot.
This is normally caused by a misconfiguration or host system incompatibilities.
Please open the VirtualBox GUI and attempt to boot the virtual machine
manually to get a more informative error message.
$ vagrant status
Current machine states:
default poweroff (virtualbox)
The VM is powered off. To restart the VM, simply run `vagrant up`
$ VBoxManage list runningvms
$
Here are the messages in the VirtualBox log file, VBoxSVC.log:
$ cat ~/.VirtualBox/VBoxSVC.log
VirtualBox XPCOM Server 4.2.16 r86992 linux.amd64 (Jul 4 2013 16:29:59) release log
00:00:00.000499 main Log opened 2013-08-13T18:40:45.907580000Z
00:00:00.000508 main OS Product: Linux
00:00:00.000509 main OS Release: 3.6.11-4.fc16.x86_64
00:00:00.000510 main OS Version: #1 SMP Tue Jan 8 20:57:42 UTC 2013
00:00:00.000537 main DMI Product Name: X8DA3
00:00:00.000547 main DMI Product Version: 1234567890
00:00:00.000647 main Host RAM: 24103MB total, 17127MB available
00:00:00.000654 main Executable: /usr/local/VirtualBox/VBoxSVC
00:00:00.000655 main Process ID: 9417
00:00:00.000656 main Package type: LINUX_64BITS_GENERIC
00:00:00.110125 nspr-2 Loading settings file "/opt/tomcat/.VirtualBox/VirtualBox.xml" with version "1.12-linux"
00:00:00.110817 nspr-2 Failed to retrive disk info: getDiskName(/dev/md126p1) --> md126p1
00:00:00.264367 nspr-2 VDInit finished
00:00:00.275173 nspr-2 Loading settings file "/opt/tomcat/VirtualBox VMs/vagrant_getting_started_default_1376419129/vagrant_getting_started_default_1376419129.vbox" with version "1.12-linux"
00:00:05.288923 main ERROR [COM]: aRC=VBOX_E_OBJECT_IN_USE (0x80bb000c) aIID={29989373-b111-4654-8493-2e1176cba890} aComponent={Medium} aText={Medium '/opt/tomcat/VirtualBox VMs/vagrant_getting_started_default_1376419129/box-disk1.vmdk' cannot be closed because it is still attached to 1 virtual machines}, preserve=false
00:00:05.290229 Watcher ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={3b2f08eb-b810-4715-bee0-bb06b9880ad2} aComponent={VirtualBox} aText={The object is not ready}, preserve=false
$
Any advice would be greatly appreciated.
Had the same error on OSX. Restarting VirtualBox fixed it :S
sudo /Library/StartupItems/VirtualBox/VirtualBox restart
Also see: https://forums.virtualbox.org/viewtopic.php?t=5489
I solved the problem by re-installing VirtualBox and adding myself to the vboxusers group. The re-installation process printed a message indicating that VM users had to be a member of that group. I don't know if the re-installation was necessary or if being added to the group would have sufficed.
The host machine was 32-bit (Ubuntu) and the guest was 64-bit. I changed the guest to 32-bit and that solved the problem.
My understanding is that the vboxusers group is related to accessing USB devices from within the guest. I'm not sure why it would cause this issue; normally, as a Vagrant base box build guideline, audio and USB are both disabled.
As per the VirtualBox Manual => The vboxusers group
The Linux installers create the system user group vboxusers during installation. Any system user who is going to use USB devices from VirtualBox guests must be a member of that group. A user can be made a member of the group vboxusers through the GUI user/group management or at the command line with sudo usermod -a -G vboxusers username
Note that adding an active user to that group will require that user to log out and back in again. This should be done manually after successful installation of the package.
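A minimal sketch of applying that advice and then verifying the membership took effect (after logging out and back in):

# Add the current user to vboxusers and confirm the membership
sudo usermod -a -G vboxusers "$USER"
id -nG "$USER" | grep -w vboxusers   # prints the group list only if vboxusers is present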
I had the same problem. It was caused by a wrong configuration in the provider section of my Vagrantfile: I had tried to make my VM more powerful, with 2 CPUs, when my host machine has just one.
This often happens when you try to assign more hardware to your VM than your host machine actually has.
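For example, a quick check of what the host actually has before raising the CPU or memory settings in the Vagrantfile's provider block (Linux host commands; the numbers will differ on your machine):

# Compare these values with what the provider block assigns to the VM
nproc                 # host CPU core count
free -m               # host memory in MB
VBoxManage list hostinfo | grep -E 'Processor count|Memory size'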