Rootless Podman with systemd in ubi8 Container on RHEL8 not working - cgroups

We are trying to run a container from the ubi8-init image as a non-root user with podman on RHEL 8. We enabled cgroups v2 globally by adding kernel parameters and checked the versions:
cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1
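On RHEL 8, these parameters can be made persistent with grubby, e.g. (a sketch; run as root and reboot afterwards):

```shell
# Append the cgroup v2 parameters to all installed kernels (run as root)
grubby --update-kernel=ALL \
       --args="cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1"

# After a reboot, verify that the unified hierarchy is active:
# this prints "cgroup2fs" on a cgroup v2 system
stat -fc %T /sys/fs/cgroup
```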
$ podman -v
podman version 2.0.5
$ podman info --debug
host:
arch: amd64
buildahVersion: 1.15.1
cgroupVersion: v2
Subuid and subgid are set:
bob:100000:65536
We hit the following permission problem, which we worked around (ugly) with a chown:
Failed to create /user.slice/user-992.slice/session-371.scope/init.scope control group: Permission denied
$ chown -R 992 /sys/fs/cgroup/user.slice/user-992.slice/session-371.scope
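A less invasive alternative to the chown workaround may be to have systemd delegate the cgroup controllers to the user session via a drop-in (a sketch based on systemd's Delegate= mechanism; the path and controller list are assumptions, not verified on this exact setup):

```ini
# /etc/systemd/system/user@.service.d/delegate.conf  (assumed path)
[Service]
Delegate=cpu cpuset io memory pids
```

After `systemctl daemon-reload` and a fresh login, the user's cgroup subtree should be writable by that user, which would make the chown unnecessary.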
Now we are able to run the container and get a shell in it via podman exec /bin/bash. The problem is that we get the following error when we try to copy something into the container with podman cp:
opening file `/sys/fs/cgroup/cgroup.freeze` for writing: Permission denied
Sample output from commands without chown workaround:
# Trying with --cgroup-manager=systemd
$ podman run --name=ubi-init-test --cgroup-manager=systemd -it --rm --systemd=true ubi8-init
Error: writing file `/sys/fs/cgroup/user.slice/user-992.slice/user@992.service/cgroup.subtree_control`: No such file or directory: OCI runtime command not found error
# Trying with --cgroup-manager=cgroupfs
$ podman run --name=ubi-init-test --cgroup-manager=cgroupfs -it --rm --systemd=true ubi8-init
systemd 239 (239-41.el8_3) running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy)
Detected virtualization container-other.
Detected architecture x86-64.
Welcome to Red Hat Enterprise Linux 8.3 (Ootpa)!
Set hostname to <b64ed4493a24>.
Initializing machine ID from random generator.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.
Something must be completely wrong, misconfigured, or buggy. Has anyone gotten this to work, or any advice regarding the issues we are running into?

I'm trying to solve a similar issue.
I ran setsebool -P container_manage_cgroup true on top of adding the kernel parameters for cgroups v2, but it didn't help. Then I found this comment https://bbs.archlinux.org/viewtopic.php?pid=1895705#p1895705 and got a little further with --cgroup-manager=cgroupfs (using podman unshare and then unsetting DBUS_SESSION_BUS_ADDRESS):
$ echo $DBUS_SESSION_BUS_ADDRESS
unix:path=/run/user/1000/bus
$ podman unshare
$ export DBUS_SESSION_BUS_ADDRESS=
$ podman run --name=ubi-init-test --cgroup-manager=cgroupfs -it --rm --systemd=true ubi8-init
systemd 239 (239-41.el8_3.1) running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy)
Detected virtualization container-other.
Detected architecture x86-64.
Welcome to Red Hat Enterprise Linux 8.3 (Ootpa)!
Set hostname to <3caae9f73645>.
Initializing machine ID from random generator.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Couldn't move remaining userspace processes, ignoring: Input/output error
[ OK ] Reached target Local File Systems.
[ OK ] Listening on Journal Socket.
[ OK ] Reached target Network is Online.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
Starting Rebuild Journal Catalog...
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Paths.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Reached target Swap.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on Journal Socket (/dev/log).
Starting Journal Service...
Starting Rebuild Dynamic Linker Cache...
Starting Create System Users...
[ OK ] Started Rebuild Journal Catalog.
[ OK ] Started Create System Users.
[ OK ] Started Rebuild Dynamic Linker Cache.
Starting Update is Completed...
[ OK ] Started Update is Completed.
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Started Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Started Create Volatile Files and Directories.
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Started Update UTMP about System Boot/Shutdown.
[ OK ] Reached target System Initialization.
[ OK ] Started dnf makecache --timer.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Reached target Basic System.
Starting Permit User Sessions...
[ OK ] Started D-Bus System Message Bus.
[ OK ] Started Permit User Sessions.
[ OK ] Reached target Multi-User System.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started Update UTMP about System Runlevel Changes.

Related

aws ec2 ssh fails after creating an image of the instance

I regularly create an image of a running instance without stopping it first. That has worked for years without any issues. Tonight, I created another image of the instance (without any changes to the virtual server settings except for a "sudo yum update -y") and noticed my ssh session was closed. It looked like it was rebooted after the image was created. Then the web console showed 1/2 status checks passed. I rebooted it a few times and the status remained the same. The log showed:
Setting hostname localhost.localdomain: [ OK ]
Setting up Logical Volume Management: [ 3.756261] random: lvm: uninitialized urandom read (4 bytes read)
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/xvda1
/: clean, 437670/1048576 files, 3117833/4193787 blocks
[/sbin/fsck.xfs (1) -- /mnt/pgsql-data] fsck.xfs -a /dev/xvdf
[/sbin/fsck.ext2 (2) -- /mnt/couchbase] fsck.ext2 -a /dev/xvdg
/sbin/fsck.xfs: XFS file system.
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdg
/dev/xvdg:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
[ 3.811304] random: crng init done
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
/dev/fd/9: line 2: plymouth: command not found
Give root password for maintenance
(or type Control-D to continue):
It looked like /dev/xvdg failed the disk check. I detached the volume from the instance and rebooted. I still couldn't ssh in. I re-attached it and rebooted again. Now it says status check 2/2 passed, but I still can't ssh back in, and the log still shows the issues with /dev/xvdg as above.
Any help would be appreciated. Thank you!
Thomas
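The log's suggestion to run e2fsck with an alternate superblock can be rehearsed safely on a scratch image file first (a sketch; /dev/xvdg itself is untouched here, and the block numbers come from the log's own e2fsck hint):

```shell
# Rehearse the repair on a scratch image instead of the real /dev/xvdg:
# make a small ext4 filesystem in a file, wipe its primary superblock,
# then repair from a backup superblock exactly as the log suggests.
dd if=/dev/zero of=scratch.img bs=1024 count=16384 status=none
mke2fs -F -q -t ext4 -b 1024 scratch.img       # 1k blocks => backups at 8193, ...
dd if=/dev/zero of=scratch.img bs=1024 seek=1 count=1 conv=notrunc status=none
e2fsck -y -b 8193 -B 1024 scratch.img || true  # exit code 1 just means "errors were fixed"
```

Once the invocation is understood, the same `-b` form can be tried against the real device from a rescue instance.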

AWS EC2: suddenly lost access due to "No /sbin/init, trying fallback"

My AWS EC2 instance was locked and I lost access on December 6th for an unknown reason. It cannot be something I did on the instance, because I was overseas on holiday from December 1st until January 1st. When I came back, I realized the server had lost connectivity on December 6th, and I now have no way to connect to the instance.
The EC2 instance runs CentOS 7 with PHP, NGINX, and SSHD.
When I checked the system log I saw the following:
[ OK ] Started Cleanup udevd DB.
[ OK ] Reached target Switch Root.
Starting Switch Root...
[ 6.058942] systemd-journald[99]: Received SIGTERM from PID 1 (systemd).
[ 6.077915] systemd[1]: No /sbin/init, trying fallback
[ 6.083729] systemd[1]: Failed to execute /bin/sh, giving up: No such file or directory
[ 180.596117] random: crng init done
Any idea on what the issue is would be much appreciated.
In brief, here is what I had to do to recover. The root cause was that the disk was completely full:
1) Fixed a problem mounting the slaved volume (xfs_admin)
2) Fixed not being able to chroot into the environment (ln -s)
3) Disk at 100% (df -h): removed /var/log files
4) Rebuilt the initramfs (dracut -f)
5) Renamed the etc/fstab
6) Switched the slaved volume back to its original UUID (xfs_admin)
7) Configured GRUB to boot the latest version of the kernel/initramfs
8) Rebuilt the initramfs and GRUB
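A rough sketch of those recovery steps as commands (the device names, mount points, and UUID are assumptions; the actual values come from your own volumes):

```shell
# Broken root volume attached to a rescue instance as /dev/xvdf1:
xfs_admin -U generate /dev/xvdf1          # temporary UUID so the slave volume can mount
mount /dev/xvdf1 /mnt/rescue
df -h /mnt/rescue                         # confirm the disk is at 100%
rm -f /mnt/rescue/var/log/*.log           # free up space

mount --bind /dev  /mnt/rescue/dev        # prepare the chroot
mount --bind /proc /mnt/rescue/proc
mount --bind /sys  /mnt/rescue/sys
chroot /mnt/rescue dracut -f              # rebuild the initramfs
chroot /mnt/rescue grub2-mkconfig -o /boot/grub2/grub.cfg

umount -R /mnt/rescue
xfs_admin -U <original-uuid> /dev/xvdf1   # restore the original UUID before re-attaching
```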

haproxy container error when starting ECS

I've been trying for some time to work out what the problem is when I start the haproxy container on an AWS ECS cluster. Any help diagnosing and fixing it would be appreciated. The error is below. I created security group rules on the linked container to allow traffic on the required ports.
<7>haproxy-systemd-wrapper: executing /usr/local/sbin/haproxy -p /run/haproxy.pid -db -f /usr/local/etc/haproxy/haproxy.cfg -Ds
Usage : haproxy [-f <cfgfile|cfgdir>]* [ -vdVD ] [ -n <maxconn> ] [ -N <maxpconn> ]
[ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*]
-v displays version ; -vv shows known build options.
-d enters debug mode ; -db only disables background mode.
-dM[<byte>] poisons memory with <byte> (defaults to 0x50)
-V enters verbose mode (disables quiet mode)
-D goes daemon ; -C changes to <dir> before loading files.
-q quiet mode : don't display messages
-c check mode : only check config files and exit
-n sets the maximum total # of connections (2000)
-m limits the usable amount of memory (in MB)
-N sets the default, per-proxy maximum # of connections (2000)
-L set local peer name (default to hostname)
-p writes pids of all children to this file
-de disables epoll() usage even when available
-dp disables poll() usage even when available
-dS disables splice usage (broken on old kernels)
-dR disables SO_REUSEPORT usage
-dr ignores server address resolution failures
-dV disables SSL verify on servers side
-sf/-st [pid ]* finishes/terminates old pids.
HA-Proxy version 1.7.10-a7dcc3b 2018/01/02
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
<5>haproxy-systemd-wrapper: exit, haproxy RC=1
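Since haproxy printed its usage text and exited with RC=1, the wrapper most likely handed it a config it couldn't parse. The -c check mode listed in the usage output can validate the config before deploying to ECS (a minimal config sketch for illustration, not the asker's actual file):

```shell
# Minimal config to demonstrate haproxy's -c syntax check
cat > haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http-in
    bind *:8080
    default_backend servers

backend servers
    server app1 127.0.0.1:3000
EOF
haproxy -c -f haproxy.cfg   # exits 0 if the config parses
```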

Unable to ssh to EC2 instance after some works. Permission denied (public key)

Yesterday, I created an Ubuntu server instance on Amazon EC2.
I could ssh to this instance, and I set up my development environment on it.
But after rebooting the instance, I get an ssh error. I'm confused, because the command and my local machine's environment are the same! Of course, the private key is also the same.
$ ssh -i "xxx.pem" ubuntu@xxxxxx
Permission denied (publickey).
I suspected the change I had made to /etc/rc.local to start my service, so I reverted that configuration using a rescue instance. I was indeed able to edit the instance's files from the rescue instance by mounting its volume, without ssh.
With the rescue instance, I could use the same private key (.pem) and Elastic IP.
environments
ubuntu-trusty-14.04-amd64-server-20150325
t2.micro
using EIP
security group
SSH, TCP, 22, 0.0.0.0/0
instance status on console
2/2 check is ok
system status check: ok
instance status check: ok
system log
I also checked the system log on the web console; I couldn't find any errors.
* Stopping Mount filesystems on boot [ OK ]
* Starting Signal sysvinit that local filesystems are mounted [ OK ]
* Starting flush early job output to logs [ OK ]
* Stopping Failsafe Boot Delay [ OK ]
* Starting System V initialisation compatibility [ OK ]
* Stopping flush early job output to logs [ OK ]
* Starting D-Bus system message bus [ OK ]
* Starting SystemD login management service [ OK ]
* Starting system logging daemon [ OK ]
* Starting early crypto disks... [ OK ]
* Starting Handle applying cloud-config [ OK ]
* Starting Bridge file events into upstart [ OK ]
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
* Starting AppArmor profiles [ OK ]
* Setting up X socket directories... [ OK ]
* Stopping System V initialisation compatibility [ OK ]
* Starting System V runlevel compatibility [ OK ]
open-vm-tools: not starting as this is not a VMware VM
landscape-client is not configured, please run landscape-config.
* Starting xinetd daemon [ OK ]
* Starting ACPI daemon [ OK ]
* Starting save kernel messages [ OK ]
* Starting configure network device security [ OK ]
* Starting OpenSSH server [ OK ]
* Starting regular background program processing daemon [ OK ]
* Starting deferred execution scheduler [ OK ]
* Stopping save kernel messages [ OK ]
* Stopping CPU interrupts balancing daemon [ OK ]
* Starting configure virtual network devices [ OK ]
* Starting automatic crash report generation [ OK ]
* Restoring resolver state... [ OK ]
Cloud-init v. 0.7.5 running 'modules:config' at Tue, 15 Dec 2015 06:23:27 +0000. Up 6.40 seconds.
* Stopping Handle applying cloud-config [ OK ]
* Stopping cold plug devices [ OK ]
* Stopping log initial device creation [ OK ]
* Starting enable remaining boot-time encrypted block devices [ OK ]
* Starting save udev log and update rules [ OK ]
* Stopping save udev log and update rules [ OK ]
* Stopping System V runlevel compatibility [ OK ]
* Starting execute cloud user/final scripts [ OK ]
Cloud-init v. 0.7.5 running 'modules:final' at Tue, 15 Dec 2015 06:23:29 +0000. Up 7.81 seconds.
Cloud-init v. 0.7.5 finished at Tue, 15 Dec 2015 06:23:29 +0000. Datasource DataSourceEc2. Up 7.86 seconds
Ubuntu 14.04.3 LTS ip-10-0-0-17 ttyS0
ip-10-0-0-17 login:
Perhaps I dropped some important setting; could anyone give me hints about this trouble?
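Since the volume can already be mounted from the rescue instance, one common cause of "Permission denied (publickey)" worth checking is the ownership and permissions of the ssh files on the broken root volume (a sketch; the mount point and uid are assumptions):

```shell
# With the broken root volume mounted at /mnt/rescue on the rescue instance:
ls -ld /mnt/rescue/home/ubuntu /mnt/rescue/home/ubuntu/.ssh
cat /mnt/rescue/home/ubuntu/.ssh/authorized_keys   # should contain your public key

# sshd silently rejects keys if these are too permissive or wrongly owned:
chmod 755 /mnt/rescue/home/ubuntu
chmod 700 /mnt/rescue/home/ubuntu/.ssh
chmod 600 /mnt/rescue/home/ubuntu/.ssh/authorized_keys
chown -R 1000:1000 /mnt/rescue/home/ubuntu/.ssh    # default ubuntu uid/gid on 14.04
```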

SaltStack: Getting Up and Running Minion on EC2

I am working through the SaltStack walkthrough to set up Salt on my EC2 cluster. I just edited /etc/salt/minion and added the public DNS of my Salt master.
master: ec2-54-201-153-192.us-west-2.compute.amazonaws.com
Then I restarted the minion. In debug mode, this put out the following
$ sudo salt-minion -l debug
[DEBUG ] Reading configuration from /etc/salt/minion
[INFO ] Using cached minion ID: localhost.localdomain
[DEBUG ] loading log_handlers in ['/var/cache/salt/minion/extmods/log_handlers', '/usr/lib/python2.6/site-packages/salt/log/handlers']
[DEBUG ] Skipping /var/cache/salt/minion/extmods/log_handlers, it is not a directory
[DEBUG ] None of the required configuration sections, 'logstash_udp_handler' and 'logstash_zmq_handler', were found the in the configuration. Not loading the Logstash logging handlers module.
[DEBUG ] Configuration file path: /etc/salt/minion
[INFO ] Setting up the Salt Minion "localhost.localdomain"
[DEBUG ] Created pidfile: /var/run/salt-minion.pid
[DEBUG ] Chowned pidfile: /var/run/salt-minion.pid to user: root
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] loading grain in ['/var/cache/salt/minion/extmods/grains', '/usr/lib/python2.6/site-packages/salt/grains']
[DEBUG ] Skipping /var/cache/salt/minion/extmods/grains, it is not a directory
[DEBUG ] Attempting to authenticate with the Salt Master at 172.31.21.27
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
Sure enough, 172.31.21.27 is the private IP of the master. So far this looks OK. According to the walkthrough, the next step is to accept the minion's key on the master:
"Now that the minion is started it will generate cryptographic keys and attempt to
connect to the master. The next step is to venture back to the master server and
accept the new minion's public key."
However, when I go to the master node and look for new keys I don't see any pending requests.
$ sudo salt-key -L
Accepted Keys:
Unaccepted Keys:
Rejected Keys:
And the ping test does not see the minion either:
$ sudo salt '*' test.ping
This is where I'm stuck. What should I do next to get up and running?
Turn off iptables and run salt-key -L to check whether the key shows up. If it does, you need to open ports 4505 and 4506 on the master so the minion can connect to it. You could run lokkit -p tcp:4505 -p tcp:4506 to open these ports.
You likely need to add rules for 4505/4506 between the Salt master and minion security groups. The Salt master needs these ports to communicate with the minions.
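On instances without lokkit, the equivalent iptables rules would look like this (a sketch; restrict the source network to your minions in practice):

```shell
# Allow Salt's ZeroMQ ports on the master: 4505 (publish), 4506 (return)
iptables -I INPUT -p tcp --dport 4505 -j ACCEPT
iptables -I INPUT -p tcp --dport 4506 -j ACCEPT
service iptables save   # persist across reboots on RHEL/CentOS-style systems
```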