poweroff redirect system halted - exit

I'm playing with an image built with Buildroot and running it with QEMU. But when I run the "poweroff" command, the machine does not shut down; instead, "System halted" appears (see the console output below):
# poweroff
# Stopping network: OK
Saving random seed: random: dd: uninitialized urandom read (512 bytes read)
OK
Stopping klogd: OK
Stopping syslogd: OK
umount: devtmpfs busy - remounted read-only
EXT4-fs (sda): re-mounted. Opts: (null)
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
sd 0:0:0:0: [sda] Synchronizing SCSI cache
reboot: System halted
Can someone help me?
Thank you!
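For context, the QEMU command line is not shown in this question; a typical invocation for a Buildroot x86_64 image looks roughly like the following (the paths are the usual Buildroot output locations and the options are assumptions, not taken from this setup):
# with if=ide the root filesystem shows up as /dev/sda, matching the log above
qemu-system-x86_64 \
  -M pc \
  -kernel output/images/bzImage \
  -drive file=output/images/rootfs.ext4,if=ide,format=raw \
  -append "root=/dev/sda console=ttyS0" \
  -nographic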

Related

aws ec2 ssh fails after creating an image of the instance

I regularly create an image of a running instance without stopping it first. That has worked for years without any issues. Tonight, I created another image of the instance (without any changes to the virtual server settings except for a "sudo yum update -y") and noticed my ssh session was closed. It looked like it was rebooted after the image was created. Then the web console showed 1/2 status checks passed. I rebooted it a few times and the status remained the same. The log showed:
Setting hostname localhost.localdomain: [ OK ]
Setting up Logical Volume Management: [ 3.756261] random: lvm: uninitialized urandom read (4 bytes read)
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/xvda1
/: clean, 437670/1048576 files, 3117833/4193787 blocks
[/sbin/fsck.xfs (1) -- /mnt/pgsql-data] fsck.xfs -a /dev/xvdf
[/sbin/fsck.ext2 (2) -- /mnt/couchbase] fsck.ext2 -a /dev/xvdg
/sbin/fsck.xfs: XFS file system.
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdg
/dev/xvdg:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
[ 3.811304] random: crng init done
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
/dev/fd/9: line 2: plymouth: command not found
Give root password for maintenance
(or type Control-D to continue):
It looked like /dev/xvdg failed the disk check. I detached the volume from the instance and rebooted. I still couldn't ssh in. I re-attached it and rebooted again. Now it says 2/2 status checks passed, but I still can't ssh back in, and the log still shows the same issues with /dev/xvdg as above.
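For reference, one way to check from that maintenance shell what /dev/xvdg actually contains, and what fsck is being told to expect, might be something like this (the device name comes from the log above; the rest is a sketch, not commands that have been run here):
blkid /dev/xvdg            # report the filesystem type and UUID actually on the volume
file -s /dev/xvdg          # second opinion on what is on the device
grep xvdg /etc/fstab       # compare with the filesystem type fsck is told to use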
Any help would be appreciated. Thank you!
Thomas

AWS EC2 - All of a sudden lost access due to "No /sbin/init, trying fallback"

My AWS EC2 instance was locked and I lost access on December 6th for an unknown reason. It cannot be anything I did on the EC2, because I was overseas on holiday from December 1st and came back on January 1st; I realized the server had lost its connection on December 6th, and I have had no way to connect to the EC2 since.
The EC2 runs CentOS 7 with a PHP, NGINX, and SSHD setup.
When I checked the system log, I saw the following:
[  OK  ] Started Cleanup udevd DB.
[  OK  ] Reached target Switch Root.
Starting Switch Root...
[ 6.058942] systemd-journald[99]: Received SIGTERM from PID 1 (systemd).
[ 6.077915] systemd[1]: No /sbin/init, trying fallback
[ 6.083729] systemd[1]: Failed to execute /bin/sh, giving up: No such file or directory
[ 180.596117] random: crng init done
Any idea what the issue is would be much appreciated.
In brief, I had to do the following to recover; the root cause was that the disk was completely full (a rough command sketch follows the list):
1) Problem mounting the slaved volume (xfs_admin)
2) Not able to chroot into the environment (ln -s)
3) Disk at 100% (df -h); removed files under /var/log
4) Rebuilt the initramfs (dracut -f)
5) Renamed /etc/fstab
6) Switched the slaved volume back to its original UUID (xfs_admin)
7) Configured GRUB to boot the latest version of the kernel/initramfs
8) Rebuilt the initramfs and GRUB
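A very rough sketch of those steps, run from a rescue instance with the broken root volume attached as a secondary disk (device names, mount points, and the exact files removed are assumptions, not the literal commands used):
# on the rescue instance, with the broken root volume attached as /dev/xvdf (assumed)
xfs_admin -U generate /dev/xvdf1        # give the slaved XFS volume a temporary UUID
mkdir -p /mnt/rescue
mount /dev/xvdf1 /mnt/rescue
for d in dev proc sys; do mount --bind /$d /mnt/rescue/$d; done
chroot /mnt/rescue /bin/bash

# inside the chroot: free space, then rebuild the boot pieces
df -h
rm -f /var/log/*.gz                     # e.g. old rotated logs
mv /etc/fstab /etc/fstab.bak            # rename fstab so a bad entry cannot block boot
dracut -f                               # rebuild the initramfs (pass the chroot's kernel version if it differs)
grub2-mkconfig -o /boot/grub2/grub.cfg  # make GRUB boot that kernel/initramfs
exit

# back on the rescue instance: unmount and restore the original UUID
umount /mnt/rescue/dev /mnt/rescue/proc /mnt/rescue/sys /mnt/rescue
xfs_admin -U <original-uuid> /dev/xvdf1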

GCloud compute instance halting but not stopping

I want to run a gcloud compute instance only for the duration of my startup script. My startup script calls a nodejs script which spawns sudo halt or shutdown -h 0. Observing the serial port, the system comes to a halt, but the instance remains in the RUNNING state, never going into STOPPING or TERMINATED:
Starting Halt...
[  OK  ] Stopped Monitoring of LVM2 mirrors,…sing dmeventd or progress polling.
Stopping LVM2 metadata daemon...
[  OK  ] Stopped LVM2 metadata daemon.
[ 34.560467] reboot: System halted
How is it possible that the system can halt completely but the instance doesn't register as such?
As @DazWilkin said in the comments, you can use poweroff to stop a VM instance in Compute Engine. Alternatively, per the official Compute Engine documentation, you can also run sudo shutdown -h now or sudo poweroff, both of which will accomplish what you are trying to do.
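For example, the tail end of such a startup script could look something like this (the node script path is just a placeholder for whatever work the instance does):
#!/bin/bash
# do the actual work; /opt/app/job.js is a placeholder for the real workload
node /opt/app/job.js

# power the VM off so Compute Engine moves the instance to STOPPING/TERMINATED
# (startup scripts already run as root, so sudo is not needed here)
shutdown -h now        # or: poweroff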

uwsgi listen queue fills on reload

I'm running a Django app on uwsgi with an average of 110 concurrent users and 5 requests per second during peak hours. I'm finding that when I deploy with uwsgi reload during these peak hours, workers keep getting killed and respawned one by one, and then the uwsgi logs begin to throw an error:
Gracefully killing worker 1 (pid: 25145)...
Gracefully killing worker 2 (pid: 25147)...
... a few minutes go by ...
worker 2 killed successfully (pid: 25147)
Respawned uWSGI worker 2 (new pid: 727)
... a few minutes go by ...
worker 2 killed successfully (pid: 727)
Respawned uWSGI worker 2 (new pid: 896)
... this continues gradually for 25 minutes until:
*** listen queue of socket "127.0.0.1:8001" (fd: 3) full !!! (101/100) ***
At this point my app rapidly slows to a crawl and I can only recover with a hard uwsgi stop followed by a uwsgi start. There are some relevant details which make this situation kind of peculiar:
This only occurs when I uwsgi reload, otherwise the listen queue never fills up on its own
The error messages and slowdown only start to occur about 25 minutes after the reload
Even during the moment of crisis, memory and CPU resources on the machine seem fine
If I deploy during lighter traffic times, this issue does not seem to pop up
I realize that I can increase the listen queue size, but that seems like a band-aid more than an actual solution. And the fact that it only fills up during reload (and takes 25 minutes to do so) leads me to believe that it will fill up eventually regardless of the size. I would like to figure out the mechanism that is causing the queue to fill up and address that at the source.
Relevant uwsgi config:
[uwsgi]
socket = 127.0.0.1:8001
processes = 4
threads = 2
max-requests = 300
reload-on-rss = 800
vacuum = True
touch-reload = foo/uwsgi/reload.txt
memory-report = true
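For reference, with this config the graceful reload can presumably be triggered either through the touch-reload file or by sending SIGHUP to the master (both being assumptions about how "uwsgi reload" is wired up here):
touch foo/uwsgi/reload.txt        # touch-reload trigger from the config above
# or, equivalently, signal the master process directly:
kill -HUP <master-pid>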
Relevant software version numbers:
uwsgi 2.0.14
Ubuntu 14.04.1
Django 1.11.13
Python 2.7.6
It appears that our touch reload is not graceful when we have even slight traffic. Is this to be expected, or do we have a more fundamental issue?
uWSGI has a harakiri mode that will periodically kill workers stuck on long-running requests, to prevent unreliable code from hanging (and effectively taking down the app). I would suggest looking there for why your processes are being killed.
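For example, one might add something like the following to the [uwsgi] section shown above (the 300-second timeout is purely illustrative, not a recommendation):
# kill any worker whose current request runs longer than 300 seconds
harakiri = 300
# log extra details when harakiri triggers
harakiri-verbose = true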
As to why a hard stop works and a graceful stop does not: it seems to further indicate that your application code is hanging. A graceful stop will send SIGHUP, which allows resources to be cleaned up in the application. SIGINT and SIGTERM follow the harsher guidelines of "stop what you are doing right now and exit".
Anyway, it boils down to this not being a uwsgi issue, but an issue in your application code. Find what is hanging and why. Since you are not noticing CPU spikes, some probable places to look are:
blocking connections
locks
a long sleep
Good luck!
The key thing you need to look at is "listen queue of socket "127.0.0.1:8001" (fd: 3) full !!! (101/100)".
The default listen queue size is 100. Increase the queue size by adding the "listen" option in uwsgi.ini:
listen = 4096

Why is the KernelRestarter killing my IBM DSX python kernel

On IBM DSX I find that if I leave a long-running Python notebook running overnight, the kernel dies at around the same time each night (around midnight UTC).
The Jupyter log shows:
[I 2017-07-29 23:37:14.929 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel e827e71b-6492-4dc4-9201-b6ce29c2100c restarted
[D 2017-07-29 23:37:14.950 NotebookApp] Starting kernel: [u'/usr/local/src/bluemix_jupyter_bundle.v54/provision/pyspark_kernel_wrapper.sh', u'/gpfs/fs01/user/sc1c-81b7dbb381fb6a-c4b9ad2fa578/notebook/jupyter-rt/kernel-e827e71b-6492-4dc4-9201-b6ce29c2100c.json', u'spark20master']
[D 2017-07-29 23:37:14.954 NotebookApp] Connecting to: tcp://127.0.0.1:42931
[D 2017-07-29 23:37:17.957 NotebookApp] KernelRestarter: restart apparently succeeded
Neither the kernel log nor the Jupyter log shows anything else before this point.
Is there some policy being enforced here that kills kernels, or maybe some scheduled downtime each day? Does anybody know why the "KernelRestarter" is kicking in?
The KernelRestarter is not killing anything. It notices that the kernel is gone and starts a new one automatically. DSX has inactivity timeouts, but those would shut down your service altogether rather than kill a kernel, and inactivity timeouts are not tied to wall-clock time. This seems to be a bug in DSX.