etcd not starting, getting error while probing status - amazon-web-services

I am trying to install Kubernetes the hard way. Instead of installing on GCP or using VirtualBox, I am installing on AWS, using a Red Hat Linux image.
I am using 3 nodes for etcd in a stacked topology. However, starting the etcd service gives the following error:
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sat 2020-08-01 17:03:37 UTC; 2s ago
Docs: https://github.com/coreos
Process: 1816 ExecStart=/usr/local/bin/etcd --name ip-172-31-60-0 --cert-file=/etc/etcd/etcd-server.crt --key-file=/etc/etcd/etcd-server.key --peer-cert-file=/etc/etcd/etcd-server.crt --peer-key-file=/etc/etcd/etcd-server.key --trusted-ca-file=/etc/etcd/ca.crt --peer-t>
Main PID: 1816 (code=exited, status=203/EXEC)
I have looked for a solution but can't get past this. How can I resolve it?

Not sure if this will solve your case, but you could try to start the key-value store from scratch using the following commands:
ETCDCTL_API=3 etcdctl del "" --from-key=true
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default.etcd
sudo systemctl start etcd
If that doesn't work, the output of systemctl status etcd may help with troubleshooting...
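For example, the commands below usually show more detail about why the unit failed (etcd is the unit name from your output). Note that status=203/EXEC from systemd generally means the ExecStart binary could not be executed at all, so it is also worth checking that /usr/local/bin/etcd exists and is executable:
sudo systemctl status etcd -l
sudo journalctl -u etcd --no-pager -n 50
ls -l /usr/local/bin/etcd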

Related

Unable to restart Hue in AWS EMR cluster

I have an EMR Cluster in AWS, configured in a CloudFormation template. In my template, I have a step that executes a script on the master node. The purpose of this script is to make changes to the hue.ini file.
The final step in the script is to restart Hue so the changes take effect. I'm following this documentation for the correct command. That documentation is explicit that you should not run restart.
Running sudo systemctl stop hue followed by sudo systemctl start hue leaves Hue in the following state (per sudo systemctl status hue):
[root@ip-10-x-xxx-xxx ~]# sudo systemctl status hue
● hue.service - Hue web server
Loaded: loaded (/etc/systemd/system/hue.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-05-19 18:44:27 UTC; 2s ago
Process: 22743 ExecStart=/etc/init.d/hue start (code=exited, status=1/FAILURE)
Main PID: 17508 (code=exited, status=1/FAILURE)
Tasks: 0
Memory: 0B
CGroup: /system.slice/hue.service
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Failed to start Hue web server.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Unit hue.service entered failed state.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: hue.service failed.
Running start again manually on the instance returns this:
Job for hue.service failed because the control process exited with error code. See "systemctl status hue.service" and "journalctl -xe" for details.
Those logs just show the same as above. I have also checked this similar question but the answer does not work for me.
EMR: emr-6.2.0
Hue: 4.8.0
After a little more research, it seems this is not the best approach. The best approach is to include a hue-ini Classification block in my CloudFormation template. This applies the changes and performs the required restart for you.
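As a rough sketch of that approach (the hue-ini classification is the one documented for EMR configurations; the nested section and property names below are only placeholders for whatever you actually need to change in hue.ini), the Configurations property of the AWS::EMR::Cluster resource could look something like:
Configurations:
  - Classification: hue-ini
    Configurations:
      - Classification: desktop
        ConfigurationProperties:
          some_hue_setting: "some_value"
EMR applies these settings to hue.ini and handles the restart itself, so the manual stop/start is no longer needed.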

ray: anyscale-academy tutorial installation throws a socket exception

While installing Ray, the distributed ML framework, I installed ray 1.0.1post on a VM with CentOS 8.2. I followed the official documentation step by step, then issued the command to launch the tutorial web server:
> jupyter lab
I got an exception similar to the following:
"/usr/lib64/xxx/tornado/netutil.py", line 196, in bind_sockets sock.bind(sockaddr)
OSError: [Errno 99] Cannot assign requested address
How to fix this exception?
This exception is not related to Ray; it is caused by the default IPv4/IPv6 configuration on CentOS. Either enable IPv6 fully or disable it explicitly, and the error goes away. You also need to stop firewalld so that outside clients can reach the web server on the VM.
The following commands enable IPv6:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=0
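sysctl -w only changes the running kernel, so if you want the setting to survive a reboot you can also persist it, for example (the file name below is just an illustration):
echo "net.ipv6.conf.all.disable_ipv6 = 0" | sudo tee /etc/sysctl.d/90-enable-ipv6.conf
echo "net.ipv6.conf.default.disable_ipv6 = 0" | sudo tee -a /etc/sysctl.d/90-enable-ipv6.conf
sudo sysctl --system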
The following commands stop firewalld:
systemctl status firewalld
systemctl stop firewalld
If the status command shows the keyword running, firewalld is running; after stopping it, you will see the keyword dead:
Running:
Active: active (running) since Fri 2021-03-19 10:17:27 CST; 4 days ago
Dead:
Active: inactive (dead) since Tue 2021-03-23 10:58:10 CST; 3s ago
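If you also want firewalld to stay off after a reboot, rather than only stopping it for the current session, you can additionally disable the unit:
sudo systemctl disable firewalld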

couchdb installed via snapd on OpenSuse not working

I've installed couchDb 2.0 via snap onto OpenSuse Tumbleweed.
sudo snap install couchdb
Then I ran
sudo systemctl enable --now snapd.socket
Everything works fine until I log out. In my new session I cannot get CouchDB running.
Would anyone know of a solution please?
Some more info:
systemctl status snapd
gives:
Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2018-07-28 16:33:45 NZST; 4min 10s ago
May 12 20:31:04 hobbes systemd[1]: Starting Snappy daemon...
May 12 20:31:04 hobbes snapd[4705]: AppArmor status: apparmor is enabled but some features are missing: dbus
May 12 20:31:04 hobbes snapd[4705]: 2018/05/12 20:31:04.773100 daemon.go:323: started snapd/2.32.5-1.10 (series 16; classic; devmode) opensuse-tumbleweed/20180502 (amd>
May 12 20:31:04 hobbes systemd[1]: Started Snappy daemon.
Some feedback from #suse channel:
The CouchDB snap failure is due to AppArmor, which seems to block the service from starting. Try running sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/* and then snap start couchdb.
To fix it so you don't have to run that every time: save https://paste.opensuse.org/33232726 as /etc/systemd/system/snapd.apparmor.service, run systemctl enable snapd.apparmor.service, then reboot and try snap start couchdb. (Send cookies if it works.)
From the (expiring) paste:
[Unit]
Description=Load AppArmor profiles managed internally by snapd
DefaultDependencies=no
Before=sysinit.target
Requisite=snapd.service
After=apparmor.service
ConditionSecurity=apparmor
[Service]
Type=oneshot
ExecStart=/usr/lib/snapd/snapd-apparmor start
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
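Putting that feedback together, the sequence would look roughly like this, assuming the unit above was saved as /etc/systemd/system/snapd.apparmor.service:
sudo systemctl daemon-reload
sudo systemctl enable snapd.apparmor.service
sudo reboot
# after the reboot:
snap start couchdb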

google compute startup script starting postgresql

I have a startup script, set by an instance template, that initializes the server on Google Compute Engine. After installing Postgres, I manually start it using:
/etc/init.d/postgresql start
This completes successfully, but the server is not listening on port 5432 when the script is run at startup (Postgres isn't started, although the service start call completes successfully). After startup completes and I log in, I can start it successfully. Does anyone know why this won't work within the startup script? I need to load data during startup, so I need Postgres running during initialization.
I solved it by using a newer Debian image.
I had the same problem as you (installing PostgreSQL in a GCE startup script results in the package being installed, but the server not running), and I think I figured out the root cause.
Normally, the postgresql-11 package is supposed to start the PostgreSQL server after installation. Here is a snippet from its postinst script:
if [ "$1" = configure ]; then
. /usr/share/postgresql-common/maintscripts-functions
configure_version $VERSION "$2"
fi
Taking a look at /usr/share/postgresql-common/maintscripts-functions, we see:
configure_version() {
    ...
    # reload systemd to let the generator pick up the new unit
    if [ -d /run/systemd/system ]; then
        systemctl daemon-reload
    fi
    invoke-rc.d postgresql start $VERSION # systemd: argument ignored, starts all versions
}
My Debian installation comes with init-system-helpers version "1.56+nmu1", which contains this bit of code in invoke-rc.d:
# avoid deadlocks during bootup and shutdown from units/hooks
# which call "invoke-rc.d service reload" and similar, since
# the synchronous wait plus systemd's normal behaviour of
# transactionally processing all dependencies first easily
# causes dependency loops
if ! systemctl --quiet is-active multi-user.target; then
    sctl_args="--job-mode=ignore-dependencies"
fi

case $saction in
    start|restart|try-restart)
        [ "$_state" != "LoadState=masked" ] || exit 0
        systemctl $sctl_args "${saction}" "${UNIT}" && exit 0
        ;;
The Debian postgresql-11 package makes use of templated systemd units. The main one is called postgresql.service, but this is a dummy service that doesn't actually do anything. The PostgreSQL server is actually started by a templated unit named postgresql@11-main, which is usually started alongside the main service because it has ReloadPropagatedFrom=postgresql.service.
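For reference, you can inspect both units and how they are defined on the instance with standard systemctl commands:
systemctl list-units 'postgresql*' --all
systemctl cat postgresql@11-main.service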
Note that when this issue occurs, the main unit is started but the templated one is not:
$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-04-02 05:40:48 UTC; 32min ago
Main PID: 1663 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4665)
Memory: 0B
CGroup: /system.slice/postgresql.service
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Starting PostgreSQL RDBMS...
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Started PostgreSQL RDBMS.
$ sudo systemctl status postgresql@11-main
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead)
That's because when --job-mode=ignore-dependencies is specified, this link is ignored.
The GCE startup script runs as a systemd unit, which starts before multi-user.target is up:
$ find /etc/systemd | grep startup
/etc/systemd/system/multi-user.target.wants/google-startup-scripts.service
Therefore, invoke-rc.d notices that systemctl --quiet is-active multi-user.target is false and adds --job-mode=ignore-dependencies, which results in the PostgreSQL server not starting.
One possible workaround is to explicitly run systemctl start postgresql@11-main.service from your startup script after installing Postgres.
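As a minimal sketch, assuming PostgreSQL 11 from the Debian packages (adjust the version to match the cluster name on your image), the relevant part of the startup script could be:
apt-get install -y postgresql-11
# invoke-rc.d skips the templated unit during early boot (see above),
# so start the actual server unit explicitly
systemctl start postgresql@11-main.service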
By the way, I noticed that a recent commit (Nov 2020) changed this invoke-rc.d behavior so that it no longer uses --job-mode=ignore-dependencies. That would help avoid this issue.

Glusterfs-server can't stop in good status

glusterfs-server does not stop with a good status.
I used the following steps to install GlusterFS and start the service:
yum install centos-release-gluster
yum install glusterfs-server
systemctl start glusterd
Then I stopped it:
systemctl stop glusterd
The status then shows "Active: failed":
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled)
Active: failed (Result: exit-code) since Tue 2017-01-24 18:23:55 JST; 4s ago
Process: 2523 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2524 (code=exited, status=15)
Jan 24 18:23:52 ds009 systemd[1]: Started GlusterFS, a clustered file-system server.
Jan 24 18:23:55 ds009 systemd[1]: Stopping GlusterFS, a clustered file-system server...
Jan 24 18:23:55 ds009 systemd[1]: glusterd.service: main process exited, code=exited, status=15/n/a
Jan 24 18:23:55 ds009 systemd[1]: Stopped GlusterFS, a clustered file-system server.
Jan 24 18:23:55 ds009 systemd[1]: Unit glusterd.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
The environment is "CentOS Linux release 7.2.1511 (Core)".
The installed glusterfs-server version is 3.8.8-1.el7.
Does anyone have an idea what is wrong and how to fix it?
I have found a workaround.
Edit the /usr/lib/systemd/system/glusterd.service file as follows.
[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
# add the following line:
ExecStopPost=/usr/bin/systemctl reset-failed glusterd
KillMode=process
With this change, the failed status is cleared when the GlusterFS service stops.
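A side note, not from the original answer: /usr/lib/systemd/system/glusterd.service belongs to the package and may be overwritten on update, so the same line can instead be added as a systemd drop-in:
sudo mkdir -p /etc/systemd/system/glusterd.service.d
printf '[Service]\nExecStopPost=/usr/bin/systemctl reset-failed glusterd\n' | sudo tee /etc/systemd/system/glusterd.service.d/reset-failed.conf
sudo systemctl daemon-reload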