I have a startup script, set by an instance template, that initializes a Google Compute Engine server. After installing Postgres, the script starts it manually with:
/etc/init.d/postgresql start
This completes successfully, but when it is run by the startup script the server is not listening on 5432 (Postgres isn't started, even though the service start call reports success). If I log in after startup completes and run the same command, it works. Does anyone know why this won't work from within the startup script? I need to load data during startup, so I need Postgres running during initialization.
I solved it by using a newer Debian image.
I had the same problem (installing PostgreSQL from a GCE startup script installs the package, but leaves the server not running), and I think I figured out the root cause.
Normally, the postgresql-11 package is supposed to start the PostgreSQL server after installation. Here is a snippet from its postinst script:
if [ "$1" = configure ]; then
    . /usr/share/postgresql-common/maintscripts-functions
    configure_version $VERSION "$2"
fi
Taking a look at /usr/share/postgresql-common/maintscripts-functions, we see:
configure_version() {
    ...
    # reload systemd to let the generator pick up the new unit
    if [ -d /run/systemd/system ]; then
        systemctl daemon-reload
    fi
    invoke-rc.d postgresql start $VERSION # systemd: argument ignored, starts all versions
}
My Debian installation comes with init-system-helpers version "1.56+nmu1", which contains this bit of code in invoke-rc.d:
# avoid deadlocks during bootup and shutdown from units/hooks
# which call "invoke-rc.d service reload" and similar, since
# the synchronous wait plus systemd's normal behaviour of
# transactionally processing all dependencies first easily
# causes dependency loops
if ! systemctl --quiet is-active multi-user.target; then
    sctl_args="--job-mode=ignore-dependencies"
fi

case $saction in
    start|restart|try-restart)
        [ "$_state" != "LoadState=masked" ] || exit 0
        systemctl $sctl_args "${saction}" "${UNIT}" && exit 0
        ;;
The Debian postgresql-11 package makes use of templated systemd units. The main one is called postgresql.service, but this is a dummy service that doesn't actually do anything. The PostgreSQL server is actually started by a templated unit named postgresql@11-main, which is usually started alongside the main service because it has ReloadPropagatedFrom=postgresql.service.
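If you want to confirm how the two units relate on your own instance, something like this should show it (unit names as above):
systemctl cat postgresql@.service                 # the template that 11-main is instantiated from
systemctl list-dependencies postgresql.service    # should list postgresql@11-main.service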
Note that when this issue occurs, the main unit is started but the templated one is not:
$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-04-02 05:40:48 UTC; 32min ago
Main PID: 1663 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4665)
Memory: 0B
CGroup: /system.slice/postgresql.service
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Starting PostgreSQL RDBMS...
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Started PostgreSQL RDBMS.
$ sudo systemctl status postgresql@11-main
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead)
That's because when --job-mode=ignore-dependencies is specified, this link is ignored.
The GCE startup script runs as a systemd unit, which starts before multi-user.target is up:
$ find /etc/systemd | grep startup
/etc/systemd/system/multi-user.target.wants/google-startup-scripts.service
Therefore, invoke-rc.d notices that systemctl --quiet is-active multi-user.target is false and adds --job-mode=ignore-dependencies, which results in the PostgreSQL server not starting.
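You can see that condition from inside the startup script itself; while it runs, the target is not yet active (a quick check, assuming the script is executed by google-startup-scripts.service as shown above):
systemctl is-active multi-user.target   # prints "inactive" or "activating" while the startup script runs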
One possible workaround is to explicitly run systemctl start postgresql@11-main.service from your startup script after installing Postgres.
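For example, a minimal sketch of that workaround in a startup script (assuming the Debian postgresql-11 package and its default 11-main cluster; pg_isready comes with the client tools):
apt-get install -y postgresql-11
systemctl start postgresql@11-main.service
# wait until the server accepts connections before loading data
until pg_isready -q; do sleep 1; done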
By the way, I noticed that a recent commit (Nov 2020) changed this invoke-rc.d behavior so that it no longer uses --job-mode=ignore-dependencies. That would help avoid this issue.
Related
I have an EMR cluster in AWS, configured in a CloudFormation template. In my template, I have a step that executes a script on the master node. The purpose of this script is to make changes to the hue.ini file.
The final step in the script is to restart Hue so the changes take effect. I'm following this documentation for the correct command; it is explicit that you should not run restart.
Running sudo systemctl stop hue followed by sudo systemctl start hue leaves Hue in the following state (per sudo systemctl status hue):
[root@ip-10-x-xxx-xxx ~]# sudo systemctl status hue
● hue.service - Hue web server
Loaded: loaded (/etc/systemd/system/hue.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-05-19 18:44:27 UTC; 2s ago
Process: 22743 ExecStart=/etc/init.d/hue start (code=exited, status=1/FAILURE)
Main PID: 17508 (code=exited, status=1/FAILURE)
Tasks: 0
Memory: 0B
CGroup: /system.slice/hue.service
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Failed to start Hue web server.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Unit hue.service entered failed state.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: hue.service failed.
Running start again manually on the instance returns this:
Job for hue.service failed because the control process exited with error code. See "systemctl status hue.service" and "journalctl -xe" for details.
Those logs just show the same as above. I have also checked this similar question but the answer does not work for me.
EMR: emr-6.2.0
Hue: 4.8.0
After a little more research, it seems this is not the best approach. The best approach is to include a hue-ini Classification block in my CloudFormation template. This applies the changes and performs the required restart for you.
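For anyone else hitting this, a rough sketch of what that looks like on the AWS::EMR::Cluster resource; the nested section and property below are placeholders for whatever you need to change in hue.ini:
MyEmrCluster:
  Type: AWS::EMR::Cluster
  Properties:
    # ... other cluster properties ...
    Configurations:
      - Classification: hue-ini
        Configurations:
          - Classification: desktop              # hue.ini section (placeholder)
            ConfigurationProperties:
              app_blacklist: search,security     # example key/value only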
I am trying to install Kubernetes the hard way. Instead of installing on GCP or using VirtualBox, I am installing on AWS, using a Red Hat Linux image.
I am using 3 nodes for etcd in a stacked topology. However, when starting the etcd service, it gives the following error:
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sat 2020-08-01 17:03:37 UTC; 2s ago
Docs: https://github.com/coreos
Process: 1816 ExecStart=/usr/local/bin/etcd --name ip-172-31-60-0 --cert-file=/etc/etcd/etcd-server.crt --key-file=/etc/etcd/etcd-server.key --peer-cert-file=/etc/etcd/etcd-server.crt --peer-key-file=/etc/etcd/etcd-server.key --trusted-ca-file=/etc/etcd/ca.crt --peer-t>
Main PID: 1816 (code=exited, status=203/EXEC)
I have looked for a solution but can't get past this. How can I resolve it?
Not sure if this will solve your case, but you could try starting the key-value store from scratch using the following commands:
ETCDCTL_API=3 etcdctl del "" --from-key=true
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default.etcd
sudo systemctl start etcd
If that doesn't work, maybe the output of systemctl status etcd can help with troubleshooting...
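Since the status above shows status=203/EXEC, which means systemd could not execute the binary given in ExecStart, it is also worth checking the binary itself; a couple of quick checks, assuming the paths from the unit shown above:
ls -l /usr/local/bin/etcd                 # does the binary exist and is it executable?
file /usr/local/bin/etcd                  # right architecture, not a truncated download?
sudo journalctl -u etcd --no-pager -n 50  # recent log lines for the unit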
I've installed CouchDB 2.0 via snap on openSUSE Tumbleweed.
sudo snap install couchdb
Then I ran
sudo systemctl enable --now snapd.socket
Everything works fine until I log out. In my new session I cannot get CouchDB running.
Would anyone know of a solution please?
Some more info:
systemctl status snapd
gives:
Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2018-07-28 16:33:45 NZST; 4min 10s ago
May 12 20:31:04 hobbes systemd[1]: Starting Snappy daemon...
May 12 20:31:04 hobbes snapd[4705]: AppArmor status: apparmor is enabled but some features are missing: dbus
May 12 20:31:04 hobbes snapd[4705]: 2018/05/12 20:31:04.773100 daemon.go:323: started snapd/2.32.5-1.10 (series 16; classic; devmode) opensuse-tumbleweed/20180502 (amd>
May 12 20:31:04 hobbes systemd[1]: Started Snappy daemon.
Some feedback from #suse channel:
The CouchDB snap failure is due to AppArmor; it seems to block starting the service. Try running sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/* and then snap start couchdb.
To fix it so you don't have to run that every time, save https://paste.opensuse.org/33232726 as /etc/systemd/system/snapd.apparmor.service and run systemctl enable snapd.apparmor.service, then reboot and try snap start couchdb. Send cookies if it works.
From the expiring pastie:
[Unit]
Description=Load AppArmor profiles managed internally by snapd
DefaultDependencies=no
Before=sysinit.target
Requisite=snapd.service
After=apparmor.service
ConditionSecurity=apparmor
[Service]
Type=oneshot
ExecStart=/usr/lib/snapd/snapd-apparmor start
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
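Putting it together, a minimal sequence once the unit above is saved to /etc/systemd/system/snapd.apparmor.service (a sketch of the steps described above, not verified on Tumbleweed):
sudo systemctl daemon-reload
sudo systemctl enable snapd.apparmor.service
sudo reboot
# after the reboot:
sudo snap start couchdb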
I'm trying to add my first service on RHEL 7 (which resides in AWS/EC2), but the service is not configured correctly, as I get:
[ec2-user@ip-172-30-1-96 ~]$ systemctl status clouddirectd.service -l
● clouddirectd.service - CloudDirect Daemon
Loaded: loaded (/usr/lib/systemd/system/clouddirectd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2018-01-09 16:09:42 EST; 8s ago
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: Unit clouddirectd.service entered failed state.
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service failed.
Also:
[ec2-user@ip-172-30-1-96 ~]$ systemctl is-active clouddirectd
activating
[ec2-user@ip-172-30-1-96 ~]$ sudo systemctl list-units --type service --all | grep clouddirectd
clouddirectd.service loaded activating auto-restart CloudDirect Daemon
And my unit file is:
[ec2-user@ip-172-30-1-96 ~]$ cat /usr/lib/systemd/system/clouddirectd.service
[Unit]
Description=CloudDirect Daemon
After=network.target
[Service]
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials
#ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/sonar/clouddirectd -c /etc/sonar/clouddirect/clouddirectd.conf
ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
# #PERM# allow group write permission on newly created files
UMask=0007
#User=clouddirectd
User=clouddirect
Group=sonar
KillSignal=SIGINT
TimeoutStopSec=60min
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Can you suggest how to debug this systemd service so it won't keep dying and auto-restarting?
The error 217 indicates that the user did not exist at the time the service tried to start. In your case the user specified in your service is clouddirect.
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
This can happen if that is not the actual user name (for example, if it has a typo). It can also happen if the user comes from an external user store (e.g. LDAP or Active Directory) and the service that allows the Linux server to reach that store is not up yet. For example, vasd.service starts a product that allows Linux to authenticate against Active Directory; if vasd.service is not up and you have specified a user that only exists in Active Directory, you would want to add that service to your After= line. For example:
After=network.target vasd.service
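A quick way to check which case you are in (run on the instance; the user name comes from the unit file above):
id clouddirect                      # does the user exist at all right now?
getent passwd clouddirect           # does it resolve through NSS (covers LDAP/AD as well)?
grep '^clouddirect:' /etc/passwd    # is it defined locally, or only via a network store?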
There are two parts to the question: how to diagnose a 217/USER, and how to fix it. I'll just focus on the former.
For the 217/USER there are some good pointers here:
https://www.reddit.com/r/linuxquestions/comments/oaya49/systemd_service_not_starting_with_status217/
217 doesn't always mean it's a user problem; it just means the process exited with 217. It may or may not be user related...
You could use journalctl to check the logs and see which services come up after yours does during boot.
It's possible that "network users" aren't available yet at the point during boot when the service is started; you can fix that by adding After=nss-user-lookup.target (see https://systemd.io/UIDS-GIDS/), though that's not the case here, since it still fails after a restart, which happens later. systemd expects the specified user to be available when the service starts, so "system users" (for processes that start early) need to exist on the local box, while processes started later can use "network users".
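If network users do turn out to be the issue, that ordering fix can go in a drop-in rather than editing the unit itself; a sketch (drop-in path assumed, e.g. created with systemctl edit clouddirectd):
# /etc/systemd/system/clouddirectd.service.d/override.conf
[Unit]
After=nss-user-lookup.target
Wants=nss-user-lookup.target
followed by sudo systemctl daemon-reload and a restart of the service.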
You could also try switching to the user, group, and environment you think systemd is running with and running the command manually to see what happens: https://serverfault.com/questions/410577/execute-a-command-from-another-group
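For the "run it manually" idea, something along these lines approximates what systemd would run for the unit above (the sudo/env invocation is an assumption, not what systemd literally does):
sudo -u clouddirect -g sonar env \
  AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials \
  /usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf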
I kind of wish systemd output more debugging information so you could tell what it is actually running more easily...
In certain bizarre cases you may need to specify both User= and Group=: https://superuser.com/a/1452367/39364
In our case, running "vintela status" showed the message "SELinux may not be configured correctly", and sure enough, after disabling SELinux it started working as expected with no more 217. [RHEL 8]
I'm having a problem that is driving me nuts. I'm trying to create an Ubuntu 15.04 Upstart job that starts a uWSGI server (uWSGI version 2.0.7-debian) running a Django application.
I've configured the service as follows and try to start the job with sudo service uwsgi start. There is no output in the log files, no socket file is created, and there is no error.
# file: /etc/init/uwsgi.conf
description "uWSGI server"
start on runlevel [2345]
stop on runlevel [!2345]
exec /usr/bin/uwsgi --ini /srv/configs/uwsgi.ini
service uwsgi status says
● uwsgi.service - LSB: Start/stop uWSGI server instance(s)
Loaded: loaded (/etc/init.d/uwsgi)
Active: active (exited) since Tue 2015-05-05 09:40:08 EEST; 1s ago
Docs: man:systemd-sysv-generator(8)
Process: 21405 ExecStop=/etc/init.d/uwsgi stop (code=exited, status=0/SUCCESS)
Process: 21460 ExecStart=/etc/init.d/uwsgi start (code=exited, status=0/SUCCESS)
May 05 09:40:08 web-2 systemd[1]: Starting LSB: Start/stop uWSGI server instance(s)...
May 05 09:40:08 web-2 uwsgi[21460]: * Starting app server(s) uwsgi
May 05 09:40:08 web-2 uwsgi[21460]: ...done.
May 05 09:40:08 web-2 systemd[1]: Started LSB: Start/stop uWSGI server instance(s).
The uWSGI app is configured as
[uwsgi]
uid = www-data
gid = www-data
socket = /srv/sockets/uwsgi.sock
chmod-socket = 666
master = true
processes = 4
enable-threads = true
plugins = python
chdir = /srv/sites/current
module = wsgi:application
virtualenv = /srv/environments/current
logto = /srv/logs/uwsgi.log
logfile-chown = true
touch-reload = /srv/sites/current/wsgi.py
The service starts fine and everything works OK if it is started directly, e.g. by running:
sudo -u www-data /usr/bin/uwsgi --ini /srv/configs/uwsgi.ini
For the socket and log file directories I've set the most relaxed permissions.
So, I'm at a loss. What to do next?
Thanks to nmgeek's comment, I started looking at other places where uWSGI is configured.
While trying to move everything related to our own app under /srv, I had deleted the /run/uwsgi directory, where uWSGI tries to create its PID file.
You can find the reference at /usr/share/uwsgi/conf/default.ini.
Too bad uWSGI doesn't produce any error in this situation, which left me frustrated for a day.
I just want to add some details. In my case I didn't delete anything, but for some reason systemd was not able to create the /run/uwsgi directory. So I did echo "d /run/uwsgi 0755 www-data www-data -" > /etc/tmpfiles.d/uwsgi.conf (the leading d tells systemd-tmpfiles to create a directory). After that I restarted my VM (I didn't find a way to apply these changes without restarting) and uWSGI started to work!
Yes, it's too bad that uWSGI doesn't produce any error in this situation.
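As an aside, a sketch of how a tmpfiles.d entry like the one above can usually be applied without a reboot (this command is an assumption on my part, not something from the answer above):
sudo systemd-tmpfiles --create /etc/tmpfiles.d/uwsgi.conf   # create /run/uwsgi as described by the entry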