Systemd service should log to the journal but not to rsyslog (/var/log/messages) - CentOS 7

Why does the log data also show up in /var/log/messages when StandardOutput=null and StandardError=journal are specified in the systemd service? I'm using CentOS 7 as the operating system.
[Service]
Restart=always
TimeoutStartSec=1200
StandardOutput=null
StandardError=journal
I can see the messages both in the journal and in the file /var/log/messages.
journalctl -u my_service
----
systemd[1]: my_service.service holdoff time over, scheduling restart.
----
cat /var/log/messages | grep my_service
----
systemd[1]: my_service.service holdoff time over, scheduling restart.
----
What additional adjustments do I have to make so that the service only logs its error messages in the journal?
EDIT:
I use the default journald configuration ("/etc/systemd/journald.conf"); all lines in it are commented out.
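As a side note (an assumption on my part, not something stated in the question): journald has a ForwardToSyslog= option that controls whether journal messages are handed to the local syslog daemon. Whether changing it alone removes the /var/log/messages duplicates depends on the rsyslog setup, since on CentOS 7 rsyslog can also read the journal directly via its imjournal module, so treat this as a sketch to experiment with:
# /etc/systemd/journald.conf - sketch, assumes rsyslog receives these messages via journald forwarding
[Journal]
ForwardToSyslog=no
followed by systemctl restart systemd-journald.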

Related

Unable to restart Hue in AWS EMR cluster

I have an EMR Cluster in AWS, configured in a Cloudformation template. In my template, I have a step that executes a script on the master node. The purpose of this script is to make changes to the hue.ini file.
The final step in the script is to restart Hue so the changes take effect. I'm following this documentation for the correct command; it is explicit that you should not run restart.
Running sudo systemctl stop hue followed by sudo systemctl start hue leaves Hue in the following state (per sudo systemctl status hue):
[root@ip-10-x-xxx-xxx ~]# sudo systemctl status hue
● hue.service - Hue web server
Loaded: loaded (/etc/systemd/system/hue.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2021-05-19 18:44:27 UTC; 2s ago
Process: 22743 ExecStart=/etc/init.d/hue start (code=exited, status=1/FAILURE)
Main PID: 17508 (code=exited, status=1/FAILURE)
Tasks: 0
Memory: 0B
CGroup: /system.slice/hue.service
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Failed to start Hue web server.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: Unit hue.service entered failed state.
May 19 18:44:27 ip-10-x-xxx-xxx systemd[1]: hue.service failed.
Running start again manually on the instance returns this:
Job for hue.service failed because the control process exited with error code. See "systemctl status hue.service" and "journalctl -xe" for details.
Those logs just show the same as above. I have also checked this similar question but the answer does not work for me.
EMR: emr-6.2.0
Hue: 4.8.0
After a little more research, it seems this is not the best approach. The better approach is to include a hue-ini Classification block in my CloudFormation template. This applies the changes and performs the required restart for you.
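For illustration only (hue-ini is the real EMR classification name, but the section and property below are placeholders I picked, not values from the question), such a Configurations entry on an AWS::EMR::Cluster resource might look like:
Configurations:
  - Classification: hue-ini
    Configurations:
      - Classification: desktop
        ConfigurationProperties:
          django_debug_mode: "false"
EMR then rewrites hue.ini and performs the required restart itself, so no manual restart step is needed.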

couchdb installed via snapd on OpenSuse not working

I've installed CouchDB 2.0 via snap onto openSUSE Tumbleweed.
sudo snap install couchdb
Then I ran
sudo systemctl enable --now snapd.socket
Everything works fine until I log out. In my new session I cannot get CouchDB running.
Would anyone know of a solution please?
Some more info:
systemctl status snapd
gives:
Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2018-07-28 16:33:45 NZST; 4min 10s ago
May 12 20:31:04 hobbes systemd[1]: Starting Snappy daemon...
May 12 20:31:04 hobbes snapd[4705]: AppArmor status: apparmor is enabled but some features are missing: dbus
May 12 20:31:04 hobbes snapd[4705]: 2018/05/12 20:31:04.773100 daemon.go:323: started snapd/2.32.5-1.10 (series 16; classic; devmode) opensuse-tumbleweed/20180502 (amd>
May 12 20:31:04 hobbes systemd[1]: Started Snappy daemon.
Some feedback from #suse channel:
The CouchDB snap failure is due to AppArmor; it seems to block starting the service. Try running sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/* and then snap start couchdb.
To fix it so you don't have to run that every time: save https://paste.opensuse.org/33232726 as /etc/systemd/system/snapd.apparmor.service and run systemctl enable snapd.apparmor.service, then reboot and try snap start couchdb. Send cookies if it works.
from the expiring pastie:
[Unit]
Description=Load AppArmor profiles managed internally by snapd
DefaultDependencies=no
Before=sysinit.target
Requisite=snapd.service
After=apparmor.service
ConditionSecurity=apparmor
[Service]
Type=oneshot
ExecStart=/usr/lib/snapd/snapd-apparmor start
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
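A minimal sketch of the commands that follow from the advice above (assuming the unit file was saved to the path given):
sudo systemctl daemon-reload                        # pick up the new unit file
sudo systemctl enable --now snapd.apparmor.service  # load the snap AppArmor profiles now and at boot
sudo snap start couchdb                             # then start the CouchDB snap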

google compute startup script starting postgresql

I have a startup script, set by an instance template, that initializes the server on Google Compute Engine. After installing Postgres, I manually start it using:
/etc/init.d/postgresql start
This completes successfully, but the server is not listening on 5432 when it is run from the startup script (Postgres isn't started, even though the service start call completes successfully). After startup completes and I log in, the same command works. Does anyone know why it won't work within the startup script? I need to load data during startup, so I need Postgres running during initialization.
I solved it by using a newer Debian image.
I had the same problem as you (installing postgresql in a GCE startup script results in the package being installed but the server not running), and I think I figured out the root cause.
Normally, the postgresql-11 package is supposed to start the PostgreSQL server after installation. Here is a snippet from its postinst script:
if [ "$1" = configure ]; then
. /usr/share/postgresql-common/maintscripts-functions
configure_version $VERSION "$2"
fi
Taking a look at /usr/share/postgresql-common/maintscripts-functions, we see:
configure_version() {
...
# reload systemd to let the generator pick up the new unit
if [ -d /run/systemd/system ]; then
systemctl daemon-reload
fi
invoke-rc.d postgresql start $VERSION # systemd: argument ignored, starts all versions
}
My debian installation comes with init-system-helpers version "1.56+nmu1", which contains this bit of code in invoke-rc.d:
# avoid deadlocks during bootup and shutdown from units/hooks
# which call "invoke-rc.d service reload" and similar, since
# the synchronous wait plus systemd's normal behaviour of
# transactionally processing all dependencies first easily
# causes dependency loops
if ! systemctl --quiet is-active multi-user.target; then
sctl_args="--job-mode=ignore-dependencies"
fi
case $saction in
start|restart|try-restart)
[ "$_state" != "LoadState=masked" ] || exit 0
systemctl $sctl_args "${saction}" "${UNIT}" && exit 0
;;
The Debian postgresql-11 package makes use of templated systemd units. The main one is called postgresql.service, but this is a dummy service that doesn't actually do anything. The PostgreSQL server is actually started by a templated unit named postgresql@11-main, which is usually started alongside the main service because it has ReloadPropagatedFrom=postgresql.service.
Note that when this issue occurs, the main unit is started but the templated one is not:
$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-04-02 05:40:48 UTC; 32min ago
Main PID: 1663 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4665)
Memory: 0B
CGroup: /system.slice/postgresql.service
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Starting PostgreSQL RDBMS...
Apr 02 05:40:48 hubnext-west-r21r systemd[1]: Started PostgreSQL RDBMS.
$ sudo systemctl status postgresql@11-main
● postgresql@11-main.service - PostgreSQL Cluster 11-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead)
That's because when --job-mode=ignore-dependencies is specified, this link is ignored.
The GCE startup script runs as a systemd unit, which starts before multi-user.target is up:
$ find /etc/systemd | grep startup
/etc/systemd/system/multi-user.target.wants/google-startup-scripts.service
Therefore, invoke-rc.d notices that systemctl --quiet is-active multi-user.target is false and adds --job-mode=ignore-dependencies, which results in the PostgreSQL server not starting.
One possible workaround is to explicitly run systemctl start postgresql@11-main.service from your startup script after installing Postgres.
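A minimal sketch of that workaround in a GCE startup script (assuming PostgreSQL 11 from the Debian repositories, as in the analysis above):
#!/bin/bash
# Install PostgreSQL; during boot invoke-rc.d may skip the templated unit,
# so start it explicitly afterwards.
apt-get install -y postgresql-11
systemctl start postgresql@11-main.service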
By the way, I noticed that a recent commit (Nov 2020) changed this invoke-rc.d behavior so that it no longer uses --job-mode=ignore-dependencies. That would help avoid this issue.

How to debug a failed systemctl service (code=exited, status=217/USER)?

I'm trying to add my first service on RHEL 7 (which resides in AWS/EC2), but the service is not configured correctly, as I get:
[ec2-user@ip-172-30-1-96 ~]$ systemctl status clouddirectd.service -l
● clouddirectd.service - CloudDirect Daemon
Loaded: loaded (/usr/lib/systemd/system/clouddirectd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2018-01-09 16:09:42 EST; 8s ago
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: Unit clouddirectd.service entered failed state.
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service failed.
Also:
[ec2-user@ip-172-30-1-96 ~]$ systemctl is-active clouddirectd
activating
[ec2-user@ip-172-30-1-96 ~]$ sudo systemctl list-units --type service --all | grep clouddirectd
clouddirectd.service loaded activating auto-restart CloudDirect Daemon
And my unit file is:
[ec2-user@ip-172-30-1-96 ~]$ cat /usr/lib/systemd/system/clouddirectd.service
[Unit]
Description=CloudDirect Daemon
After=network.target
[Service]
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials
#ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/sonar/clouddirectd -c /etc/sonar/clouddirect/clouddirectd.conf
ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
# #PERM# allow group write permission on newly created files
UMask=0007
#User=clouddirectd
User=clouddirect
Group=sonar
KillSignal=SIGINT
TimeoutStopSec=60min
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Can you suggest how to debug this systemctl service so it won't keep dying and auto restarting?
The error 217 indicates the user did not exist at the time the service tried to start. In your case, the user specified in your service is clouddirect.
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
This can happen if that is not the actual user name (for example, if it has a typo). It can also happen when the user comes from an external user store (e.g. LDAP or Active Directory) and the service that lets the Linux server reach that store is not up yet. For example, vasd.service starts a product used to allow Linux to authenticate against Active Directory; if vasd.service is not up and you have specified a user that is only available in Active Directory, you would want to add that service to your After= line. For example:
After=network.target vasd.service
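A quick way to check whether the configured user actually resolves on the box (a sketch using the user name from the unit file above):
id clouddirect              # prints uid/gid if the user resolves
getent passwd clouddirect   # also queries NSS sources such as LDAP/AD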
There are two parts to the question: how to diagnose a 217/USER, and how to fix it. I'll focus on the former.
For the 217/USER there's some good pointers here:
https://www.reddit.com/r/linuxquestions/comments/oaya49/systemd_service_not_starting_with_status217/
217 doesn't always mean it's a user problem; it just means the process exited with 217. It may or may not be...
You could use journalctl to see which services seem to come up after it does during boot, or the like.
It's possible that "network users" aren't yet available at the time the service is started during boot; you can fix that by adding After=nss-user-lookup.target (see https://systemd.io/UIDS-GIDS/), though that's not the case here, since it still fails after a restart, which happens later. systemd expects the specified user to be available when the service starts, so "system users" (for early-started processes) need to be available on the local box, while later-started processes can use "network users".
You could also try switching your user, group, and environment to what you think systemd is running with and running the command manually to see what happens. https://serverfault.com/questions/410577/execute-a-command-from-another-group
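For example, a rough sketch of running the unit's ExecStart command by hand as the configured user and group (paths and names taken from the unit file in the question):
sudo -u clouddirect -g sonar env AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials \
  /usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
If this fails with "unknown user" or a permissions error, that points at the account or file ownership rather than the unit itself.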
I kind of wish systemd printed more debug output so you could tell more easily what it is actually running...
In certain bizarre cases you may need to specify both User= and Group= https://superuser.com/a/1452367/39364
In our case, running "vintela status" showed the message "SELinux may not be configured correctly", and sure enough, after disabling SELinux it started working as expected - no more 217. [RHEL 8]
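If you suspect SELinux, a quick check looks like this (standard RHEL tooling, not part of the original answer):
getenforce                       # prints Enforcing, Permissive, or Disabled
sudo setenforce 0                # temporarily switch to Permissive to test
sudo ausearch -m avc -ts recent  # list recent SELinux denials, if any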

How do I restart airflow webserver?

I am using Airflow for my data pipeline project. I have configured my project in Airflow and start the Airflow server as a background process using the following command:
airflow webserver -p 8080 -D True
The server runs successfully in the background. Now I want to enable authentication in Airflow and have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the Airflow server on my local machine, it works.
So how can I restart my daemonized airflow webserver process on my server?
I advise running Airflow in a robust way, with auto-recovery, using systemd,
so you can do:
- to start systemctl start airflow
- to stop systemctl stop airflow
- to restart systemctl restart airflow
For this you'll need a systemd 'unit' file.
As a (working) example you can use the following:
Put it in /lib/systemd/system/airflow.service:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
PIDFile=/run/airflow/webserver.pid
EnvironmentFile=/home/airflow/airflow.env
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'export AIRFLOW_HOME=/home/airflow ; airflow webserver --pid /run/airflow/webserver.pid'
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=42s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
P.S.: change AIRFLOW_HOME to the folder containing your Airflow config.
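Then reload systemd and enable the unit so it starts at boot (a minimal sketch, assuming the file was saved as /lib/systemd/system/airflow.service):
sudo systemctl daemon-reload         # pick up the new unit file
sudo systemctl enable --now airflow  # start it now and on every boot
systemctl status airflow             # verify it is running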
Can you check $AIRFLOW_HOME/airflow-webserver.pid for the process id of your webserver daemon?
Then pass it a kill signal to kill it
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
Then clear the pid file
cat /dev/null > $AIRFLOW_HOME/airflow-webserver.pid
Then just run
airflow webserver -p 8080 -D True
to restart the daemon.
This worked for me (multiple times! :D )
find the process id: (assuming 8080 is the port)
lsof -i tcp:8080
kill it
kill <pid>
Use Airflow webserver's (gunicorn) signal handling
Airflow uses gunicorn as its HTTP server, so you can send it standard POSIX-style signals. A signal commonly used by daemons to trigger a restart is HUP.
You'll need to locate the pid file of the airflow webserver daemon in order to get the right process id to send the signal to. This file could be in $AIRFLOW_HOME or in /var/run, which is where you'll find many pid files.
Assuming the pid file is in /var/run, you could run the command:
cat /var/run/airflow-webserver.pid | xargs kill -HUP
gunicorn uses a preforking model, so it has master and worker processes. The HUP signal is sent to the master process, which performs these actions:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the preload_app option), Gunicorn will also load the new version of it.
More information in the gunicorn signal handling docs.
This is mostly an expanded version of captaincapsaicin's answer, but using HUP (SIGHUP) instead of KILL (SIGKILL) to reload the process instead of actually killing it and restarting it.
In my case I wanted to kill the previous Airflow process and start a fresh one.
For that, the following command did the magic:
killall -9 airflow
As the question was related to webserver, this is something that worked in my case:
systemctl restart airflow-webserver
Just run:
airflow webserver -p 8080 -D
Find the pid with:
airflow webserver
which will report: "The webserver is already running under PID 21250."
Then kill the webserver process with:
kill 21250
None of these worked for me. I had to delete the $AIRFLOW_HOME/airflow-webserver.pid file; after that, running airflow webserver worked.
Create an init script and use the "daemon" command to run this as a service.
daemon --user="${USER}" --pidfile="${PID_FILE}" airflow webserver -p 8090 >> "${LOG_FILE}" 2>&1 &
The recommended approach is to create and enable the Airflow webserver as a service. If you named the webserver service 'airflow-webserver', run the following command to restart the service:
systemctl restart airflow-webserver
You can use a ready-made AMI (namely, LightningFlow) from the AWS Marketplace, which provides Airflow services (webserver, scheduler, worker) that are enabled at startup.
Note: LightningFlow comes pre-integrated with all required libraries, Livy, custom operators, and local Spark cluster.
Link for AWS Marketplace: https://aws.amazon.com/marketplace/pp/Lightning-Analytics-Inc-LightningFlow-Integrated-o/B084BSD66V
Just by killing processes!!
Assuming the default airflow home directory is ~/airflow/
List the PIDs of the three parent Airflow processes:
cat ~/airflow/airflow-scheduler.pid
cat ~/airflow/airflow-webserver.pid
cat ~/airflow/airflow-webserver-monitor.pid
Get their PGID using:
ps -xjf
And finally, run a loop to kill the whole process tree of each parent (PID):
for child in $(ps x -o "%P %p %r"| awk '{ if ( $1 == $your_first_PID || $3 == $your_first_PGID) { print $2 }}'); do kill $child; done
To restart Airflow you need to restart Airflow webserver and Airflow scheduler.
Check if Airflow servers are running:
ps -aux | grep airflow
If you see entries like this in the list of running processes:
ubuntu 49601 0.1 1.6 266668 135520 ? S 12:19 0:00 [ready] gunicorn: worker [airflow-webserver]
then the Airflow webserver is running.
If you see entries like this:
ubuntu 49653 0.6 2.3 308912 187596 ? S 12:19 0:00 airflow scheduler -- DagFileProcessorManager
That means the Airflow scheduler is running.
Stop Airflow servers (webserver and scheduler):
pkill -f "airflow scheduler"
pkill -f "airflow webserver"
Now run ps -aux | grep airflow again to check that they have really shut down.
Start Airflow servers in background (daemon):
airflow webserver -D
airflow scheduler -D