zabbix agent service failed, PID not readable - centos7

I'm trying to run zabbix-agent 3.0.4 on CentOS 7, but systemd fails to start the Zabbix agent. From journalctl -xe:
PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start.
node=localhost.localdomain type=SERVICE_START msg=audit(1475848200.601:17994): pid=1 uid=0 auid=4294967298 ses=...
zabbix-agent.service never wrote its PID file. Failing.
Failed to start Zabbix Agent.
There is no permission error, and I tried re-configuring the PID path to the /tmp folder in both zabbix-agent.service and zabbix_agentd.conf, but it doesn't work.
Very weird. Does anyone have an idea? Thanks in advance.
Update: investigating a little further, the PID file should be under the /run/zabbix folder. I created zabbix_agentd.pid there manually, and it disappears after 1 second. Really weird.

I had the same issue and it was related to SELinux, so I allowed zabbix_agent_t via semanage:
yum install policycoreutils-python
semanage permissive -a zabbix_agent_t
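To confirm SELinux is the culprit before changing policy, these standard commands (assuming auditd is running) show the current mode and any recent denials involving the agent:
getenforce
ausearch -m avc -ts recent | grep zabbix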

Giving full permissions (chmod 7777) to that PID file will help resolve the issue.

I had this too and it was SELinux. It was disabled, but I still had to run the command above.

This works for me.
Prerequisites: CentOS 7, zabbix-server 3.4 and zabbix-agent 3.4 running on the same host.
Solution steps:
Install zabbix-server and zabbix-agent (no matter how - via yum or building from source code).
First check whether separate zabbix users already exist in /etc/passwd. If they do, skip to step 4.
Create separate groups and users for zabbix-server and zabbix-agent.
Example (you can choose the usernames as you like):
groupadd zabbix-agent
useradd -g zabbix-agent zabbix-agent
groupadd zabbix
useradd -g zabbix zabbix
Specify PID and LOG file location in Zabbix config files. Example:
For zabbix-server: in /etc/zabbix/zabbix_server.conf:
PidFile=/run/zabbix/zabbix_server.pid
LogFile=/var/log/zabbix/zabbix_server.log
For zabbix-agent: in /etc/zabbix/zabbix_agentd.conf:
PidFile=/run/zabbix-agent/zabbix-agent.pid
LogFile=/var/log/zabbix-agent/zabbix-agent.log
Create the appropriate directories (if they haven't been created previously) as specified in the config files and change the owner of these directories:
mkdir /var/log/zabbix-agent
mkdir /run/zabbix-agent
chown zabbix-agent:zabbix-agent /var/log/zabbix-agent
chown zabbix-agent:zabbix-agent /run/zabbix-agent
mkdir /var/log/zabbix
mkdir /run/zabbix
chown zabbix:zabbix /var/log/zabbix
chown zabbix:zabbix /run/zabbix
Check the systemd configs for the zabbix services and add User= and Group= entries in the [Service] section to set the accounts the services will run under. Example:
For zabbix-server: /etc/systemd/system/multi-user.target.wants/zabbix-server.service:
[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_server.conf"
EnvironmentFile=-/etc/sysconfig/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/run/zabbix/zabbix_server.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
TimeoutSec=0
User=zabbix
Group=zabbix
[Install]
WantedBy=multi-user.target
For zabbix-agent: /etc/systemd/system/multi-user.target.wants/zabbix-agent.service:
[Unit]
Description=Zabbix Agent
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_agentd.conf"
EnvironmentFile=-/etc/sysconfig/zabbix-agent
Type=forking
Restart=on-failure
PIDFile=/run/zabbix-agent/zabbix-agent.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_agentd -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix-agent
Group=zabbix-agent
[Install]
WantedBy=multi-user.target
If there are no such configs, you can find them in:
/usr/lib/systemd/system/
OR
Enable the zabbix-agent.service unit, which creates the symlink in /etc/systemd/system/multi-user.target.wants/ pointing to /usr/lib/systemd/system/zabbix-agent.service (see the commands sketched after these steps).
Run services:
systemctl start zabbix-server
systemctl start zabbix-agent
Check which users the services were started under (first column):
ps -aux | grep zabbix
or via top command.
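For the enable step above, a minimal sketch of the usual commands (assuming the packages installed the unit files under /usr/lib/systemd/system/):
systemctl daemon-reload
systemctl enable zabbix-server
systemctl enable zabbix-agent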

Disable SELinux and Firewalld and you're good to go
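If you go that route, these are the standard CentOS 7 commands; note that they turn off SELinux enforcement and the firewall system-wide:
setenforce 0
# set SELINUX=disabled in /etc/selinux/config to make it persist across reboots
systemctl stop firewalld
systemctl disable firewalld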


System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down

I am trying to activate the service after creating a systemd unit file, using the following commands in the Google Cloud terminal:
vim /etc/systemd/system/app.service
I pasted the contents below into this file:
#vim /etc/systemd/system/app.service
[Unit]
# specifies metadata and dependencies
Description=Gunicorn instance to serve myproject
After=network.target
# tells the init system to only start this after the networking target has been reached
# We will give our regular user account ownership of the process since it owns all of the relevant files
[Service]
# Service specify the user and group under which our process will run.
User=clashgamers2021
# give group ownership to the www-data group so that Nginx can communicate easily with the Gunicorn processes.
Group=www-data
# We'll then map out the working directory and set the PATH environmental variable so that the init system knows where the executables for the process are located (wi$
WorkingDirectory=/home/clashgamers2021/clashgamers/
Environment="PATH=/home/clashgamers2021/clashgamers/env/bin"
# We'll then specify the command used to start the service
ExecStart=/home/clashgamers2021/clashgamers/env/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 wsgi:app
# This will tell systemd what to link this service to if we enable it to start at boot. We want this service to start when the regular multi-user system is up and running:
[Install]
WantedBy=multi-user.target
To activate this service, I typed:
sudo systemctl start app
sudo systemctl enable app
However I got this error:
clashgamers2021#cloudshell:~/clashgamers (clash-gamers-318206)$ sudo systemctl start app
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
You're trying to run the commands in the Cloud Shell:
Cloud Shell is an interactive shell environment for Google Cloud that makes it easy for you to learn and experiment with Google Cloud and manage your projects and resources from your web browser.
Create a new VM (choosing the hardware & OS) and connect to it using the SSH button in the Cloud Console, or use the other methods described in the documentation.
Then run your commands, and if they don't work, update your question with more details.
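For example, creating and connecting to a small Debian VM with the gcloud CLI might look like this (the instance name, zone and machine type are placeholders to adjust):
gcloud compute instances create my-vm --zone=us-central1-a --machine-type=e2-small --image-family=debian-11 --image-project=debian-cloud
gcloud compute ssh my-vm --zone=us-central1-a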

gunicorn: Is there a better way to reload gunicorn in described situation?

I have a django project with gunicorn and nginx.
I'm deploying this project with SaltStack.
In this project, I have a config.ini file that the Django views read.
In the case of nginx, I set up a cmd.run state that runs service nginx restart with - onchanges: - file: nginx_conf, so the service restarts whenever nginx.conf changes.
But in the case of gunicorn, I can detect the change to config.ini; I just don't know how to reload gunicorn.
When gunicorn starts I pass the --reload option, but does that option detect changes to config.ini, or only to the Django project's files?
If not, what command should I use? (e.g. gunicorn reload)?
Thank you.
P.S. I saw kill -HUP pid, but I don't think Salt would know gunicorn's PID.
The --reload option looks for changes to the source code, not to config files. And --reload shouldn't be used in production anyway.
I would either:
1) Tell gunicorn to write a pid file with --pid /path/to/pid/file and then get salt to kill the pid followed by a restart.
2) Get salt to run a pkill gunicorn followed by a restart.
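A rough sketch of option 1, assuming gunicorn is told to daemonize and write its PID to a path of your choosing (the path and WSGI module here are illustrative):
# start gunicorn with a pid file
gunicorn --daemon --pid /run/gunicorn/app.pid project.wsgi:application
# when config.ini changes, have Salt kill that PID and start gunicorn again
kill "$(cat /run/gunicorn/app.pid)"
gunicorn --daemon --pid /run/gunicorn/app.pid project.wsgi:application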
Don't run shell commands to manage services, use service states.
/path/to/nginx.conf:
  file.managed:
    # ...

/path/to/config.ini:
  file.managed:
    # ...

nginx:
  service.running:
    - enable: true
    - watch:
      - file: /path/to/nginx.conf

django-app:
  service.running:
    - enable: true
    - reload: true
    - watch:
      - file: /path/to/config.ini
You may need to create a service definition for gunicorn yourself. Here's a very basic systemd example:
[Unit]
Description=My django app
After=network.target
[Service]
Type=notify
User=www-data
Group=www-data
WorkingDirectory=/path/to/source
ExecStart=/path/to/venv/bin/gunicorn project.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
[Install]
WantedBy=multi-user.target
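Assuming that unit is saved as django-app.service (to match the Salt state ID above), you would wire it up once and then let the reload: true watch trigger the ExecReload line:
sudo systemctl daemon-reload
sudo systemctl enable --now django-app
sudo systemctl reload django-app   # what Salt runs when config.ini changes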

Recommendation For Templated .service/.conf File in RPM Spec

We have created RPMs from .spec files that include installation of services for both upstart and systemd (hence, both .conf and .service files). But the .conf and .service files contain hardwired paths to the service files.
myservice.service:
[Unit]
Description=My Service
[Service]
WorkingDirectory=/opt/myproduct
ExecStart=/opt/myproduct/myservice /opt/myproduct/myservicearg
Restart=always
[Install]
WantedBy=multi-user.target
myservice.conf:
description "My Service"
respawn
respawn limit 15 5
start on (stopped rc and runlevel [2345])
stop on runlevel [06]
chdir /opt/myproduct
exec /opt/myproduct/myservice /opt/myproduct/myservicearg
The installation paths are likely to change, but the brute force search-and-replace seems stone-age.
I have used Ansible with .j2 (Jinja2) template files, which seems like a nice way to use a variable for the binary/script paths. Using them might look something like this:
myservice.service.j2:
[Unit]
Description=My Service
[Service]
WorkingDirectory={{ myproductpath }}
ExecStart={{ myproductpath }}/myservice {{ myproductpath }}/myservicearg
Restart=always
[Install]
WantedBy=multi-user.target
myservice.conf.j2:
description "My Service"
respawn
respawn limit 15 5
start on (stopped rc and runlevel [2345])
stop on runlevel [06]
chdir {{ myproductpath }}
exec {{ myproductpath }}/myservice {{ myproductpath }}/myservicearg
But I was unable to find anything that suggested this is a common approach for building RPMs. Is there a recommended way in RPMs to template these .conf and .service files, either filled in during RPM build or during install?
No. RPM does not have any such templating tool. Most developers prefer classic sed:
%build
....
sed -i 's/{{ myproductpath }}/\/real\/path/g' myservice.conf.j2
mv myservice.conf.j2 myservice.conf
Or you can BuildRequires: ansible and let Ansible expand it. But that is quite a heavy tool for this job.
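For instance, a sketch of the sed approach driven by a spec macro, so the path is defined in a single place (the macro name myproductpath is illustrative):
%global myproductpath /opt/myproduct

%build
sed -e 's|{{ myproductpath }}|%{myproductpath}|g' myservice.service.j2 > myservice.service
sed -e 's|{{ myproductpath }}|%{myproductpath}|g' myservice.conf.j2 > myservice.conf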

How do I restart airflow webserver?

I am using Airflow for my data pipeline project. I have configured my project in Airflow and started the airflow webserver as a background process using the following command
airflow webserver -p 8080 -D True
The server runs successfully in the background. Now I want to enable authentication in Airflow and have made the configuration changes in airflow.cfg, but the authentication functionality is not reflected in the server. When I stop and start the airflow server on my local machine it works.
So how can I restart my daemonized airflow webserver process on my server?
I advise running Airflow in a robust way, with auto-recovery, using systemd,
so you can do:
- to start: systemctl start airflow
- to stop: systemctl stop airflow
- to restart: systemctl restart airflow
For this you'll need a systemd 'unit' file.
As a (working) example you can use the following:
put it in /lib/systemd/system/airflow.service
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
PIDFile=/run/airflow/webserver.pid
EnvironmentFile=/home/airflow/airflow.env
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'export AIRFLOW_HOME=/home/airflow ; airflow webserver --pid /run/airflow/webserver.pid'
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=42s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
P.S.: change AIRFLOW_HOME to the location of your airflow folder with the config.
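A quick sketch of activating it, assuming you put the file above in /lib/systemd/system/airflow.service as described:
sudo systemctl daemon-reload
sudo systemctl enable --now airflow
sudo systemctl restart airflow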
Can you check $AIRFLOW_HOME/airflow-webserver.pid for the process id of your webserver daemon?
Then pass it a kill signal to kill it
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
Then clear the pid file
cat /dev/null > $AIRFLOW_HOME/airflow-webserver.pid
Then just run
airflow webserver -p 8080 -D True
to restart the daemon.
This worked for me (multiple times! :D )
find the process id: (assuming 8080 is the port)
lsof -i tcp:8080
kill it
kill <pid>
Use Airflow webserver's (gunicorn) signal handling
Airflow uses gunicorn as its HTTP server, so you can send it standard POSIX-style signals. A signal commonly used by daemons to restart is HUP.
You'll need to locate the pid file for the airflow webserver daemon in order to get the right process id to send the signal to. This file could be in $AIRFLOW_HOME or also /var/run, which is where you'll find a lot of pids.
Assuming the pid file is in /var/run, you could run the command:
cat /var/run/airflow-webserver.pid | xargs kill -HUP
gunicorn uses a preforking model, so it has master and worker processes. The HUP signal is sent to the master process, which performs these actions:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the preload_app option), Gunicorn will also load the new version of it.
More information in the gunicorn signal handling docs.
This is mostly an expanded version of captaincapsaicin's answer, but using HUP (SIGHUP) instead of KILL (SIGKILL) to reload the process instead of actually killing it and restarting it.
In my case I wanted to kill the previous airflow processes and start fresh.
For that, the following command did the magic:
killall -9 airflow
As the question was related to webserver, this is something that worked in my case:
systemctl restart airflow-webserver
Just run:
airflow webserver -p 8080 -D
Find the PID with:
airflow webserver
which will report: "The webserver is already running under PID 21250."
Then kill the webserver process with:
kill 21250
None of these worked for me. I had to delete the $AIRFLOW_HOME/airflow-webserver.pid file and then running airflow webserver worked.
Create an init script and use the daemon command to run this as a service.
daemon --user="${USER}" --pidfile="${PID_FILE}" airflow webserver -p 8090 >> "${LOG_FILE}" 2>&1 &
The recommended approach is to create and enable the airflow webserver as a service. If you named the webserver service 'airflow-webserver', run the following command to restart it:
systemctl restart airflow-webserver
You can use a ready-made AMI (namely, LightningFLow) from AWS Marketplace which provides Airflow services (webserver, scheduler, worker) which are enabled at startup.
Note: LightningFlow comes pre-integrated with all required libraries, Livy, custom operators, and local Spark cluster.
Link for AWS Marketplace: https://aws.amazon.com/marketplace/pp/Lightning-Analytics-Inc-LightningFlow-Integrated-o/B084BSD66V
Just by killing processes!!
Assuming the default airflow home directory is ~/airflow/
List the 3 parent processes running the airflow (PID):
cat ~/airflow/airflow-scheduler.pid
cat ~/airflow/airflow-webserver.pid
cat ~/airflow/airflow-webserver-monitor.pid
Get their PGID using:
ps -xjf
And finally run a loop to kill the whole process tree of each parent (PID):
for child in $(ps x -o "%P %p %r" | awk -v ppid=$your_first_PID -v pgid=$your_first_PGID '{ if ($1 == ppid || $3 == pgid) print $2 }'); do kill $child; done
To restart Airflow you need to restart Airflow webserver and Airflow scheduler.
Check if Airflow servers are running:
ps -aux | grep airflow
If you see entries like this in the list of running processes:
ubuntu 49601 0.1 1.6 266668 135520 ? S 12:19 0:00 [ready] gunicorn: worker [airflow-webserver]
This means that Airflow webserver is running.
If you see entries like this:
ubuntu 49653 0.6 2.3 308912 187596 ? S 12:19 0:00 airflow scheduler -- DagFileProcessorManager
That means that Airflow scheduler is running.
Stop Airflow servers (webserver and scheduler):
pkill -f "airflow scheduler"
pkill -f "airflow webserver"
Now run ps -aux | grep airflow again to check whether they have really shut down.
Start Airflow servers in background (daemon):
airflow webserver -D
airflow scheduler -D

Running Gunicorn behind chrooted nginx inside virtualenv

I can get this setup to work if I start gunicorn manually or if I add gunicorn to my Django INSTALLED_APPS. But when I try to start gunicorn with systemd, the gunicorn socket and service start fine but they don't serve anything to Nginx; I get a 502 Bad Gateway.
Nginx is running under the "http" user/group, in a chroot jail. I used pythonbrew to set up the virtualenvs, so gunicorn is installed in my home directory under .pythonbrew. The virtualenv directory is owned by my user and the adm group.
I'm pretty sure there is a permission issue somewhere, because everything works if I start gunicorn myself but not if systemd starts it. I've tried changing the user and group directives inside the gunicorn.service file, but nothing worked; if root starts the server I get no errors and a 502, and if my user starts it I get no errors and a 504.
I have checked the Nginx logs and there are no errors, so I'm sure it's a gunicorn issue. Should I have the virtualenv in the app directory? Who should be the owner of the app directory? How can I narrow down the issue?
/usr/lib/systemd/system/gunicorn-app.service
#!/bin/sh
[Unit]
Description=gunicorn-app
[Service]
ExecStart=/home/noel/.pythonbrew/venvs/Python-3.3.0/nlp/bin/gunicorn_django
User=http
Group=http
Restart=always
WorkingDirectory = /home/noel/.pythonbrew/venvs/Python-3.3.0/nlp/bin
[Install]
WantedBy=multi-user.target
/usr/lib/systemd/system/gunicorn-app.socket
[Unit]
Description=gunicorn-app socket
[Socket]
ListenStream=/run/unicorn.sock
ListenStream=0.0.0.0:9000
ListenStream=[::]:8000
[Install]
WantedBy=sockets.target
I realize this is kind of a sprawling question, but I'm sure I can pinpoint the issue with a few pointers. Thanks.
Update
I'm starting to narrow this down. When I run gunicorn manually and then run ps aux | grep gunicorn, I see two processes: a master and a worker. But when I start gunicorn with systemd, only one process is started. I tried adding Type=forking to my gunicorn.service file, but then I get an error when loading the service. I thought that maybe gunicorn wasn't running under the virtualenv, or the venv isn't getting activated?
Does anyone know what I'm doing wrong here? Maybe gunicorn isn't running in the venv?
I had a similar problem on OSX with launchd.
The issue was that I needed to allow the process to spawn subprocesses.
Try adding Type=forking:
[Unit]
Description=gunicorn-app
[Service]
Type=forking
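Note that with Type=forking, systemd expects the launched command to fork and exit, so gunicorn itself has to daemonize; a hedged sketch of what the [Service] section might then look like (the PID path is illustrative, and the options are gunicorn's standard --daemon/--pid flags):
[Service]
Type=forking
PIDFile=/run/gunicorn-app.pid
ExecStart=/home/noel/.pythonbrew/venvs/Python-3.3.0/nlp/bin/gunicorn_django --daemon --pid /run/gunicorn-app.pid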
I know this isn't the best way, but I was able to get it working by adding gunicorn to the list of django INSTALLED_APPS. Then I just created a new systemd service:
[Unit]
Description=hack way to start gunicorn and django
[Service]
User=http
Group=http
ExecStart=/srv/http/www/nlp.com/nlp/bin/python /srv/http/www/nlp.com/nlp/nlp/manage.py run_gunicorn
Restart=always
[Install]
WantedBy=multi-user.target
There must be a better way, but judging by the lack of responses not many people know what that better way is.