Mosquitto MQTT Broker Failed to Start: Reached Start-Request Limit (AWS)

I am unable to start Mosquitto because the service has reached its start/restart limit over a certain period of time:
$ systemctl status mosquitto.service
● mosquitto.service - Mosquitto MQTT v3.1/v3.1.1 Broker
Loaded: loaded (/usr/lib/systemd/system/mosquitto.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mosquitto.service.d
└─override.conf
Active: failed (Result: start-limit) since Fri 2023-02-03 21:45:41 UTC; 2h 12min ago
Docs: man:mosquitto.conf(5)
man:mosquitto(8)
Main PID: 543 (code=exited, status=1/FAILURE)
Feb 03 21:45:26 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: mosquitto.service: main process exited, ...RE
Feb 03 21:45:26 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: Failed to start Mosquitto MQTT v3.1/v3.1...r.
Feb 03 21:45:26 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: Unit mosquitto.service entered failed state.
Feb 03 21:45:26 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: mosquitto.service failed.
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: mosquitto.service holdoff time over, sch...t.
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: Stopped Mosquitto MQTT v3.1/v3.1.1 Broker.
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: start request repeated too quickly for m...ce
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: Failed to start Mosquitto MQTT v3.1/v3.1...r.
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: Unit mosquitto.service entered failed state.
Feb 03 21:45:41 ip-172-31-11-142.us-west-1.compute.internal systemd[1]: mosquitto.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
I have tried:
ps -ef | grep mosquitto
sudo kill ####
and it still does not work.
I have also attempted to add an override via:
sudo systemctl edit mosquitto.service
[Service]
Restart=always
RestartSec=15
StartLimitInterval=150
StartLimitBurst=8
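
For what it's worth, a sketch of a cleaner override, assuming systemd v230 or later (on older versions the StartLimit* settings do live under [Service], as above): the rate-limit settings belong in [Unit] and the interval key is StartLimitIntervalSec. A unit that has already tripped the start limit also has to be cleared with reset-failed before it will start again:

sudo systemctl edit mosquitto.service

[Unit]
StartLimitIntervalSec=150
StartLimitBurst=8

[Service]
Restart=always
RestartSec=15

sudo systemctl daemon-reload
sudo systemctl reset-failed mosquitto.service
sudo systemctl start mosquitto.service

The start limit is only the symptom, though: the status=1/FAILURE exit means mosquitto itself is failing, which is worth checking in the foreground with mosquitto -c /etc/mosquitto/mosquitto.conf.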

Related

Unable to start NFS server on GCE instance mounted on GCS

I want to set up an NFS server on GCP, so I used a VM, mounted a GCS bucket on /vol using gcsfuse, installed the nfs-kernel-server package on the VM, created a directory nfs_share under /vol, and added the entry in /etc/exports. While restarting the nfs-kernel-server service, I ran into the error below:
sudo systemctl status nfs-kernel-server
● nfs-server.service - NFS server and services
Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2022-06-04 15:28:54 UTC; 10s ago
Process: 2139 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
Process: 2138 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
Process: 2137 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=1/FAILURE)
Jun 04 15:28:54 g-non-prod-nfs-test-vm systemd[1]: Starting NFS server and services...
Jun 04 15:28:54 g-non-prod-nfs-test-vm exportfs[2137]: exportfs: /vol/nfs_share requires fsid= for NFS export
Jun 04 15:28:54 g-non-prod-nfs-test-vm systemd[1]: nfs-server.service: Control process exited, code=exited status=1
Jun 04 15:28:54 g-non-prod-nfs-test-vm systemd[1]: nfs-server.service: Failed with result 'exit-code'.
Jun 04 15:28:54 g-non-prod-nfs-test-vm systemd[1]: Stopped NFS server and services.
Filestore, GCP's managed NFS service, needs to be deployed with at least 1 TB of storage, so I am looking for alternatives. The approach above looks feasible, but I am unable to get the NFS service up and running.
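
The exportfs error in the log points at the fix: FUSE mounts such as gcsfuse have no stable device number, so the kernel NFS server needs an explicit fsid= option on the export. A minimal sketch of the /etc/exports entry (the client range and other options are placeholders to adapt):

/vol/nfs_share 10.0.0.0/8(rw,sync,no_subtree_check,fsid=1)

then re-export with sudo exportfs -ra and restart nfs-kernel-server. Note that exporting a gcsfuse mount over kernel NFS may still hit other FUSE limitations; this only clears the fsid error shown above.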

Error with Ops Agent: GCE metadata unauthenticated

I am trying to get the Ops Agent working and used the following commands to install it:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install
I see the following error in the logs via journalctl -u google-cloud-ops-agent-opentelemetry-collector -xn:
otelopscol[2706]: 2022-02-06T21:50:36.140Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined; rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined; rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metadata: GCE metadata \"instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.read%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fmonitoring.write\" not defined]", "interval": "14.115202828s"}
The services otherwise look good and are running, but the UI reports that the Ops Agent isn't actually running, which I suspect is because no data is being sent back.
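A quick check for this class of error is to ask the GCE metadata server, from inside the VM, which service accounts are attached; an empty list means the collector has no identity to request tokens with:

curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/"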
Here is the status of running agents:
● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: enabled)
Active: active (running) since Sat 2022-02-05 04:38:41 UTC; 1 day 17h ago
Process: 2690 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-clou>
Main PID: 2706 (otelopscol)
Tasks: 9 (limit: 2369)
Memory: 193.6M
CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
└─2706 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-o>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/qu>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).s>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/me>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
Feb 06 21:55:53 mongo-1 otelopscol[2706]: /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/qu>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
Feb 06 21:55:53 mongo-1 otelopscol[2706]: /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/in>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).Star>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: /root/go/pkg/mod/go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/in>
Feb 06 21:55:53 mongo-1 otelopscol[2706]: 2022-02-06T21:55:53.145Z info exporterhelper/queued_retry.go:215 Exp>
● google-cloud-ops-agent.service - Google Cloud Ops Agent
Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
Active: active (exited) since Sat 2022-02-05 04:38:41 UTC; 1 day 17h ago
Process: 2691 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/co>
Process: 2704 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 2704 (code=exited, status=0/SUCCESS)
Feb 05 04:38:41 mongo-1 systemd[1]: Starting Google Cloud Ops Agent...
Feb 05 04:38:41 mongo-1 systemd[1]: Finished Google Cloud Ops Agent.
● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: enabled)
Active: active (running) since Sun 2022-02-06 15:05:35 UTC; 6h ago
Process: 22138 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/googl>
Main PID: 22144 (fluent-bit)
Tasks: 22 (limit: 2369)
Memory: 29.0M
CGroup: /system.slice/google-cloud-ops-agent-fluent-bit.service
└─22144 /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config /run/google-cloud-ops-agent-fluent-bi>
Feb 06 15:05:35 mongo-1 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Scheduled restart job, restart counter is at 7.
Feb 06 15:05:35 mongo-1 systemd[1]: Stopped Google Cloud Ops Agent - Logging Agent.
Feb 06 15:05:35 mongo-1 systemd[1]: Starting Google Cloud Ops Agent - Logging Agent...
Feb 06 15:05:35 mongo-1 systemd[1]: Started Google Cloud Ops Agent - Logging Agent.
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: Fluent Bit v1.8.12
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Copyright (C) 2019-2021 The Fluent Bit Authors
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Copyright (C) 2015-2018 Treasure Data
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Feb 06 15:05:35 mongo-1 fluent-bit[22144]: * https://fluentbit.io
The issue was due to the VM not having a service account. The solution was to do the following:
create a service account
assign the service account the Logs Writer and Monitoring Metric Writer roles
stop the VM, edit the VM, set the newly created service account, and start the VM
Note that by default a VM has a default service account. In my case I created the VM and explicitly didn't enable any service account, hence the issue.
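
For reference, the same steps can be done with gcloud; the service-account name, project, and zone below are placeholders (the instance name is taken from the logs above):

gcloud iam service-accounts create ops-agent-sa --project=MY_PROJECT
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:ops-agent-sa@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:ops-agent-sa@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/monitoring.metricWriter"
gcloud compute instances stop mongo-1 --zone=MY_ZONE
gcloud compute instances set-service-account mongo-1 --zone=MY_ZONE \
  --service-account=ops-agent-sa@MY_PROJECT.iam.gserviceaccount.com \
  --scopes=cloud-platform
gcloud compute instances start mongo-1 --zone=MY_ZONE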

Failed to start Zabbix Agent every 10 seconds

I am using CentOS 7.
How I checked the log:
journalctl -xe
What I got from the log (I saw the same entries every 10 seconds):
Oct 02 10:19:51 lp01.localdomain systemd[1]: zabbix-agent.service holdoff time over, scheduling restart.
Oct 02 10:19:51 lp01.localdomain systemd[1]: Starting Zabbix Agent...
-- Subject: Unit zabbix-agent.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit zabbix-agent.service has begun starting up.
Oct 02 10:19:51 lp01.localdomain zabbix_agentd[8985]: zabbix_agentd [8987]: cannot open "/var/log/zabbix/zabbix_agentd.log": [13] Permission denied
Oct 02 10:19:51 lp01.localdomain systemd[1]: PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start.
Oct 02 10:19:51 lp01.localdomain systemd[1]: zabbix-agent.service never wrote its PID file. Failing.
Oct 02 10:19:51 lp01.localdomain systemd[1]: Failed to start Zabbix Agent.
-- Subject: Unit zabbix-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit zabbix-agent.service has failed.
--
-- The result is failed.
Oct 02 10:19:51 lp01.localdomain systemd[1]: Unit zabbix-agent.service entered failed state.
Oct 02 10:19:51 lp01.localdomain systemd[1]: zabbix-agent.service failed.
So I checked the "/var/log/zabbix/zabbix_agentd.log" file first.
ll /var/log/zabbix/zabbix_agentd.log
But it said No such file or directory.
ls: cannot access /var/log/zabbix/zabbix_agentd.log: No such file or directory
And then I checked the "/run/zabbix/zabbix_agentd.pid" file.
ll /run/zabbix/zabbix_agentd.pid
It also said No such file or directory.
ls: cannot access /run/zabbix/zabbix_agentd.pid: No such file or directory
I checked whether SELinux is running:
getenforce
and it said SELinux is Disabled.
My questions are:
How can I start Zabbix?
If I can't start Zabbix, can I stop it from start-failing every 10 seconds?
Thank you.
Add permissions to the directories /var/log/zabbix/ and /var/log/zabbix-agent/:
chmod 707 /var/log/zabbix/
chmod 707 /var/log/zabbix-agent/
or change the owner of the directories:
chown zabbix:zabbix /var/log/zabbix/
chown zabbix:zabbix /var/log/zabbix-agent/
And if you just want to stop the service:
systemctl stop zabbix-agent
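
Since ls showed that the log file (and possibly the whole directory) does not exist, it may be worth recreating the directory with the right owner before restarting; a sketch, assuming the agent runs as the zabbix user as in the stock CentOS 7 package:

sudo mkdir -p /var/log/zabbix
sudo chown zabbix:zabbix /var/log/zabbix
sudo systemctl restart zabbix-agent

As for the second question: systemctl stop only ends the current restart loop; sudo systemctl disable zabbix-agent keeps the unit from starting (and failing) again on the next boot.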

NGINX failed to work while configuring server for subdomain

Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
Prior to this prompt the server was working fine, but when I tried to configure the server for a subdomain it failed with this error.
The detailed error:
[ec2-user@ip--------- conf.d]$ systemctl status nginx.service
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-07-07 06:03:04 UTC; 4min 17s ago
Process: 71445 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
Process: 71444 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
Main PID: 70298 (code=exited, status=0/SUCCESS)
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: Starting The nginx HTTP and reverse proxy serv>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal nginx[71445]: nginx: [emerg] unexpected end of file, expec>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal nginx[71445]: nginx: configuration file /etc/nginx/nginx.c>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: nginx.service: Control process exited, code=ex>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: Failed to start The nginx HTTP and reverse pro>
You can always check for configuration errors by using this command:
nginx -T
before launching a catastrophic restart. :)
You have an unexpected end of file error: check the syntax of your NGINX configuration, as it is likely that you have a broken file.
The likely candidate is a { block that was never closed with a matching }.
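
For comparison, a minimal subdomain server block with balanced braces looks like the sketch below (the names and upstream port are placeholders); running sudo nginx -t after every edit catches this class of error before a restart:

server {
    listen 80;
    server_name sub.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
    }
}

Every opening { needs a matching }; the one missing from your file is what produces the "unexpected end of file" emerg above.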

Docker-Machine-provisioned AWS instance cannot start Docker engine

When I start an EC2 instance with
docker-machine create --driver amazonec2 --amazonec2-region eu-central-1 --amazonec2-instance-type t2.2xlarge aws-test
docker-machine can create the VM and exchange the certs, but starting the engine fails.
Log from the EC2 instance:
ubuntu@aws-manager2:~$ systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: inactive (dead) (Result: exit-code) since Thu 2017-06-29 09:18:44 UTC; 1min 53s ago
Docs: https://docs.docker.com
Process: 5263 ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --s
Main PID: 5263 (code=exited, status=1/FAILURE)
Jun 29 09:18:44 aws-manager2 systemd[1]: Failed to start Docker Application Container Engine.
Jun 29 09:18:44 aws-manager2 systemd[1]: docker.service: Unit entered failed state.
Jun 29 09:18:44 aws-manager2 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 29 09:18:44 aws-manager2 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 29 09:18:44 aws-manager2 systemd[1]: Stopped Docker Application Container Engine.
Jun 29 09:18:44 aws-manager2 systemd[1]: docker.service: Start request repeated too quickly.
Jun 29 09:18:44 aws-manager2 systemd[1]: Failed to start Docker Application Container Engine.
Log at startup:
$ docker-machine create --driver amazonec2 --amazonec2-region eu-central-1 --amazonec2-instance-type t2.2xlarge aws-manager2
Running pre-create checks...
Creating machine...
(aws-manager2) Launching instance...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err : exit status 1
output : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
Yesterday this worked with the same configuration. Could there be a change in the AMI used since yesterday? I tried it from a different host but got the same error.
It seems there is a bug in the Docker version that was rolled out yesterday. A workaround for us:
docker-machine create --driver amazonec2 --engine-install-url=https://web.archive.org/web/20170623081500/https://get.docker.com
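
Putting the two commands together, the full invocation with the pinned install script would look like this (region, instance type, and machine name as in the original attempt):

docker-machine create --driver amazonec2 \
  --amazonec2-region eu-central-1 \
  --amazonec2-instance-type t2.2xlarge \
  --engine-install-url=https://web.archive.org/web/20170623081500/https://get.docker.com \
  aws-test

The web.archive.org URL serves a snapshot of the get.docker.com install script from before the broken rollout, pinning the engine to the version that worked the day before.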