Not able to save crash dump using kdump - amazon-web-services

I have a VPS on Amazon's AWS Lightsail service.
I've been testing kdump with the following two commands, which deliberately trigger a kernel crash:
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
The system crashes and reboots, but no dump is saved.
Here are the checks I've done:
[centos@server crash]$ systemctl status kdump
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
Active: active (exited) since Mon 2019-03-18 07:43:34 UTC; 5 days ago
Process: 4119 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 4119 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/kdump.service
Mar 18 07:43:32 ip-.ap-northeast-1.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Mar 18 07:43:34 ip-.ap-northeast-1.compute.internal kdumpctl[4119]: kexec: loaded kdump kernel
Mar 18 07:43:34 ip-.ap-northeast-1.compute.internal kdumpctl[4119]: Starting kdump: [OK]
Mar 18 07:43:34 ip-.ap-northeast-1.compute.internal systemd[1]: Started Crash recovery kernel arming.
[centos@server crash]$ dmesg | grep Reserving
[ 0.000000] Reserving 256MB of memory at 368MB for crashkernel (System RAM: 2047MB)
[centos@server crash]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.1.3.el7.x86_64 root=UUID=f41e390f-835b-4223-a9bb-9b45984ddf8d ro console=tty0 crashkernel=256M console=ttyS0,115200
[centos@server crash]$ grep -v ^# /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
default reboot
Nothing in /var/log/messages records the crash or points to an error, so I wonder what I might have missed. Or is an AWS Lightsail VPS simply not capable of saving a kdump?
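The checks listed above can be collected into a small helper. This is only a sketch: the function name crashkernel_value is made up for illustration, and it assumes a POSIX shell. It does not explain why the dump is missing; it only confirms what the kernel was asked to reserve.

```shell
# Print the crashkernel= value(s) found in a kernel command line.
# $1: command line string; defaults to the running kernel's /proc/cmdline.
crashkernel_value() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}

# Example use on the affected system:
#   crashkernel_value                    # reservation requested at boot
#   cat /sys/kernel/kexec_crash_loaded   # 1 if a crash kernel is loaded
#   cat /sys/kernel/kexec_crash_size     # bytes actually reserved
```

With the command line shown in the question, the function prints 256M.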


Apache 2.4 Status number of total request are increasing instead of going down

Thanks in advance.
I am having trouble with the ever-increasing number of Total Requests/child processes in Apache.
I am currently using Apache 2.4 and PHP 7.4 on an EC2 server running CentOS 7.
It uses the prefork MPM.
The server log filled up the EBS volume, and the server could not process any incoming requests for 20 minutes until I found the problem. What I did was restart the Apache server:
service httpd restart
It worked fine for about an hour.
Then the Apache server went down again.
At that point, the status was:
httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2022-07-04 14:08:42 KST; 2h 56min ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 5899 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 28298 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 5949 (httpd)
Status: "Total requests: 54982; Current requests/sec: 0; Current traffic: 0 B/sec"
I think it had reached a maximum number of total requests, which was 54,982.
I restarted the server again using
service httpd restart
However, the total number of requests kept increasing every hour, from about one thousand to almost 30,000 now. I can't figure out what is going on with my server or how to fix the problem.
Later, while looking through my server settings, I found a line in error.log recommending that I increase the MaxRequestWorkers setting; however, I can't find anywhere in the conf directory where MaxRequestWorkers is set.
So I would like to know whether there is a way to reduce the total requests in Apache. My other question is how to increase MaxRequestWorkers.
The following is the current status of Apache:
httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2022-07-04 17:07:35 KST; 1h 42min ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 29339 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 28298 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 29568 (httpd)
Status: "Total requests: 38027; Current requests/sec: 16.8; Current traffic: 172KB/sec"
CGroup: /system.slice/httpd.service
├─ 1374 /usr/sbin/httpd -DFOREGROUND
├─ 3056 /usr/sbin/httpd -DFOREGROUND
├─ 3213 /usr/sbin/httpd -DFOREGROUND
├─ 3783 /usr/sbin/httpd -DFOREGROUND
Jul 04 17:07:35 ip-.compute.internal systemd[1]: Starting The Apache HTTP Server...
Jul 04 17:07:35 ip-compute.internal httpd[29568]: AH00112: Warning: DocumentRoot [/home/source/tictoccroc/test] does not exist
Jul 04 17:07:35 ip-.compute.internal systemd[1]: Started The Apache HTTP Server.
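As far as I know, the "Total requests" figure in the status line is a cumulative counter since the last start, so it is expected to keep growing and is not itself a limit. For the second question, a minimal sketch for locating the prefork limits, assuming the stock CentOS 7 layout under /etc/httpd. The function name find_mpm_limit is made up for illustration, and the values and file name in the comments are hypothetical.

```shell
# List every place where the worker limits are set.
# $1: Apache configuration root (assumed default: /etc/httpd).
# Returns non-zero when no directive is found, i.e. defaults are in effect.
find_mpm_limit() {
    conf_root="${1:-/etc/httpd}"
    grep -rniE '^[[:space:]]*(MaxRequestWorkers|MaxClients|ServerLimit)[[:space:]]' "$conf_root"
}

# If nothing is found, the directives can be added in a file of your own,
# e.g. /etc/httpd/conf.d/mpm-tuning.conf (hypothetical name and values):
#
#   <IfModule mpm_prefork_module>
#       ServerLimit        400
#       MaxRequestWorkers  400
#   </IfModule>
#
# Then check the syntax and restart:
#   apachectl configtest && systemctl restart httpd
```

With prefork, each worker is a full process, so any higher value should be sized against the memory available on the instance.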

Alert manager in prometheus give exit code error and ignoring assignment for alert manager in prometheus

I am new to Prometheus. While trying to install Alertmanager, I got the following error when checking with systemctl status alertmanager:
alertmanager.service - AlertManager Service
Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2021-05-01 11:23:07 UTC; 21s ago
Process: 51547 ExecStart=/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml -web.external-url=http://0.0.0.0:9093 (code=exited, status=1/>
Main PID: 51547 (code=exited, status=1/FAILURE)
May 01 11:23:07 STEP-Test systemd[1]: Started AlertManager Service.
May 01 11:23:07 STEP-Test alertmanager[51547]: alertmanager: error: unknown short flag '-w', try --help
May 01 11:23:07 STEP-Test systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
May 01 11:23:07 STEP-Test systemd[1]: alertmanager.service: Failed with result 'exit-code'.
I have tried removing and reinstalling, but the result was the same. I checked my configuration, but I can't figure out the problem.
Configuration file is
Thank you all for your prompt response.
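You have an error in your flags for alertmanager.
It appears you should use --web.external-url rather than -web.external-url (two leading dashes, not one); the "unknown short flag '-w'" message in the log points at exactly that.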
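The fix above can be applied to the unit file with a one-line substitution. This is a sketch only: the function name fix_alertmanager_flag is made up, the unit path is the one shown in the status output, and sed leaves a .bak copy next to the file.

```shell
# Replace the single-dash flag with the double-dash form in a unit file.
# $1: path to the unit file. Safe to run twice: an already-correct
# "--web.external-url" is not matched again.
fix_alertmanager_flag() {
    sed -i.bak 's/\([[:space:]]\)-web\.external-url/\1--web.external-url/' "$1"
}

# Example use (as root):
#   fix_alertmanager_flag /etc/systemd/system/alertmanager.service
#   systemctl daemon-reload
#   systemctl restart alertmanager
```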

Failed to start Redis In-Memory Data Store. Ubuntu 18.04

I am trying to install Redis on my AWS server, which runs Ubuntu 18.04. I am following the installation steps from a DigitalOcean article.
When I run sudo systemctl status redis, I get the error below.
(screenshot)
I tried editing the /etc/systemd/system/redis.service file and adding Type=forking under the [Service] section, but I still get the same error.
Can anyone suggest how I can get it fixed?
Thanks in advance.
Following the same DigitalOcean tutorial, I found that Redis is actually running fine.
If you run sudo systemctl restart redis.service and then check the status, you get this (showing "failed" on the last line):
● redis.service - Redis In-Memory Data Store
Loaded: loaded (/etc/systemd/system/redis.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2021-06-28 12:03:11 +03; 1min 0s ago
Process: 20428 ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf (code=exited, status=
Main PID: 20428 (code=exited, status=203/EXEC)
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Service hold-off time over, scheduling restar
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Scheduled restart job, restart counter is at
Jun 28 12:03:11 XYZ systemd[1]: Stopped Redis In-Memory Data Store.
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Start request repeated too quickly.
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Failed with result 'exit-code'.
Jun 28 12:03:11 XYZ systemd[1]: Failed to start Redis In-Memory Data Store.
But if you run sudo service redis-server status, you get this (showing "running" on the third line):
● redis-server.service - Advanced key-value store
Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-06-28 11:50:13 +03; 19min ago
Docs: http://redis.io/documentation,
man:redis-server(1)
Process: 19278 ExecStop=/bin/kill -s TERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 19371 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCC
Main PID: 19382 (redis-server)
Tasks: 4 (limit: 4915)
CGroup: /system.slice/redis-server.service
└─19382 /usr/bin/redis-server 127.0.0.1:6379
Jun 28 11:50:13 XYZ systemd[1]: Starting Advanced key-value store...
Jun 28 11:50:13 XYZ systemd[1]: redis-server.service: Can't open PID file /var/run/redis/red
Jun 28 11:50:13 XYZ systemd[1]: Started Advanced key-value store.
After searching for hours, it seems to be some difference between systemctl and service and nothing more; the actual Redis server is running fine. Correct me if that's not the case. Here's the link: https://askubuntu.com/questions/903354/difference-between-systemctl-and-service-commands
You can also check that Redis is working with redis-cli ping, which should print PONG.
I also encountered this problem, so I went back over my steps.
In the end I found that I had entered the wrong command when setting ownership of /var/lib/redis, which left the redis account without access to /var/lib/redis.
sudo chown redis:redis /var/lib/redis
sudo systemctl restart redis
After that, the restart succeeded.
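One detail in the outputs above is worth checking: they come from two different units. redis.service starts /usr/local/bin/redis-server, while redis-server.service starts /usr/bin/redis-server, and status=203/EXEC generally means systemd could not execute the ExecStart binary at all. A minimal sketch for checking that, with a made-up function name check_execstart:

```shell
# Report whether the binary named in a unit's ExecStart= line is executable.
# $1: path to the unit file. Only the first ExecStart= line is examined.
check_execstart() {
    unit_file="$1"
    # Drop optional systemd prefix characters, keep the first word (the binary).
    bin=$(sed -n 's/^ExecStart=[-@+!:]*\([^[:space:]]*\).*/\1/p' "$unit_file" | head -n 1)
    if [ -x "$bin" ]; then
        echo "ok: $bin"
    else
        echo "missing or not executable: $bin"
        return 1
    fi
}

# Example use:
#   check_execstart /etc/systemd/system/redis.service
#   check_execstart /lib/systemd/system/redis-server.service
```

If the first check fails and the second passes, the hand-written redis.service points at a binary that is not installed at that path, and the packaged redis-server.service is the one actually running.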

docker-machine create with digitalocean driver: ssh command error

I'm using Docker tools on Windows.
The create command was working perfectly last week, and I managed to create a number of machines on DigitalOcean. When I tried again today, it failed. I repeated the same command with different regions and always got the same result:
λ docker-machine create -d digitalocean --digitalocean-access-token=MYTOKEN --digitalocean-region=ams2 vmname
Running pre-create checks...
Creating machine...
(fernu) Creating SSH key...
(fernu) Creating Digital Ocean droplet...
(fernu) Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err : exit status 1
output : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
If I execute the suggested command:
root@fernu:~# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: inactive (dead) (Result: exit-code) since Fri 2017-06-30 20:56:13 UTC; 8min ago
Docs: https://docs.docker.com
Process: 4943 ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=digitalocean (code=exited, status=1/FAILURE)
Main PID: 4943 (code=exited, status=1/FAILURE)
Jun 30 20:56:13 fernu systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 20:56:13 fernu systemd[1]: Failed to start Docker Application Container Engine.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Unit entered failed state.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 30 20:56:13 fernu systemd[1]: Stopped Docker Application Container Engine.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Start request repeated too quickly.
Jun 30 20:56:13 fernu systemd[1]: Failed to start Docker Application Container Engine.
Any help would be appreciated.
Update
It's working with Ubuntu 14:
--digitalocean-image=ubuntu-14-04-x64, so it seems to be a problem with the default image (ubuntu-16-04-x64).
This seems to be hitting a lot of people. TL;DR: There is a bug in docker-machine v0.12.0 and this issue can be resolved by upgrading.
Logging in to the DigitalOcean instance and running journalctl -xe provides more information:
-- Unit docker.service has begun starting up.
Jul 07 20:03:52 docker-sandbox docker[4930]: `docker daemon` is not supported on Linux. Please run `do
Jul 07 20:03:52 docker-sandbox systemd[1]: docker.service: Main process exited, code=exited, status=1/
Jul 07 20:03:52 docker-sandbox systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
The key line here is "`docker daemon` is not supported on Linux". A bug in docker-machine's version comparison code caused an incorrect systemd unit file to be produced (located at /etc/systemd/system/docker.service.d/10-machine.conf) on certain versions of Ubuntu.
A fix has been committed and a new release (v0.12.1) was made.
You can grab the latest release at: https://github.com/docker/machine/releases/tag/v0.12.1
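To tell whether a local install predates the fix, the versions can be compared numerically rather than as plain strings. A rough sketch, assuming a sort that supports -V (GNU coreutils does); the function name version_lt is made up, and the output format of docker-machine version mentioned in the comment is an assumption.

```shell
# Succeed when version $1 is strictly lower than version $2.
# Relies on "sort -V" (version sort) being available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example use, assuming output like "docker-machine version 0.12.0, build ...":
#   installed=$(docker-machine version | sed 's/.*version \([0-9.]*\).*/\1/')
#   if version_lt "$installed" 0.12.1; then echo "upgrade docker-machine"; fi
```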

Glusterfs-server can't stop in good status

glusterfs-server does not end up in a clean state after it is stopped.
I followed these steps to install GlusterFS and start the service:
yum install centos-release-gluster
yum install glusterfs-server
systemctl start glusterd
Then I stopped it:
systemctl stop glusterd
The status then showed "Active: failed":
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled)
Active: failed (Result: exit-code) since Tue 2017-01-24 18:23:55 JST; 4s ago
Process: 2523 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2524 (code=exited, status=15)
Jan 24 18:23:52 ds009 systemd[1]: Started GlusterFS, a clustered file-system server.
Jan 24 18:23:55 ds009 systemd[1]: Stopping GlusterFS, a clustered file-system server...
Jan 24 18:23:55 ds009 systemd[1]: glusterd.service: main process exited, code=exited, status=15/n/a
Jan 24 18:23:55 ds009 systemd[1]: Stopped GlusterFS, a clustered file-system server.
Jan 24 18:23:55 ds009 systemd[1]: Unit glusterd.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
The environment is "CentOS Linux release 7.2.1511 (Core)",
and the installed glusterfs-server version is 3.8.8-1.el7.
Does anyone have an idea what is wrong and how to fix this?
I have found a workaround.
Edit the /usr/lib/systemd/system/glusterd.service file as follows:
[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
# Added line: clear the failed state after the service stops
ExecStopPost=/usr/bin/systemctl reset-failed glusterd
KillMode=process
With this change, the failed status is cleared when the GlusterFS service stops.
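Files under /usr/lib/systemd/system are usually overwritten by package updates, so the same workaround may be better placed in a drop-in. A minimal sketch, assuming the standard systemd drop-in directory; the function name write_glusterd_dropin and the file name reset-failed.conf are made up for illustration.

```shell
# Write a drop-in that adds the ExecStopPost workaround to glusterd.service.
# $1: systemd configuration root (assumed default: /etc/systemd/system).
write_glusterd_dropin() {
    base="${1:-/etc/systemd/system}"
    dir="$base/glusterd.service.d"
    mkdir -p "$dir"
    # Comments sit on their own lines: systemd does not treat a trailing
    # "# ..." on a directive line as a comment.
    printf '%s\n' \
        '[Service]' \
        '# Clear the failed state once the service has stopped.' \
        'ExecStopPost=/usr/bin/systemctl reset-failed glusterd' \
        > "$dir/reset-failed.conf"
}

# Example use (as root):
#   write_glusterd_dropin
#   systemctl daemon-reload
```

Another option that might be worth trying is SuccessExitStatus=15 in the same [Service] section, which tells systemd to treat that exit code as a clean stop; I have not verified it against this GlusterFS version.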