PostgreSQL database server keeps shutting down randomly [closed] - django

Over the last two days, my Postgres database server has shut down unexpectedly five or six times, often when server traffic was at its lowest level.
So I checked the PostgreSQL log:
2021-09-18 10:17:36.099 GMT [22856] LOG: received smart shutdown request
2021-09-18 10:17:36.111 GMT [22856] LOG: background worker "logical replication launcher" (PID 22863) exited with exit code 1
grep: Trailing backslash
kill: (28): Operation not permitted
2021-09-18 10:17:39.601 GMT [55614] XXX@XXX FATAL: the database system is shutting down
2021-09-18 10:17:39.603 GMT [55622] XXX@XXX FATAL: the database system is shutting down
2021-09-18 10:17:39.686 GMT [55635] XXX@XXX FATAL: the database system is shutting down
2021-09-18 10:17:39.688 GMT [55636] XXX@XXX FATAL: the database system is shutting down
2021-09-18 10:17:39.718 GMT [55642] XXX@XXX FATAL: the database system is shutting down
2021-09-18 10:17:39.720 GMT [55643] XXX@XXX FATAL: the database system is shutting down
kill: (55736): No such process
kill: (55741): No such process
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Failed to stop c3pool_miner.service: Interactive authentication required.
See system logs and 'systemctl status c3pool_miner.service' for details.
pkill: killing pid 654 failed: Operation not permitted
pkill: killing pid 717 failed: Operation not permitted
pkill: killing pid 717 failed: Operation not permitted
log_rot: no process found
chattr: No such file or directory while trying to stat /etc/ld.so.preload
rm: cannot remove '/opt/atlassian/confluence/bin/1.sh': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/1.sh.1': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/1.sh.2': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/1.sh.3': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/3.sh': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/3.sh.1': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/3.sh.2': No such file or directory
rm: cannot remove '/opt/atlassian/confluence/bin/3.sh.3': No such file or directory
rm: cannot remove '/var/tmp/lib': No such file or directory
rm: cannot remove '/var/tmp/.lib': No such file or directory
chattr: No such file or directory while trying to stat /tmp/lok
chmod: cannot access '/tmp/lok': No such file or directory
bash: line 525: docker: command not found
bash: line 526: docker: command not found
bash: line 527: docker: command not found
bash: line 528: docker: command not found
bash: line 529: docker: command not found
bash: line 530: docker: command not found
bash: line 531: docker: command not found
bash: line 532: docker: command not found
bash: line 533: docker: command not found
bash: line 534: docker: command not found
bash: line 547: setenforce: command not found
bash: line 548: /etc/selinux/config: Permission denied
Failed to stop apparmor.service: Interactive authentication required.
See system logs and 'systemctl status apparmor.service' for details.
Synchronizing state of apparmor.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable apparmor
Failed to reload daemon: Interactive authentication required.
update-rc.d: error: Permission denied
Failed to stop aliyun.service.service: Interactive authentication required.
See system logs and 'systemctl status aliyun.service.service' for details.
Failed to disable unit: Interactive authentication required.
/tmp/kinsing is 648effa354b3cbaad87b45f48d59c616
2021-09-18 10:17:49.860 GMT [54832] admin@postgres FATAL: terminating connection due to administrator command
2021-09-18 10:17:49.860 GMT [54832] admin@postgres CONTEXT: COPY uegplqsl, line 1: "/tmp/kinsing exists"
2021-09-18 10:17:49.860 GMT [54832] admin@postgres STATEMENT: DROP TABLE IF EXISTS XXX;CREATE TABLE XXX(cmd_output text);COPY XXX FROM PROGRAM 'echo ... |base64 -d|bash';SELECT * FROM XXX;DROP TABLE IF EXISTS XXX;
2021-09-18 10:17:49.877 GMT [22858] LOG: shutting down
2021-09-18 10:17:49.907 GMT [22856] LOG: database system is shut down
I learned it could be another process sending SIGTERM, SIGINT, or SIGQUIT to the database server, so I used SystemTap to catch any signal that shuts the server down. After PostgreSQL shut down again, I got this:
Now I have the PIDs of the processes that are sending the shutdown signals. What can I do to prevent this from happening again?
The VPS operating system is Ubuntu 20.04.3 LTS. The backend is written in Django and the database is PostgreSQL 12.
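For reference, here is a minimal sketch of the kind of SystemTap probe described above (assuming the systemtap package and kernel debuginfo are installed; the file name sig.stp is just an example, not the script the poster actually ran):
# sig.stp -- log shutdown-type signals delivered to postgres processes
probe signal.send {
  if ((sig_name == "SIGTERM" || sig_name == "SIGINT" || sig_name == "SIGQUIT")
      && pid_name == "postgres") {
    # execname()/pid() describe the sender; pid_name/sig_pid describe the receiver
    printf("%s (pid %d) sent %s to %s (pid %d)\n",
           execname(), pid(), sig_name, pid_name, sig_pid)
  }
}
Run it with something like sudo stap sig.stp and leave it running until the next shutdown.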

You have been hacked. Rebuild the system, and this time pick a strong password for your superuser. Don't let anyone connect from the outside at all unless that is necessary, and if it is, don't let them connect as the superuser.
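As a minimal hardening sketch (paths assume a default Ubuntu PostgreSQL 12 install, and 10.0.0.5 is only a placeholder for the application server's address): listen only where the Django app actually connects from and require scram-sha-256 authentication for any remote access.
# /etc/postgresql/12/main/postgresql.conf
listen_addresses = 'localhost'      # or the one private IP the app uses
password_encryption = scram-sha-256

# /etc/postgresql/12/main/pg_hba.conf
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 peer
host    all       all   10.0.0.5/32   scram-sha-256   # example: app server only
Restart PostgreSQL afterwards with sudo systemctl restart postgresql.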

Related

Systemd service should log to journal but not to rsyslog (/var/log/messages)

Why is the log data also written to the file "/var/log/messages" if you specify StandardOutput=null and StandardError=journal in the systemd service? I'm using CentOS 7 as the operating system.
[Service]
Restart=always
TimeoutStartSec=1200
StandardOutput=null
StandardError=journal
I can see both messages in journal and in the file /var/log/messages.
journalctl -u my_service
----
systemd[1]: my_service.service holdoff time over, scheduling restart.
----
cat /var/log/messages | grep my_service
----
systemd[1]: my_service.service holdoff time over, scheduling restart.
----
What additional adjustments do I have to make so that the service only logs its error messages in the journal?
EDIT:
I use the default journal configuration ("/etc/systemd/journald.conf"); all lines are commented out.
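No answer was posted here, but one possible direction (an assumption, not a verified fix) is to stop journald from forwarding to syslog, and, since rsyslog on CentOS 7 typically reads the journal directly via imjournal, to drop the unit's messages in rsyslog as well:
# /etc/systemd/journald.conf
[Journal]
ForwardToSyslog=no

# /etc/rsyslog.d/30-my_service.conf (hypothetical file; match whatever tag your service logs with)
if $programname == 'my_service' then stop

# apply the changes
sudo systemctl restart systemd-journald rsyslog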

apache2 libapache2-mod-wsgi-py3 django

Previously my code was working with apache2, Django, and libapache2-mod-wsgi. But I had to use Python 3, so I removed libapache2-mod-wsgi and installed libapache2-mod-wsgi-py3. Now I am getting an error when I restart apache2.
Below is the error from the command systemctl status apache2.service. I don't know why WSGIScriptAlias is not working with libapache2-mod-wsgi-py3.
The apache2 configtest failed.
Output of config test was:
AH00526: Syntax error on line 7 of /etc/apache2/sites-enabled/000-default.conf:
Invalid command 'WSGIScriptAlias', perhaps misspelled or defined by a module not included in the server configuration
Action 'configtest' failed.
The Apache error log may have more information.
apache2.service: Control process exited, code=exited status=1
Failed to start LSB: Apache2 web server.
apache2.service: Unit entered failed state.
apache2.service: Failed with result 'exit-code'.
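No answer is shown here, but this error usually means Apache is not loading mod_wsgi at all. A sketch of the usual Debian/Ubuntu fix (assuming libapache2-mod-wsgi-py3 is the module you want):
sudo apt-get install libapache2-mod-wsgi-py3   # Python 3 build of mod_wsgi
sudo a2enmod wsgi                              # enable it so WSGIScriptAlias is recognised
sudo apachectl configtest                      # should now report "Syntax OK"
sudo systemctl restart apache2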

How to debug a failed systemctl service (code=exited, status=217/USER)?

I'm trying to add my first service on RHEL 7 (running in AWS EC2), but the service is not configured correctly, as I get:
[ec2-user@ip-172-30-1-96 ~]$ systemctl status clouddirectd.service -l
● clouddirectd.service - CloudDirect Daemon
Loaded: loaded (/usr/lib/systemd/system/clouddirectd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2018-01-09 16:09:42 EST; 8s ago
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: Unit clouddirectd.service entered failed state.
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service failed.
Also:
[ec2-user@ip-172-30-1-96 ~]$ systemctl is-active clouddirectd
activating
[ec2-user@ip-172-30-1-96 ~]$ sudo systemctl list-units --type service --all | grep clouddirectd
clouddirectd.service loaded activating auto-restart CloudDirect Daemon
And my unit file is:
[ec2-user@ip-172-30-1-96 ~]$ cat /usr/lib/systemd/system/clouddirectd.service
[Unit]
Description=CloudDirect Daemon
After=network.target
[Service]
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials
#ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/sonar/clouddirectd -c /etc/sonar/clouddirect/clouddirectd.conf
ExecStart=/usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf
# #PERM# allow group write permission on newly created files
UMask=0007
#User=clouddirectd
User=clouddirect
Group=sonar
KillSignal=SIGINT
TimeoutStopSec=60min
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Can you suggest how to debug this systemctl service so it won't keep dying and auto restarting?
The 217 error indicates the user did not exist at the time the service tried to start. In your case, the user specified in your service is clouddirect.
Main PID: 10064 (code=exited, status=217/USER)
Jan 09 16:09:42 ip-172-30-1-96.us-west-1.compute.internal systemd[1]: clouddirectd.service: main process exited, code=exited, status=217/USER
This could happen if that is not the actual user name (for example, if it has a typo). It can also happen if the user comes from an external user store (e.g. LDAP or Active Directory) and the service that lets the Linux server reach that user store is not up yet. For example, vasd.service starts a product that allows Linux to authenticate against Active Directory; if vasd.service is not up and you have specified a user that is only available in Active Directory, you would want to add that service to your After= line. For example:
After=network.target vasd.service
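A quick way to check whether the user and group named in the unit file actually resolve on the host (names taken from the unit file above):
getent passwd clouddirect   # no output means NSS cannot resolve the user
getent group sonar
id clouddirect              # shows uid, gid and supplementary groups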
There are two parts to the question: one is how to diagnose a 217/USER, the other is how to fix it. I'll focus on the former.
For the 217/USER there are some good pointers here:
https://www.reddit.com/r/linuxquestions/comments/oaya49/systemd_service_not_starting_with_status217/
217 doesn't always mean it's a user problem; it just means the process exited with 217. It may or may not be.
You could use journalctl to check the logs and see which services seem to come up after this one during boot.
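For example (unit name taken from the question above):
journalctl -u clouddirectd.service -b --no-pager   # this unit's logs since boot
journalctl -b -p err                               # all errors since boot, in time order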
It's possible that "network users" aren't yet available at the time the system is started during boot, you can fix that by adding After=nss-user-lookup.target https://systemd.io/UIDS-GIDS/ though that's not the case here since it still fails after restarting, which is later. systemd expects the user specified to "be available" when the service starts. So for "system users" (which start early running processes) they need to be available on the local box. For later started processes they can be "network users".
You could also try changing your group and username (and environment) to what you "think" systemd is running and run it manually, see what happens. https://serverfault.com/questions/410577/execute-a-command-from-another-group
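For example, a rough way to reproduce what systemd would run (user, group, environment, and paths copied from the unit file in the question; adjust to your own ExecStart line):
sudo -u clouddirect -g sonar \
  env AWS_SHARED_CREDENTIALS_FILE=/etc/sonar/.aws/credentials \
  /usr/lib/sonar/clouddirect/virtualenv/bin/python /usr/bin/clouddirect -c /etc/sonar/clouddirect.conf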
I kind of wish systemd output more debug information so you could tell what it is running more easily.
In certain bizarre cases you may need to specify both User= and Group=: https://superuser.com/a/1452367/39364
In our case, running "vintela status" showed the message "SELinux may not be configured correctly", and sure enough, after disabling SELinux it started working as expected with no more 217. [Red Hat 8]
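If you want to rule SELinux in or out before disabling it permanently, the standard checks look like:
getenforce          # Enforcing / Permissive / Disabled
sudo setenforce 0   # switch to Permissive until the next reboot
# for a permanent change, set SELINUX=permissive (or disabled) in /etc/selinux/config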

Not able to access HDFS

I installed the Cloudera VM and started trying some basic stuff. First I just wanted to ls the HDFS directories, so I issued the command below.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
Though ps -fu hdfs says both the namenode and the datanode are running, I checked the status using the service command.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the command below.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all the services would be up, so I again checked the status of the namenode service. Again it showed as failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Now I decided to manually stop and start the namenode service. Again, not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It just said the following:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the below when I searched for errors. Can anyone please suggest the best way to get the services back on track? Unfortunately I am not able to access Cloudera Manager from the browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using port 7184 of the namenode (e.g. with the netstat Linux command), kill it, and then restart.
Or:
Change your namenode port in the configuration and restart Hadoop.
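A sketch of that first suggestion (port 7184 comes from the connection error in the log above; <PID> is a placeholder for whatever netstat reports):
sudo netstat -tlnp | grep 7184              # find the process listening on port 7184
sudo kill <PID>                             # stop it
sudo service hadoop-hdfs-namenode restart   # then restart the namenode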

Can't start mysqld/mysql

First I have to say I'm a MySQL newbie.
Basically, MySQL does not start and says:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysql.sock' (2)
Here are the steps with which I ruined everything:
Because it was not possible to log into my system (OTRS), I thought restoring an older backup would help.
During the restore, the backup manager said I needed to drop the old DB. I tried, but the process did not finish, so I cancelled it.
After this I rebooted the system (with great difficulty), but when I then tried to run MySQL it said it could not find the mysql socket.
At this point I thought it would be better to reinstall MySQL, and did so, but this did not help.
When trying to start mysqld as the mysql user, it said:
[ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
trying both did not help:
mysql:/root> /usr/sbin/mysqld --tc-heuristic-recover commit
131213 16:46:00 InnoDB: The InnoDB memory heap is disabled
131213 16:46:00 InnoDB: Mutexes and rw_locks use GCC atomic builtins
131213 16:46:00 InnoDB: Compressed tables use zlib 1.2.7
131213 16:46:00 InnoDB: Using Linux native AIO
131213 16:46:00 InnoDB: Initializing buffer pool, size = 128.0M
131213 16:46:00 InnoDB: Completed initialization of buffer pool
131213 16:46:00 InnoDB: highest supported file format is Barracuda.
131213 16:46:01 InnoDB: Waiting for the background threads to start
131213 16:46:02 Percona XtraDB (http://www.percona.com) 5.5.33-MariaDB-31.1 started; log sequence number 3710898915
131213 16:46:02 [Note] Server socket created on IP: '0.0.0.0'.
131213 16:46:02 [ERROR] Event Scheduler: Failed to open table mysql.event
131213 16:46:02 [ERROR] Event Scheduler: Error while loading from disk.
131213 16:46:02 [Note] Event Scheduler: Purging the queue. 0 events
131213 16:46:02 [ERROR] Aborting
131213 16:46:02 InnoDB: Starting shutdown...
131213 16:46:03 InnoDB: Shutdown completed; log sequence number 3710898915
131213 16:46:03 [Note] /usr/sbin/mysqld: Shutdown complete
Running systemctl start mysql.service fails every time:
mysql.service - LSB: Start the MySQL database server
Loaded: loaded (/etc/init.d/mysql)
Active: failed (Result: timeout) since Fri, 13 Dec 2013 16:27:12 +0100; 23min ago
Process: 8845 ExecStart=/etc/init.d/mysql start (code=killed, signal=TERM)
CGroup: name=systemd:/system/mysql.service
Dec 13 16:31:21 mysql[8845]: otrs.user_preferences OK
Dec 13 16:31:21 mysql[8845]: otrs.users OK
Dec 13 16:31:21 mysql[8845]: otrs.valid OK
Dec 13 16:31:21 mysql[8845]: otrs.virtual_fs OK
Dec 13 16:31:21 mysql[8845]: otrs.virtual_fs_db OK
Dec 13 16:31:21 mysql[8845]: otrs.virtual_fs_preferences OK
Dec 13 16:31:21 mysql[8845]: otrs.web_upload_cache OK
Dec 13 16:31:21 mysql[8845]: otrs.xml_storage OK
Dec 13 16:31:21 mysql[8845]: performance_schema
Dec 13 16:31:21 mysql[8845]: Phase 3/3: Running 'mysql_fix_privilege_tables'...
I have absolutely no clue what to do. Could anyone help me?
How can the OTRS tables have been dropped/deleted without using the MySQL DROP command?
Would this help anyway?
Thank you.
mysqld --tc-heuristic-recover=ROLLBACK
Didn't quite do the magic for me. However, the following worked:
mysqld_safe --tc-heuristic-recover=COMMIT
I was able to overcome this issue on CentOS 6 with
service mysql start --tc-heuristic-recover=ROLLBACK
which ultimately discarded the commit in question. I'm not sure whether systemd scripts support passing additional parameters. Maybe you could try running it manually with the = form:
mysqld --tc-heuristic-recover=ROLLBACK
On MySQL 5.6.x, and because of this off-by-one bug (https://bugs.mysql.com/bug.php?id=70860), I was able to get past this by doing
sudo service mysql start --tc-heuristic-recover=0
which presumably commits the tx.
Answer from: https://www.youtube.com/watch?v=qr-t8ksYO78
Go to the my.cnf file. Note that you will find multiple my.cnf files; I had to look at all of them to find this:
# The MySQL server
[mysqld]
user = mysql
port=3306
socket = /opt/lampp/var/mysql/mysql.sock
Copy the socket path and write the command like this:
mysql -u root -p --socket=/opt/lampp/var/mysql/mysql.sock
Thanks
If none of the above works for you, edit the MySQL config file as described below:
Run sudo nano /etc/mysql/my.cnf and add tc-heuristic-recover=rollback under [mysqld].
Try to start the MySQL/MariaDB server with sudo systemctl start mysqld.service; it should fail with the error Can't init tc log.
Don't worry; just edit the config file again and comment out tc-heuristic-recover=rollback.
Try to start the MySQL server again with sudo systemctl start mysqld.service and it should work fine.
Check the status to confirm: systemctl status mysqld.service
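For reference, the temporary my.cnf change described above would look like this (remove or comment it out again after the first failed start):
# /etc/mysql/my.cnf
[mysqld]
tc-heuristic-recover=rollback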
You can try this command, which worked for me on CentOS 7:
#mysqld_safe --wsrep-recover --tc-heuristic-recover=ROLLBACK
and then restart the service again.