Amazon Linux 2 worker fails to reboot - amazon-web-services

I'm running a Node.js application on an Amazon Linux 2 worker instance, connected to SQS.
The problem
It all runs fine, except that for technical reasons I need to restart the server regularly. To do this, I've set up a cron job that runs /sbin/shutdown -r now at night.
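For context, the cron entry is nothing unusual; a sketch of what it looks like as an /etc/cron.d file (the schedule and file name here are only placeholders):

# /etc/cron.d/nightly-reboot (sketch) - reboot the instance every night at 03:30
30 3 * * * root /sbin/shutdown -r now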
When the instance boots back up, I get an error from the SQS daemon service:
[INFO] Executing instruction: configureSqsd
[INFO] get sqsd conf from cfn metadata and write into sqsd conf file ...
[INFO] Executing instruction: startSqsd
[INFO] Running command /bin/sh -c systemctl show -p PartOf sqsd.service
[INFO] Running command /bin/sh -c systemctl is-active sqsd.service
[INFO] Running command /bin/sh -c systemctl start sqsd.service
[ERROR] An error occurred during execution of command [self-startup] - [startSqsd].
Stop running the command. Error: startProcess Failure: starting process "sqsd" failed:
Command /bin/sh -c systemctl start sqsd.service failed with error exit status 1.
Stderr:Job for sqsd.service failed because the control process exited with error code.
See "systemctl status sqsd.service" and "journalctl -xe" for details.
The instance then gets stuck in a loop: initialization runs until it hits the sqsd.service error, then starts over again.
Logs
The systemctl status sqsd.service command doesn't show much more than we already know, only that the process exited with status 1:
● sqsd.service - This is sqsd daemon
Loaded: loaded (/etc/systemd/system/sqsd.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) (Result: exit-code)
Process: 2748 ExecStopPost=/bin/sh -c (code=exited, status=0/SUCCESS)
Process: 2745 ExecStopPost=/bin/sh -c rm -f /var/pids/sqsd.pid (code=exited, status=0/SUCCESS)
Process: 2753 ExecStart=/bin/sh -c /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd start (code=exited, status=1/FAILURE)
CGroup: /system.slice/sqsd.service
└─2789 /opt/elasticbeanstalk/lib/ruby/bin/ruby /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd start
The most interesting find when checking journalctl -xe is:
sqsd[9704]: /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `initialize': No such file or directory # rb_sysopen - /var/run/aws-sqsd/default.pid (Errno::ENOENT)
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `open'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:58:in `start'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:83:in `launch'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.3/bin/aws-sqsd:111:in `<top (required)>'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `load'
sqsd[9704]: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `<main>'
systemd[1]: sqsd.service: control process exited, code=exited status=1
systemd[1]: Failed to start This is sqsd daemon.
Further investigation
As the logs indicate, the file /var/run/aws-sqsd/default.pid does not exist when the server reboots. It does exist after a rebuild, and it contains the application process ID.
If I create the file manually, the setup process gets a little further until a similar file turns up missing.
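One detail that may matter here: /var/run is a tmpfs on Amazon Linux 2, so anything created under it, including the /var/run/aws-sqsd directory itself, is gone after a reboot. A possible way to recreate the directory at boot is a tmpfiles.d entry; this is only a sketch, and the ownership is an assumption (check what the sqsd unit actually expects):

# /etc/tmpfiles.d/aws-sqsd.conf (sketch)
# type  path               mode  user  group  age
d       /var/run/aws-sqsd  0755  root  root   -

It can be applied immediately with systemd-tmpfiles --create /etc/tmpfiles.d/aws-sqsd.conf before retrying systemctl start sqsd.service.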
Solutions?
Has anyone run into this issue before? I'm not sure why starting sqsd.service fails after a normal reboot but works fine on initial deploy and after rebuilding the environment... It almost seems like it's looking for a config file that doesn't exist...
Are there any other ways to safely reboot the instance that I should try?

I have the exact same issue. I'm not posting a solution, just some more data on the problem. I found errors in /var/log/messages suggesting that the sqsd daemon ran out of memory.
Apr 28 15:43:05 ip-172-31-121-3 sqsd: /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:42:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:42:in `start'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:83:in `launch'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/aws-sqsd-3.0.4/bin/aws-sqsd:111:in `<top (required)>'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `load'
Apr 28 15:43:05 ip-172-31-121-3 sqsd: from /opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd:23:in `<main>'
Apr 28 15:43:05 ip-172-31-121-3 systemd: sqsd.service: control process exited, code=exited status=1
Apr 28 15:43:05 ip-172-31-121-3 systemd: Failed to start This is sqsd daemon.
Apr 28 15:43:05 ip-172-31-121-3 systemd: Unit sqsd.service entered failed state.
Apr 28 15:43:05 ip-172-31-121-3 systemd: sqsd.service failed.
After switching to a larger instance class it went through fine, but I'm not sure whether it was just the refreshed instance that did it (as david.emilsson mentioned) or the extra memory.
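If memory really is the limiting factor, an alternative to a larger instance class is adding a swap file through an .ebextensions config. This is only a sketch (the file name and size are arbitrary), not an official fix:

# .ebextensions/01-swap.config (sketch)
commands:
  01_create_swap:
    test: test ! -e /swapfile
    command: |
      dd if=/dev/zero of=/swapfile bs=1M count=1024
      chmod 600 /swapfile
      mkswap /swapfile
      swapon /swapfile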

Related

NGINX failed to work while configuring server for subdomain

Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
Prior to this, the server was working fine, but when I tried to configure it for a subdomain it failed with this error.
Here is the detailed error ...
[ec2-user@ip--------- conf.d]$ systemctl status nginx.service
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-07-07 06:03:04 UTC; 4min 17s ago
Process: 71445 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
Process: 71444 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
Main PID: 70298 (code=exited, status=0/SUCCESS)
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: Starting The nginx HTTP and reverse proxy serv>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal nginx[71445]: nginx: [emerg] unexpected end of file, expec>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal nginx[71445]: nginx: configuration file /etc/nginx/nginx.c>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: nginx.service: Control process exited, code=ex>
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 07 06:03:04 ip-999.ap-south-1.compute.internal systemd[1]: Failed to start The nginx HTTP and reverse pro>
lines 1-13/13 (END)
You can always check for configuration errors with this command:
nginx -T
before launching a catastrophic restart :)
You have an "unexpected end of file" error, so check the syntax of your NGINX configuration; you most likely have a broken file.
The usual culprit is forgetting to close a } character.
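For comparison, a minimal subdomain server block with balanced braces looks like this (a sketch; the server name and root path are placeholders):

# /etc/nginx/conf.d/subdomain.example.com.conf (sketch)
server {
    listen 80;
    server_name subdomain.example.com;

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }
}
# every brace opened above is closed; otherwise nginx -t reports "unexpected end of file"

Running nginx -t (or nginx -T, which also dumps the full configuration) should report that the syntax is ok before you restart the service.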

Failed to start Redis In-Memory Data Store. Ubuntu 18.04

I am trying to install Redis on my AWS server, which runs Ubuntu 18.04. I am following the installation steps from a DigitalOcean article.
When I run sudo systemctl status redis, I get the error below:
(screenshot)
I tried editing the /etc/systemd/system/redis.service file and added Type=forking under the [Service] section, but I still get the same error.
Can anyone suggest how I can get this fixed?
Thanks in advance.
Based on the same DigitalOcean tutorial, Redis is actually running fine.
If you run sudo systemctl restart redis.service, you get this (note "failed" on the last line):
● redis.service - Redis In-Memory Data Store
Loaded: loaded (/etc/systemd/system/redis.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2021-06-28 12:03:11 +03; 1min 0s ago
Process: 20428 ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf (code=exited, status=
Main PID: 20428 (code=exited, status=203/EXEC)
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Service hold-off time over, scheduling restar
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Scheduled restart job, restart counter is at
Jun 28 12:03:11 XYZ systemd[1]: Stopped Redis In-Memory Data Store.
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Start request repeated too quickly.
Jun 28 12:03:11 XYZ systemd[1]: redis.service: Failed with result 'exit-code'.
Jun 28 12:03:11 XYZ systemd[1]: Failed to start Redis In-Memory Data Store.
But if you run sudo service redis-server status, you get this (note "running" on the third line):
● redis-server.service - Advanced key-value store
Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-06-28 11:50:13 +03; 19min ago
Docs: http://redis.io/documentation,
man:redis-server(1)
Process: 19278 ExecStop=/bin/kill -s TERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 19371 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCC
Main PID: 19382 (redis-server)
Tasks: 4 (limit: 4915)
CGroup: /system.slice/redis-server.service
└─19382 /usr/bin/redis-server 127.0.0.1:6379
Jun 28 11:50:13 XYZ systemd[1]: Starting Advanced key-value store...
Jun 28 11:50:13 XYZ systemd[1]: redis-server.service: Can't open PID file /var/run/redis/red
Jun 28 11:50:13 XYZ systemd[1]: Started Advanced key-value store.
After searching for hours, it seems it is nothing more than a difference between systemctl and service; the actual Redis server is running fine. Correct me if that's not the case. Here's the link: https://askubuntu.com/questions/903354/difference-between-systemctl-and-service-commands
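If you want to confirm which unit is actually in play, the two status outputs above come from two different unit files; listing them side by side makes the duplication obvious (a sketch based on the paths shown above):

ls -l /etc/systemd/system/redis.service /lib/systemd/system/redis-server.service
systemctl status redis.service redis-server.service
# if the hand-written redis.service turns out to be redundant, disable it and keep the packaged unit
sudo systemctl disable redis.service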
You can also check that Redis is working by running redis-cli ping, which should print PONG.
I also encountered this problem and went back over the steps.
I finally found that when I set the ownership of /var/lib/redis I had entered the wrong command, so the redis account had no access to /var/lib/redis.
sudo chown redis:redis /var/lib/redis
sudo systemctl restart redis
After running these two commands, the restart succeeded.

docker-machine create with digitalocean driver: ssh command error

I'm using Docker Tools on Windows.
The create command was working perfectly last week, and I managed to create a number of machines on DigitalOcean. Today I tried again with no success. I repeated the same command with different regions and I always get the same result:
λ docker-machine create -d digitalocean --digitalocean-access-token=MYTOKEN --digitalocean-region=ams2 vmname
Running pre-create checks...
Creating machine...
(fernu) Creating SSH key...
(fernu) Creating Digital Ocean droplet...
(fernu) Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Error creating machine: Error running provisioning: ssh command error:
command : sudo systemctl -f start docker
err : exit status 1
output : Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
If I execute the suggested command:
root@fernu:~# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: inactive (dead) (Result: exit-code) since Fri 2017-06-30 20:56:13 UTC; 8min ago
Docs: https://docs.docker.com
Process: 4943 ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=digitalocean (code=exited, status=1/FAILURE)
Main PID: 4943 (code=exited, status=1/FAILURE)
Jun 30 20:56:13 fernu systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 20:56:13 fernu systemd[1]: Failed to start Docker Application Container Engine.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Unit entered failed state.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 30 20:56:13 fernu systemd[1]: Stopped Docker Application Container Engine.
Jun 30 20:56:13 fernu systemd[1]: docker.service: Start request repeated too quickly.
Jun 30 20:56:13 fernu systemd[1]: Failed to start Docker Application Container Engine.
Any help would be appreciated
Update
It's working with Ubuntu 14:
--digitalocean-image=ubuntu-14-04-x64 so it seems like a problem with the default image (ubuntu-16-04-x64).
This seems to be hitting a lot of people. TL;DR: There is a bug in docker-machine v0.12.0 and this issue can be resolved by upgrading.
Logging in to the DigitalOcean instance and running journalctl -xe provides more information:
-- Unit docker.service has begun starting up.
Jul 07 20:03:52 docker-sandbox docker[4930]: `docker daemon` is not supported on Linux. Please run `do
Jul 07 20:03:52 docker-sandbox systemd[1]: docker.service: Main process exited, code=exited, status=1/
Jul 07 20:03:52 docker-sandbox systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
The key here is docker daemon is not supported on Linux. A bug in docker-machine's version comparison code caused an incorrect systemd unit file to be produced (located at /etc/systemd/system/docker.service.d/10-machine.conf) on certain versions of Ubuntu.
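Concretely, the drop-in contains an ExecStart line like the one visible in the status output above, using the old "docker daemon" invocation. If you need the droplet working before upgrading, a manual workaround is to replace it with dockerd; this is only a sketch, so verify the actual contents of 10-machine.conf on your instance:

# /etc/systemd/system/docker.service.d/10-machine.conf (sketch of the corrected override)
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=digitalocean

followed by systemctl daemon-reload and systemctl restart docker.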
A fix has been committed and a new release (v0.12.1) was made.
You can grab the latest release at: https://github.com/docker/machine/releases/tag/v0.12.1
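On Linux or macOS the upgrade is a single download (a sketch; adjust the install path to your setup). On Windows, replace the docker-machine.exe that Docker Toolbox installed with the .exe from the same release page:

curl -L https://github.com/docker/machine/releases/download/v0.12.1/docker-machine-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-machine
chmod +x /usr/local/bin/docker-machine
docker-machine version   # should now report 0.12.1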

Glusterfs-server can't stop in good status

Glusterfs-server doesn't stop cleanly.
I have been following these steps to install glusterfs and start the service:
yum install centos-release-gluster
yum install glusterfs-server
systemctl start glusterd
and stop it.
systemctl stop glusterd
Then the following status is displayed, showing "Active: failed":
glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled)
Active: failed (Result: exit-code) since 火 2017-01-24 18:23:55 JST; 4s ago
Process: 2523 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2524 (code=exited, status=15)
1月 24 18:23:52 ds009 systemd[1]: Started GlusterFS, a clustered file-system server.
1月 24 18:23:55 ds009 systemd[1]: Stopping GlusterFS, a clustered file-system server...
1月 24 18:23:55 ds009 systemd[1]: glusterd.service: main process exited, code=exited, status=15/n/a
1月 24 18:23:55 ds009 systemd[1]: Stopped GlusterFS, a clustered file-system server.
1月 24 18:23:55 ds009 systemd[1]: Unit glusterd.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
The environment is "CentOS Linux release 7.2.1511 (Core)"
and the installed glusterfs-server version is 3.8.8-1.el7.
Does anyone have an idea what is wrong and how to fix this?
I have found a workaround.
Edit the /usr/lib/systemd/system/glusterd.service file as follows:
[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
ExecStopPost=/usr/bin/systemctl reset-failed glusterd # Add this
KillMode=process
With this change, the failed status is cleared when the GlusterFS service stops.
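After editing the unit file, reload systemd and restart the service so the change takes effect (note that the "# Add this" marker above is only an annotation and should not be copied into the unit file):

sudo systemctl daemon-reload
sudo systemctl restart glusterd
sudo systemctl stop glusterd
systemctl status glusterd   # should now show "inactive (dead)" instead of "failed"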

how to fix uwsgi service failing to start in CentOS 7

uwsgi.service - uWSGI Emperor service
Loaded: loaded (/etc/systemd/system/uwsgi.service; disabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Sun 2016-05-22 05:46:09 EDT; 5min ago
Process: 6371 ExecStartPre=/usr/bin/bash -c mkdir -p /run/uwsgi; chown user:nginx /run/uwsgi (code=exited, status=1/FAILURE)
May 22 05:46:09 apxyws systemd[1]: Failed to start uWSGI Emperor service.
May 22 05:46:09 apxyws systemd[1]: Unit uwsgi.service entered failed state.
May 22 05:46:09 apxyws systemd[1]: uwsgi.service failed.
May 22 05:46:09 apxyws systemd[1]: uwsgi.service holdoff time over, scheduling restart.
May 22 05:46:09 apxyws systemd[1]: start request repeated too quickly for uwsgi.service
May 22 05:46:09 apxyws systemd[1]: Failed to start uWSGI Emperor service.
May 22 05:46:09 apxyws systemd[1]: Unit uwsgi.service entered failed state.
May 22 05:46:09 apxyws systemd[1]: uwsgi.service failed.
Does anybody know how to fix something like this? I am still a newbie at setting up things like this and have been searching for an answer, but found nothing similar to my case.
I just started using Django, and it worked when running:
uwsgi --http :8080 --home /root/Env/apxweb --chdir /root/apxweb -w apxweb.wsgi
but when I started using it with nginx, the uwsgi service fails to start.
Note: the nginx service works.
Thanks Luke Dixon.
Yes, the problem was with the user.
I updated the file like this:
/usr/bin/bash -c mkdir -p /run/uwsgi; chown root:nginx /run/uwsgi
uwsgi worked again.
I don't know if this is the correct way to fix uwsgi, but thank you very much anyway.
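After changing that ExecStartPre line in the unit file, reload systemd and restart the service to confirm the directory is now created with the new ownership (a sketch of the verification steps):

sudo systemctl daemon-reload
sudo systemctl restart uwsgi
systemctl status uwsgi   # should now show "active (running)"
ls -ld /run/uwsgi        # should exist and be owned by root:nginx

A cleaner long-term option may be to create the user that the original chown referenced, or to let systemd manage the directory with a RuntimeDirectory= directive, but the ownership change above is enough to get the service starting.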