Wordpress running on EC2 t3.small becomes unavailable (ELB Error 504) after X amount of time, needs rebooting - amazon-web-services

I have a problem with my Amazon EC2 instance (that did not happened when I was using DigitalOcean).
I've several EC2 instances that are managed by me. My personal EC2 has about 5 Wordpress sites running on a t2.micro instance and the traffic is not high so it is working well in load speed.
Also I have another 2 instances for one of my clients, one t2.micro (running only one Wordpress site) and a t3a.micro (running 4 Wordpress sites). The issue is with all 3 instances (mine and all the 2 of my client).
I have a CloudWatch alarm to notify me by email when Error 504 happen. Since I get the alarm, the website becomes unavailable (Cloudflare shows me Error 504), but I can get into SSH or Webmin. I do service nginx status and all seems to be fine, same to service php7.2-fpm. I do pkill nginx && pkill php* and then service nginx start && service php7.2-fpm start correctly but when I try to enter to the site, the Error 504 is still there.
To test, I decided to install and configure Apache with and without PHP-FPM enabled, same problem. Instance going well and websites running fast but after X amount of hours, it becomes unaccessible via web and the only solution is rebooting...
What's the only thing that solve the issue? Well, rebooting the instance.... After it boots, the websites are available again. Please note that I moved from DigitalOcean to AWS because it is more useful but I can't understand why the problem is happening here and not there since I've a similar instance configured very similar...
In all of the instances I've a setup with:
OS: Ubuntu 18.04
Types: Two t2.micro and one t3a.micro
ELB: Enabled
Security Groups: only allow ports 80, 443 from all the sources.
Database: In a RDS, not on the same instance.
I can provide the logs of everything that you probably can ask but I review all the Nginx and PHP-fpm logs and I can't see any anomalies. Also with syslog and kern.log, but I can provide if it can helps.
Hope you can give me a hand. Thanks for your advice!
EDIT:
I already found the origin of the issue. The problem wasn't in the EC2, all my headache was because I have the RDS set with only one Security Group attached to allow access from my IP to remote management of the databases and the public IPs of the EC2 that runs Wordpress, but I figured that I also need to whitelist the private IPs of those EC2s... Really noob mistake but that was the solution.

Related

EC2 server losses internet connection and application fails to send email, sms and even yum updates

I have 5 EC2 servers in the same VPC and all of a sudden yesterday, all of my applications started failing to send email and sms. So I tried doing git pull of my project it also timed out. Then tried to install telnet using yum that to failed with Time out. I have checked almost everything including Network ACLs, Security Groups, Subnets, Iptables, etc and everything is correct. I am not sure why is this happening.
The weird thing is if I reboot the server once the internet comes for a brief amount of time and again it disconnects.
Attaching below are the errors I am facing:
Error while Generating the Tiny URL. Error: {"errno":-110,"code":"ETIMEDOUT","syscall":"connect","address":"XXX.XX.XXX.XX","port":443}
Error SendEmail UnknownEndpoint: Inaccessible host: `email.ap-south-1.amazonaws.com'. This service may not be available in the `ap-south-1' region.
Attaching screenshots of my Network ACLs, Security Groups, Subnets, and iptables:
Please help with what am I doing wrong or if is this an issue with AWS EC2? My goal is to make sure my application works without timeout and git and yum starts working.
Did you try terminating and reprovisioning the instances, rather than rebooting them? There may be some problem with the underlying hardware. When you terminate and recreate an instance, it will likely end up in a different rack in the datacenter, which may solve the problem.
If the above helps, you should consider setting up an application load balancer with an auto scaling group, with health checks enabled for both, so that the auto scaling group terminates unhealthy instances and replaces then with the new ones automatically.
You may also consider using Simple Notification Service and stop worrying about underlying compute for e-mail and sms distribution altogether!

HTTP server on EC2 instance unreachable after a few minutes

I have a running instance on the Linux 2 AMI.
I have a default VPC and network interface.
Security groups taken care of, even opened all traffic and still got nothing.
There is an Internet Gateway
Routes are open on the VPC
The server is running
nginx is running
Once the instance is initiated and installed, all of this is ready
I can reach the http website the first 2-3 minutes, then it is unreacheable.
No idea why, everything else still running, can still ssh into the server, but http port 80 not running.
I opened everything from iptables, still nothing.
If I reboot the server, I get a minute where I can reach the server via http, but then a minute later its the same again.
I can reach http if I use $ wget http://localhost
So I think it is probably something from the EC2 control panel, not the instance itself.
I tried on new instances too.
Anyone has an idea?
The reason behind this weird behavior was that AWS abuse team had blocked some of my ports, had to upgrade to the developer plan to be able to know this, contacting them at the moment

Service not responding to ping command

My service (myservice.com) which is hosted in EC2 is up and running. I could see java process running within the machine but not able to reach the service from external machines. Tried the following option,
dns +short myservice.com
ping myservice.com
(1) is resolving and giving me ip address. ping is causing 100% packet loss. Not able to reach the service.
Not sure where to look at. Some help to debug would be helpful.
EDIT:
I had an issue with previous deployment due to which service was not starting - which I've fixed and tried to update - but the deployment was blocked due to ongoing deployment (which might take ~3hrs to stabilize). So I tried enabling Force deployment option from the console
Also tried minimising the "Number of Tasks" count to 0 and reverted it back to 1 (Reference: How do I deploy updated Docker images to Amazon ECS tasks?) to stop the ongoing deployment.
Can that be an issue?
You probably need to allow ICMP protocol in the security group.
See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html#sg-rules-ping

AWS EC2 web server no access after a while

I've followed the guide to set up a LAMP WordPress site with EC2 on a Linux machine. I've attached a Elastic IP to it and configured the Security Group to allow HTTP access. It works great to access globally. However, after about one hour, I can't access the site any more. The EC2 health checks are fine, I can SSH to the server and the httpd, MySQL etc is up and running. When I reboot the EC2, everything starts to work again.
I've tried to troubleshoot for a while but don't understand what I'm doing wrong. Any ideas?
I have experienced the same with t2.micro with Apache and MySQL running on 8 GB Storage. Reasons is the memory consumption and log file filled the storage. This might be the reason here.

AWS EC2 is running but website is showing connection time out

I am running Bitnami WordPress on AWS server website working since two days but suddenly it stop showing anything and connection timeout is showing. The instance EC2 is running perfectly fine, and I have also seen IP logs, and nothing suspicious has come up.
Based on the above comments I guess the issue is with the internal web server
Make sure that the web server is running perfectly fine. And I do not mean just checking the EC2 instance state, because it is possible that the EC2 instance is running but the web server is down, causing the issue