Django application high disk IOPS on EC2 - django

I am seeing sudden spikes of high disk IOPS on my EC2 instances. They are all running a Django 1.9.6 web application. The software installed on them is Apache, Celery, the New Relic agent, and the Django WSGI application itself.
The application does not do any disk operations as such: data is stored in RDS and Redis (on another server), and static files are stored on S3 and served through CloudFront. So I am unable to determine the cause of the high disk IOPS.
What happens is that a normal request suddenly takes forever to respond. Checking CloudWatch and New Relic, I see RAM usage shoot up, and then the instance becomes unresponsive: all requests time out and I can't SSH in. When I contacted AWS Support, they said the VolumeQueueLength was increasing significantly, and once it came down (15-20 minutes later) the instance was working fine again.
Any ideas as to what could be the issue?
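One pattern consistent with these symptoms (RAM climbing, then the volume queue growing) is swapping: once memory is exhausted, the kernel pages to the EBS volume and disk IOPS explode. A quick diagnostic sketch, not from the original post, is to read swap usage from `/proc/meminfo` on the instance:

```python
# Rough swap-usage check on a Linux host (diagnostic sketch): if swap is
# heavily used while available RAM is near zero, the disk IOPS spike is
# likely the kernel paging, not the application itself.
def meminfo():
    """Parse /proc/meminfo into a dict of kB values."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.strip().split()[0])  # values are in kB
    return info

m = meminfo()
swap_used_kb = m["SwapTotal"] - m["SwapFree"]
print(f"MemAvailable: {m.get('MemAvailable', 0)} kB, swap used: {swap_used_kb} kB")
```

If swap usage is high when the slowdowns start, capping Celery/Apache worker counts (or moving to a larger instance) addresses the root cause rather than the disk symptom.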

Related

Why is my website, deployed on a Google Compute Engine VM instance, taking too long to load?

My website is taking too much time to load. I have hosted it on a VM instance created on Google Compute Engine.
The website is built on the MERN stack and runs with Docker Compose. I'm using Docker scaling for the backend application, but no load balancing for the client side or the VM.
I would really appreciate it if someone could help me out with this issue; I've been searching for days but still couldn't figure out what the problem is.
This is the site link:
https://www.mindschoolbd.com
VM type: e2-standard-2,
zone: northamerica-northeast2-a
There are a few things you can do to make it load faster:
1) Optimize your application code (including database queries, if you have any); for this part you may need to work with your developer, as it is outside the scope of infrastructure tuning.
There are also a few situations that can cause slow loading for users:
i) The app server may be down, which causes the loading issue.
ii) The user's Wi-Fi / mobile data connection may not be working properly.
iii) Too many users may be using the app at the same time; trying again after a few minutes may help.
2) Upgrade your machine type, which covers the CPU and RAM of the VM instance; server-side caching is also part of this. Check memory and CPU utilization and, if either is saturated, change to something larger than e2-standard-2.
3) Deploy your VM in a region close to your users. One way is to create a snapshot of the boot disk and create a new VM from that snapshot in the new region.
4) Put Cloud CDN in front of the site for caching: Cloud CDN lowers network latency, offloads origins, and reduces serving costs. Also consider a load balancer to optimize application latency, and install the Ops Agent on the VM instance for better monitoring.
Finally, general website speed optimization techniques (image compression, minification, caching headers) can further improve performance and user experience.
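Before resizing the VM (step 2 above), it is worth confirming the instance is actually resource-bound. A minimal check over SSH, sketched here for a standard Linux guest, could be:

```shell
# Quick resource check on the VM (illustrative commands; run over SSH)
free -m        # RAM and swap usage, in MB
uptime         # load averages; compare against the 2 vCPUs of e2-standard-2
df -h /        # root filesystem usage
```

If the load average sits well above 2 or free memory is near zero, a larger machine type (or more backend replicas behind a load balancer) is the right direction; otherwise the bottleneck is more likely in the application or network path.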

Suitable Google Cloud Compute instance for Website Hosting

I am new to cloud computing but want to use it to host a website I am building. The website will be a data analytics site, and each user will be interacting with a MySQL database and reading data from text files. I want to be able to accommodate about 500 users at a time, and the site will likely have around 1000-5000 users fully scaled. I have chosen GCP and am wondering if the e2-standard-2 VM instance would be enough to get started. I will also be using a GCP HA MySQL server; I am thinking that 2 vCPUs and 5 GB of memory will be enough, with 50 GB of high-availability SSD storage. Any suggestions would be appreciated. Also, is there any other service I will need? Thank you!!
Exact sizing in advance matters less than you might think. On Google Cloud Platform you have real-time monitoring for CPU and RAM usage, so if your website gains more users you can upgrade (or downgrade) CPU and RAM with two or three mouse clicks. Start small and upgrade later if you see CPU or RAM getting close to 100% usage; for example, start with an N1-series micro instance (600 MB RAM).
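To sanity-check a starting size, a back-of-envelope estimate helps; every number below is an assumption for illustration, not a measurement from the question:

```python
# Rough capacity estimate (all figures are illustrative assumptions)
concurrent_users = 500           # target from the question
requests_per_user_per_sec = 0.1  # assume one request every 10 s per active user
cpu_seconds_per_request = 0.05   # assume 50 ms of CPU work per request

req_per_sec = concurrent_users * requests_per_user_per_sec   # ~50 req/s
vcpus_needed = req_per_sec * cpu_seconds_per_request         # ~2.5 vCPUs
print(req_per_sec, vcpus_needed)
```

Under these assumptions the workload already saturates the 2 vCPUs of an e2-standard-2, which is exactly why starting small and watching the monitoring graphs before committing to a size is the safer plan.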

How to fix ec2 instance when I can't login via ssh after installing clamscan/clamav?

After installing clamscan/clamav on an Ubuntu 18.04 AWS EC2 instance, I can't log in to my server over SSH, and the website on that server no longer shows up in the browser. I have rebooted, but it is still not working. How do I fix this?
Common reasons for an unresponsive instance are exhausted memory and a corrupted file system.
Since you are using a t2.micro, which has only 1 GB of RAM and by default an 8 GB disk, it's possible that your instance is simply too small to run your workloads; the clamd daemon in particular loads its entire virus-signature database into memory, which can consume several hundred megabytes on its own. In such a situation, a common solution is to upgrade to, e.g., a t2.medium (4 GB of RAM), but such a change will be outside the free tier.
Alternatively, you can reinstall your application on a new t2.micro, but this time set up the CloudWatch Agent to monitor RAM and disk usage. By default these things are not monitored. If you monitor them on a new instance, it can give you insight into how much RAM, disk, and other resources your applications use.
The metrics collected in CloudWatch will help you judge better what is causing your instance to freeze.
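As a sketch of that setup, a minimal CloudWatch Agent configuration that adds memory and root-disk metrics (neither is collected by default) looks roughly like this; the measurement names are the agent's standard ones, and the file typically lives at /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json:

```json
{
  "metrics": {
    "metrics_collected": {
      "mem":  { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["/"] }
    }
  }
}
```

With this in place, `mem_used_percent` climbing toward 100% before the instance goes dark would confirm the exhausted-memory theory.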

Amazon EC2 Servers getting frozen sporadically

I've been working with Amazon EC2 servers for 3+ years, and I have noticed a recurrent behaviour: some servers freeze sporadically (between 1 and 5 times per year).
When this occurs, I can't connect to the server (I have tried HTTP, MySQL and SSH connections) until the server is restarted; after a restart it works again.
Sometimes the server stays online for 6+ months, sometimes it freezes about a month after a restart.
All the servers where I noticed this behaviour were micro instances (North Virginia and São Paulo).
The servers have an ordinary Apache 2, MySQL 5, PHP 7 environment, on Ubuntu 16 or 18. The PHP/MySQL web application is not CPU intensive and is not accessed by more than 30 users/hour.
The same environment and application on DigitalOcean servers does NOT reproduce the behaviour (I have two DigitalOcean servers running uninterrupted for 2+ years).
I like Amazon EC2, mainly because Amazon has a lot of useful additional services (like SES), but this behaviour is really frustrating. Sometimes I get customer calls complaining about systems being down, and I just need to restart the instance to solve the problem.
Does anybody have a tip about solving this problem?
UPDATE 1
They are t2.micro instances (1 GB RAM, 1 vCPU).
MySQL SHOW GLOBAL VARIABLES: pastebin.com/m65ieAAb
UPDATE 2
There is a CPU utilization peak in the logs near the time the server went down, at 3 AM. At that time a daily crontab task makes a database backup. But considering this task runs every day, why would it freeze the server only sometimes?
I have not seen this exact issue, but on any cloud platform I assume any instance can fail at any time, so we design for failure. For example, we have autoscaling on all customer-facing instances; any time an instance fails, it is automatically replaced.
If a customer is calling to tell you a server is down, you may need to consider more automated methods of monitoring instance health and taking automated action to recover the instance.
CloudWatch also has server recovery actions available that can be triggered when certain metric thresholds are reached.
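Regarding the 3 AM backup mentioned in the update: on a burstable t2.micro, a CPU-heavy `mysqldump` can occasionally exhaust the instance's CPU credits and leave it crawling. One hedged mitigation (the database name and backup path below are made up for illustration) is to run the cron job at the lowest CPU and I/O priority:

```
# Hypothetical crontab entry: run the 3 AM backup at minimum CPU (nice)
# and I/O (ionice idle class) priority so it cannot starve the single vCPU
0 3 * * * nice -n 19 ionice -c3 sh -c 'mysqldump --single-transaction mydb | gzip > /backup/mydb.sql.gz'
```

This does not fix a credit-exhausted instance, but it makes the backup far less likely to be the trigger.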

Reducing the size of boot disk on google cloud instance with Plesk

I would like to resize the SSD boot disk of my Google Cloud instance from 500 GB down to 150 GB. The instance has Ubuntu 16.04.5 LTS and Plesk Onyx installed, and a web and mail server are running on it, which is currently my biggest problem.
My idea is to create a new instance and add a mirror of the current disk as the boot disk of the new instance. But how do I mirror the disk without downtime for the mail and web server? Or, if I have to stop both services, what is the best way to mirror the disk?
Any experiences or tips?
The 500 GB SSD is more expensive than we thought, which is the reason we have to reduce the disk size.
Thanks
To avoid the downtime, I can suggest the following action plan:
Deploy a new instance with the required parameters.
Perform a migration to the new instance using Plesk's migration tooling; while it may seem complex, when you have two instances with the same Plesk version and the same list of installed components, it is a pretty straightforward process.
When the migration is finished, switch routing from the public IP or IPs to the new instance.
Make sure that everything works fine, then get rid of the overpriced instance.