AWS EC2 instance becomes unresponsive after I/O-heavy operation in Dockerfile

I'm using a free-tier EC2 instance (t2.micro) with the default EBS volume. I use a Docker container to host a website.
Occasionally, when I run out of storage, I have to bulk-delete my dangling Docker images. When I perform the delete and rebuild the container, the SSH session hangs (while the npm modules are installing), and I'm not able to log into my machine for almost 1-2 hours.
After some research I realized this has something to do with burst credits, but on inspecting my EBS burst credits I still have 60 left, and I have around 90 CPU credits.
I'm not sure why this unresponsiveness is happening; my instance even stops serving the website it hosts for 1-2 hours after this.
For reference, this is my Dockerfile.
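For context, the delete-and-rebuild sequence is roughly the following; this is a sketch, with the image name and ports as placeholders rather than the actual commands:

    docker image prune -f             # bulk-delete dangling images
    docker build -t mysite .          # rebuild; the I/O-heavy npm install runs here
    docker run -d -p 80:3000 mysite   # serve the site again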

Related

EC2 goes down when db backup cron is running

We are using a t3.small EC2 instance running Ubuntu 18.04. It runs one local MongoDB service and some Node services.
A cron job is scheduled at 19:30 IST to dump the Mongo database and store it in GCS.
Sometimes this cron job causes the server to go down, though not every time.
I checked the CloudWatch metrics: read ops are around 10k+ at that time, and the burst balance is also going down.
Is there any way to mitigate this?
If your read ops are high, you will probably also see a large number of queued tasks at the same time, which causes performance to drop.
The most costly solution would be to upgrade your disk to Provisioned IOPS (PIOPS) and set the number of IOPS to your highest consumption point, but this will have a cost impact on your solution.
The following steps will help mitigate the performance problem:
Do not run the backup on the MongoDB server; run it from a standalone server.
Add a standby replica and back up from that, so as not to affect the performance of your primary node.
Add pauses between backup actions: rather than doing everything at once, back up over a longer period with gaps in between to improve overall performance (see the sketch after this list).
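A minimal sketch of the last two points, assuming the backup runs from a separate host against a replica; the host, database, and bucket names are placeholders:

    # Dump one collection at a time, pausing between dumps so the EBS burst
    # balance can recover (HOST, DB, and the bucket are hypothetical names).
    HOST=mongo-replica.internal
    DB=mydb
    for coll in $(mongo --quiet --host "$HOST" "$DB" --eval 'db.getCollectionNames().join("\n")'); do
        mongodump --host "$HOST" --db "$DB" --collection "$coll" --out /tmp/dump
        sleep 30    # gap between backup actions
    done
    gsutil -m cp -r /tmp/dump gs://my-backup-bucket/$(date +%F)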

Django application high disk IOPS on EC2

I am seeing sudden spikes of high disk IOPS on my EC2 instances. They are all running a Django 1.9.6 web application; the software installed on them is Apache, Celery, the New Relic agent, and the Django WSGI app itself.
The application does not do any disk operations as such. Data is stored in RDS and Redis (another server), and the static files are stored on S3 behind CloudFront. So I am unable to determine the cause of this high disk IOPS.
What happens is that a normal request suddenly takes forever to respond. Checking CloudWatch and New Relic, I see the RAM usage shoot up; then the instance becomes unresponsive, all requests time out, and I can't SSH in. When I contacted AWS Support, they said the VolumeQueueLength was increasing significantly, and once it came down (15-20 minutes later) the instance was working fine.
Any ideas as to what could be the issue?
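One way to confirm the pattern AWS Support described is to pull that EBS metric yourself. A sketch with the AWS CLI; the volume ID and time window are placeholders:

    # Average queue depth of the EBS volume in 5-minute buckets around the incident
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EBS \
        --metric-name VolumeQueueLength \
        --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
        --start-time 2016-06-01T00:00:00Z \
        --end-time 2016-06-01T06:00:00Z \
        --period 300 \
        --statistics Average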

AWS site down issue because CPU utilization reaches 100%

I am using an Amazon EC2 instance of type m3.medium and an Amazon RDS database instance.
During working hours the website goes down because CPU utilization reaches 100%; at night (outside working hours) CPU utilization is around 60%.
So please give me the right solution for this site-down issue. I am not sure why I am experiencing this problem.
Once I had set a cron job to run every minute, but I removed it because of the slowdown issue; I still have the site-down issue though.
When I use the top command, I get the CPU usage shown in the images below, in which httpd consumes the most CPU. Any suggestions for settings that would reduce httpd's CPU usage?
Without the website being used by any user (two images below):
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After 5 users access the website simultaneously:
http://screencast.com/t/QZgZsiNgdCUl
If your CPU utilization is reaching 100%, you have two options:
Increase your EC2 instance type to a larger one.
Use Auto Scaling to launch one more EC2 instance of the same instance type.
It looks like you need some scheduled actions, since you don't need that capacity during non-working hours.
The best option is to use AWS Auto Scaling with scheduled actions:
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS Auto Scaling can launch new EC2 instances based on your CPU utilization (or other metrics such as network load or disk reads/writes). This way you can always keep your site alive.
Using scheduled actions, you can scale in your auto-scaled instances during non-working hours and scale out during working hours according to CPU utilization (or other metrics).
You can even stop your servers entirely if you do not need them at some point in time.
If you are not familiar with AWS Auto Scaling, you can follow the documentation, which is precise and easy:
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
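As a rough sketch with the AWS CLI (the group name, sizes, and cron schedules below are placeholders, not from the question): scale out before working hours and back in afterwards.

    # Scheduled scale-out for working hours (recurrence is in UTC)
    aws autoscaling put-scheduled-update-group-action \
        --auto-scaling-group-name my-site-asg \
        --scheduled-action-name workday-scale-out \
        --recurrence "30 3 * * MON-FRI" \
        --min-size 2 --max-size 4 --desired-capacity 2
    # Scheduled scale-in for the night
    aws autoscaling put-scheduled-update-group-action \
        --auto-scaling-group-name my-site-asg \
        --scheduled-action-name night-scale-in \
        --recurrence "30 14 * * MON-FRI" \
        --min-size 1 --max-size 1 --desired-capacity 1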
If CPU utilization reaches 100% because of the number of visitors your site has, you should consider changing the instance type, using Auto Scaling, or putting AWS CloudFront in front of the site to cache as many HTTP requests as possible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 instance, I strongly recommend decoupling that workload via AWS SQS and an AWS Elastic Beanstalk worker tier, as sketched below.
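A minimal sketch of the decoupling idea; the queue name, account ID, and message body are hypothetical:

    # The web tier enqueues jobs instead of running them inline; an Elastic
    # Beanstalk worker-tier environment polls the queue and does the work.
    aws sqs create-queue --queue-name site-jobs
    aws sqs send-message \
        --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/site-jobs \
        --message-body '{"task": "nightly-report"}'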

Can I run Neo4j on a t2.micro on AWS?

I had Neo4j running on an m3.medium instance, only to realize that I was being charged for AWS usage. No issue there. Since I am just experimenting at this point, I'd like to run Neo4j on a t2.micro instance. I followed AWS's instructions to resize to a t2.micro, and now I cannot access the Neo4j server. My Neo4j stack is up and running, but I get a 503 Service Unavailable error.
What am I missing?
Neo4j should run fine on a t2.micro; I even have it running on a Raspberry Pi for demo purposes. You just need to take care to set the right heap size and page cache size. Maybe go with 512M for the heap and 200M for the page cache, leaving roughly 300M for the system.
If all memory is occupied, sshd cannot allocate memory for new connections.
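Those numbers would translate to something like the following; the key names assume the Neo4j 3.x neo4j.conf format (older releases use different files and keys):

    # neo4j.conf -- memory settings for a ~1 GB t2.micro (values from the answer above)
    dbms.memory.heap.initial_size=512m
    dbms.memory.heap.max_size=512m
    dbms.memory.pagecache.size=200m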

How can I apply chef configuration without registering the node in the server?

I am programming some short-lived EC2 instances. They will exist for only a few hours at a time to do an occasional job, but they require a very large EBS volume; keeping it around all the time would cost hundreds of dollars a month. Because EBS volumes are pro-rated (billed hourly), I can allocate the volume when I need it and discard it once the job is complete, so the cost will not be all that high.
Unfortunately Elastic File System is not yet available in my region, and it's also in preview at the moment, so it's probably not suitable for production use anyway.
Anyway, that's really just background. What I'd like to do is have my instance automatically configure itself when it starts, using user data. I would like it to download a script from an S3 repository that installs chef-client and executes a chef-client run to set up the node. It will then run another command to kick off the job. Once that's complete, AWS Data Pipeline will automatically terminate the instance.
The one point I don't like about the above is that the node gets registered in my Chef server. I'd like to just download the configuration for a specified role without actually registering anything; I'll never need to run the configuration again, because the instance will be gone a couple of hours after the job completes.
I could of course script the entire setup and execution using shell scripts, but I'd rather tie it in with all the Chef infrastructure we've already built, which is integrated with our CI server and fully source-controlled.
You could use chef-provisioning and/or knife-zero.
They start an in-memory chef-zero server locally, then bootstrap a node that connects back to your local chef-zero server through the SSH connection. After the converge, the connection is shut down. It's much like rsync + chef-solo, but on steroids.
See:
https://github.com/higanworks/knife-zero
https://github.com/chef/chef-provisioning
https://github.com/chef/chef-provisioning-aws
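A sketch of the knife-zero flow; the SSH user, host, and role name are placeholders:

    # Converge the instance against a local in-memory chef-zero server tunneled
    # over SSH, so nothing is registered on the real Chef server.
    gem install knife-zero
    knife zero bootstrap ubuntu@<instance-ip> --sudo \
        --run-list 'role[big-ebs-job]' --node-name ephemeral-worker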