AWS services down monitors - amazon-web-services

I am new to AWS monitors, so this might be a naive question. Are there any monitors that will tell us if AWS services are down? For example, if Elasticsearch or ElastiCache is down, is there a way to create an alarm? Are there any metrics available for them? Thanks in advance!

There's a fairly new feature called AWS Personal Health Dashboard that lets you track whether your account is affected by any service outages. You can set up notifications there.

https://status.aws.amazon.com/
That link shows pretty much everything you need to see what is up and down, but yes, you can also set up CloudWatch alarms and notifications to be alerted (or to take some action) if specific problems come up.
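The notification idea from the answers above can be sketched with an EventBridge (CloudWatch Events) rule that matches AWS Health events. This is only a minimal sketch: the rule name is made up, and the service codes `ELASTICACHE` and `ES` are assumptions that should be checked against the events your account actually receives.

```python
import json


def health_issue_rule(rule_name, services):
    """Build put_rule arguments matching AWS Health 'issue' events.

    rule_name and the service codes are illustrative assumptions.
    """
    pattern = {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {
            "service": sorted(services),
            "eventTypeCategory": ["issue"],
        },
    }
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }


rule = health_issue_rule("aws-service-issues", ["ELASTICACHE", "ES"])
print(rule["EventPattern"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("events").put_rule(**rule)`, with an SNS topic attached through `put_targets` to receive the notification.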

Related

How to track unused resources in AWS?

I have been using AWS for a while now. I always have difficulty tracking AWS resources and how they are interconnected. I am using Terraform, of course, but there are always ad-hoc operations that cut down my visibility.
As a result, I have been charged multiple times for resources/services that exist but that I do not use.
By unused I mean resources that are present in the AWS environment but are not referenced by any other service.
Tool suggestions are also welcome.
Also posted on DevOps. Posting here since there are fewer people there.
I have used Janitor Monkey and Cloud Custodian, and we also have a bunch of AWS Config + Lambda automation for cleaning up.
Janitor Monkey determines whether a resource should be a cleanup
candidate by applying a set of rules on it. If any of the rules
determines that the resource is a cleanup candidate, Janitor Monkey
marks the resource and schedules a time to clean it up.
I think that a viable answer here is the same as the popular answer for when to auto-scale - use CloudWatch alarms.
Whenever you have a service that you need to auto-scale, you monitor something like CPU usage. If CPU usage rises above some threshold, the alarm can be configured to scale up your fleet. Correspondingly, if CPU usage drops below some threshold, the alarm can be configured to scale down the fleet. Similar alarms can be configured for other metrics such as memory, disk usage, etc.
So, instead of configuring CloudWatch alarms to scale your fleet up or down, you can just configure a CloudWatch alarm to email you when a host becomes idle (e.g. its CPU usage is too low).
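As a rough illustration of the idle-host alarm described above, the sketch below builds the arguments for CloudWatch's `put_metric_alarm`. The alarm name, the 5% threshold, the six-hour window and the SNS topic ARN are all assumptions chosen for the example, not recommended values.

```python
def idle_cpu_alarm(instance_id, sns_topic_arn, threshold_percent=5.0, hours=6):
    """Build put_metric_alarm arguments for an 'instance looks idle' alarm.

    The alarm fires when average CPU stays below threshold_percent for
    `hours` consecutive one-hour periods, then notifies the SNS topic.
    """
    return {
        "AlarmName": f"idle-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 3600,  # seconds per evaluation period
        "EvaluationPeriods": hours,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


params = idle_cpu_alarm(
    "i-0123456789abcdef0",  # placeholder instance ID
    "arn:aws:sns:us-east-1:123456789012:idle-hosts",  # placeholder topic ARN
)
print(params["AlarmName"])
```

With boto3 available, `boto3.client("cloudwatch").put_metric_alarm(**params)` would create the alarm; an email subscription on the SNS topic then delivers the message.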
Similar to Janitor Monkey, I've created a tool to track different types of unused resources (ELB, EBS, AMI, security groups, etc.): https://github.com/romibuzi/majordome

Manage multiple aws accounts

I would like to know a system by which I can keep track of multiple AWS accounts, somewhere around 130+ accounts with each account containing around 200+ servers.
I wanna know methods to keep track of machine failure, service failure etc.
I also wanna know methods by which I can automatically turn up a machine if the underlying hardware failed or the machine terminated while on spot.
I'm open to all solutions including chef/terraform automation, healing scripts etc.
You guys will be saving me a lot of sleepless nights :)
Thanks in advance!!
This is purely my take on implementing your problem statement.
1) For managing and keeping track of multiple AWS accounts you can use AWS Organizations. This will help you centrally manage all the other 130+ accounts from one root (management) account. You can enable consolidated billing as well.
2) As far as keeping track of failures goes, you may need to customize this according to your requirements. For example, you can build a microservice on top of Docker containers or ECS whose sole purpose is to keep track of failures, generate a report and push it to S3 on a daily basis. You can further create a dashboard with Amazon QuickSight from these reports in S3.
There can be another microservice that rectifies the failures. It just depends on how exhaustive and fine-grained you want your implementation to be.
3) Spawning replacement instances when spot instances are terminated can be achieved through simple Auto Scaling configurations. Here are some articles you may want to go through which will give you some ideas:
Using Spot Instances with On-Demand instances
Optimizing Spot Fleet+Docker with High Availability
AWS Organizations is useful for management. You can also look at a multiple-account billing strategy and security strategy. A shared services account with your IAM users will make things easier.
Regarding tracking failures, you can set up automatic instance recovery using CloudWatch. CloudWatch can also have alarms defined that will email you when something happens that you don't expect, though setting them up individually could be time-consuming. At your scale I think you should look into third-party tools.
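The automatic instance recovery mentioned above is configured as a CloudWatch alarm on the system status check whose action is the EC2 recover ARN. The sketch below generates those alarm definitions for a batch of instances; the alarm naming, the one-minute period and the two evaluation periods are assumptions. Recovery of this kind handles failed system status checks on supported instance types, and it does not bring back a spot instance that was terminated.

```python
def recovery_alarm(instance_id, region):
    """Build put_metric_alarm arguments that trigger EC2 auto recovery."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in action ARN that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def recovery_alarms(instances):
    """instances: iterable of (instance_id, region) pairs."""
    return [recovery_alarm(instance_id, region) for instance_id, region in instances]


for alarm in recovery_alarms([("i-0aaa", "us-east-1"), ("i-0bbb", "eu-west-1")]):
    print(alarm["AlarmName"], alarm["AlarmActions"][0])
```

Generating the definitions from an inventory list, rather than clicking through the console per instance, is one way around the "setting them up individually" problem; each dictionary can be passed to `put_metric_alarm` on a CloudWatch client for the matching account and region.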

AWS Cloudwatch Monitoring

Just wondering if AWS CloudWatch runs in the same VPC where all my applications are running?
Is there any chance that AWS CloudWatch might go down and we might lose the monitoring capability?
Do we need to have a monitoring mechanism to check the Cloudwatch health?
Thanks
AWS CloudWatch doesn't run on your instances. Its infrastructure is fully managed by Amazon and independent of your VPC. You can think of it as SaaS (Software as a Service).
So you don't have to worry about that. For more information, please see: https://aws.amazon.com/cloudwatch/
CloudWatch collects data from the host OS, where your VMs are actually running.
If the physical server had a significant issue, both CloudWatch and your VM would go down, but in that case the VM would get started automatically on another physical server. In such a case, recovery would usually be quite quick.
You don't need to check CloudWatch at all because AWS handles that, but you could add alerts for things such as CPU usage on your VMs.
Because CloudWatch doesn't run on your machines, it can't know some things such as memory usage or disk space usage, so if you need more advanced monitoring capabilities you might consider running something like collectd inside your virtual machine.
Just wondering if AWS CloudWatch runs in the same VPC where all my applications are running?
If you choose to install the CloudWatch Agent on your EC2 instance, then only the agent runs on your instance, and thus in the VPC where that instance is provisioned.
The CloudWatch service that publishes/maintains logs, metrics, alarms, etc. is managed by AWS and runs outside your VPC.
CloudWatch has an SLA of 99.9%
https://aws.amazon.com/cloudwatch/sla/
Is there any chance that AWS CloudWatch might go down and we might lose the monitoring capability?
CloudWatch, like any other service, can have outages, and it did have some in the past, but I have never seen any data getting lost, only data being temporarily unavailable or slow to retrieve during the outage.
Do we need to have a monitoring mechanism to check the CloudWatch health?
The SLA is already 99.9% for the CloudWatch service, so your own monitoring mechanism would very rarely catch a blip.
If you are using the CloudWatch Agent, then consider checking the health of the agent to make sure it is in a running state (you can use AWS Systems Manager Run Command).
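A minimal sketch of that agent health check follows. It builds the arguments for Systems Manager's `send_command` using the `AWS-RunShellScript` document and interprets the agent's status output. The control-script path is the usual Linux install location, and the assumption that the status command prints JSON with a `status` field should be verified on your own hosts.

```python
import json

# Assumed default install path of the agent control script on Linux.
STATUS_COMMAND = (
    "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl "
    "-m ec2 -a status"
)


def status_check_request(instance_ids):
    """Build send_command arguments that ask each instance for agent status."""
    return {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": [STATUS_COMMAND]},
    }


def agent_is_running(status_output):
    """Return True if the status output reports a running agent.

    Assumes the output is a JSON object containing a "status" key.
    """
    try:
        status = json.loads(status_output)
    except json.JSONDecodeError:
        return False
    return isinstance(status, dict) and status.get("status") == "running"


print(status_check_request(["i-0123456789abcdef0"])["DocumentName"])
print(agent_is_running('{"status": "running", "configstatus": "configured"}'))
```

The request dictionary could be passed to `boto3.client("ssm").send_command(**request)`, and the command output fetched afterwards would be fed to `agent_is_running`.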

How to stop a particular service in AWS

I have an AWS account that I created to do some research. I have started some of its services. Now I want to cancel some services like EC2 and RDS. There is an option in the Manage Account section for "Cancel Selected Service", but there is nothing to select in its dropdown menu.
You can't cancel a service, to my knowledge. All you can do is to stop using it, which will mean zero charges on your credit card.
Just delete the instances you are not using any more. Not really sure what it means to cancel a service as billing is usage based.

AWS: How to disable all services?

I was dorking around with AWS (and related services), hoping that I could stay in the Free Tier, like I do when I'm exploring Google App Engine.
A few days ago, I get a letter from Amazon that they've charged me $33 or so for my 2 days of exploration.
This has got to end, but I forget what services I've enabled. Ideally, I'd just disable the AWS account entirely, as without a free sandbox there's no way I'm going to be using their service. Is there a global off button, or do I have to stumble around to turn all their services off individually? Or do I have to delete my CC information and just create a new Amazon account altogether?
You can close your entire account in AWS Billing: https://console.aws.amazon.com/billing/home?#/account
Or, if you just want to disable the "Free Tier" services that are incurring charges, view them here:
https://console.aws.amazon.com/billing/home#/freetier
Then open your EC2 dashboard - and cancel those services:
https://us-west-2.console.aws.amazon.com/ec2
For example:
Stop running instances, delete volumes, remove elastic IPs, etc.
Otherwise, I recommend sending an email to webservices@amazon.com from the email address you used to sign up for their service.
I had an RDS instance running and I couldn't figure out how to cancel just that service.
Here's how to do it:
Go to the billing console:
https://console.aws.amazon.com/billing/home?region=us-west-2#/
Click "Bill Details" and inspect the bill.
You'll find the NAME OF THE SERVICE + ITS LOCATION (region). This is the information you need.
Open that service's console, for example:
https://console.aws.amazon.com/rds/home?region=us-east-1
Go to the top right of the page and select the correct region.
The rest is straightforward from here.
I was also frustrated (by being charged on the free tier without any prior info/warning) and found a simple and elegant solution to turn off all AWS services: you delete your account and forget about these fraudulent (to be honest) AWS services.
Here is the link:
https://console.aws.amazon.com/billing/home?#/account
I know this is a somewhat old question, but I would like to add a new answer because I think AWS has changed a lot since it was asked. I ran into a similar situation to the OP's and found that there are 3 possible ways to achieve this:
1) Use a single turn-off-everything button, but I'm not sure that one exists.
2) The overkill option: go through the services one by one and shut down/delete any instances or running services.
3) Find the actual source of the leak (the cost-incurring services) by viewing what is posting charges on your account, and then turn off those services one by one. This can be done by visiting:
your AWS account >> My Billing Dashboard
Find your account username and open the drop-down menu:
You can check which services are incurring fees.
I tracked down each service by searching for its name in the AWS console (if I couldn't find it, I'd Google how to do so) and then turned them off one by one.
In my case, there was no charge to my bank even though billing showed I had some balance. I think it's because I was using the free tier, maybe?
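The "find what is posting charges" step can also be scripted. The sketch below ranks services by cost from a response shaped like the output of Cost Explorer's `get_cost_and_usage` when grouped by the `SERVICE` dimension with the `UnblendedCost` metric. The sample response is invented for illustration, and no AWS call is made here.

```python
def services_by_cost(response):
    """Return (service, total_cost) pairs with non-zero cost, highest first.

    Expects a get_cost_and_usage-style response grouped by SERVICE and
    using the UnblendedCost metric.
    """
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    charged = [(name, cost) for name, cost in totals.items() if cost > 0]
    return sorted(charged, key=lambda item: item[1], reverse=True)


# Made-up sample data in the response shape described above.
sample = {
    "ResultsByTime": [
        {
            "Groups": [
                {"Keys": ["Amazon Relational Database Service"],
                 "Metrics": {"UnblendedCost": {"Amount": "30.5", "Unit": "USD"}}},
                {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                 "Metrics": {"UnblendedCost": {"Amount": "2.5", "Unit": "USD"}}},
                {"Keys": ["AWS Key Management Service"],
                 "Metrics": {"UnblendedCost": {"Amount": "0", "Unit": "USD"}}},
            ]
        }
    ]
}

for service, cost in services_by_cost(sample):
    print(f"{service}: {cost:.2f}")
```

A real response would come from `boto3.client("ce").get_cost_and_usage(...)` with `GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}]`; Cost Explorer has to be enabled on the account for that call to work.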
I just hit my free tier limit. I terminated my EC2 instance, deleted my storage volume and even removed my security group and key pair, so I have nothing now. Hopefully no charge :P
Always make sure you select the right region. I once had 2 instances running and didn't realize it.
Today I finally discovered a global view to detect all the active services. You still have to disable every service manually, but at least you don't have to switch through all the regions to find out where you have active services.
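For the "forgot an instance in another region" problem, a small script can do the region hopping. The sketch below is written against any object that exposes EC2's `describe_instances`, so it can be exercised without AWS access; `client_for_region` is a hypothetical factory that you supply, and pagination is ignored for brevity.

```python
def running_instances_by_region(regions, client_for_region):
    """Map each region to the IDs of its running EC2 instances.

    client_for_region is a caller-supplied function returning an object
    with a describe_instances method (for example a boto3 EC2 client).
    Only the first page of results is read.
    """
    running_filter = [{"Name": "instance-state-name", "Values": ["running"]}]
    found = {}
    for region in regions:
        ec2 = client_for_region(region)
        response = ec2.describe_instances(Filters=running_filter)
        ids = [
            instance["InstanceId"]
            for reservation in response.get("Reservations", [])
            for instance in reservation.get("Instances", [])
        ]
        if ids:
            found[region] = ids
    return found
```

With boto3 installed, a call might look like `running_instances_by_region(["us-east-1", "us-west-2"], lambda r: boto3.client("ec2", region_name=r))`. This only covers EC2; other services such as RDS would need their own listing calls.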