Do AWS instances raise events that can directed onto SNS/SQS? - amazon-web-services

We have several workflows in xaml that detach instances from load balancers, backup databases etc. currently this involves polling via a rest api to check the status of an instance or RDS or example. We would prefer to subscribe to a SNS topic and get notified via a message when the state of an instance changes. Any guidance on how to setup something like this up appreciated
Cheers!

If you're using autoscaling, the new AutoScaling Lifecycle Management events sound like exactly what you want:
http://aws.amazon.com/blogs/aws/auto-scaling-update-lifecycle-standby-detach/
You can configure SNS or SQS notifications when servers are added or removed from your autoscaling groups.

Related

SQS Health check

Since SQS is one of the AWS managed service, I do not have to worry about the availability.
But how about monitoring the health of the SQS? Any way we could gain some visibility over the SQS service to be a error free component? We have quite many lambdas connected to it and I want to monitor to ensure there is no error/latency/timeout etc..
From What is AWS Health? - AWS Health:
AWS Health provides ongoing visibility into your resource performance and the availability of your AWS services and accounts. You can use AWS Health events to learn how service and resource changes might affect your applications running on AWS. AWS Health provides relevant and timely information to help you manage events in progress. AWS Health also helps you be aware of and to prepare for planned activities. The service delivers alerts and notifications triggered by changes in the health of AWS resources, so that you get near-instant event visibility and guidance to help accelerate troubleshooting.

Send AWS EC2 metrics to AWS Elasticsearch Service Domain for monitoring in Kibana

I am stuck on one point I have created one EC2 Linux based instance in Aws.
Now I want to send the EC2 metrics data to the managed Elasticsearch domain for monitoring purposes in Kiban, I go through the cloud watch console and check the metric is present of instance but didn't get how to connect with the Elasticsearch domain that I have created.
Can anyone please help me with this situation?
There is no build in mechanism for extraction/streaming of metrics data points in real time. You have to develop a custom solution for that. For example, by having a lambda function which is invoked every minute and which reads data points using get_metric_data. The the lambda would inject the points into your ES.
To invoke a lambda function periodically, e.g. every 1 minute you would have to setup CloudWatch Event rule with schedule Expressions. Lambda function would also need to have permissions granted to interact with CloudWatch metrics.
Welcome to SO :)
An alternative to the solution suggested by Marcin is to install metricbeat on the EC2 Instance and configure the metricbeat config file to send metrics to your Managed AWS ES Domain.
This is pretty simple and you should be able to do this fairly quickly.

How to design a server monitoring system running on AWS

I am building some form of a monitoring agent application that is running on AWS EC2 machines.
I need to be able to send commands to the agent running on a specific EC2 instance and only an agent running on that instance should pick it up and act on it. New EC2 instances can come and go at any point in time.
I can use kinesis and push all commands for all instances there and agents can pick up the ones targeted for them. The problem with this is that agents will have to receive a lot of commands that are not for them and filter it out.
I can also use SQS per instance, but then this will require to create/delete SQS every time new instance is being provisioned.
Would like to hear if there are already proven solutions for a similar scenario.
There already is a fully functional feature provided by AWS. I would rather use that one as opposed to reinventing the wheel, as it is a robust, well-integrated, and proven solution that’s being leveraged by thousands of AWS customers to gain operational insights into their instance fleets:
AWS Systems Manager Agent (SSM Agent) is a piece of software that can be installed and configured on an EC2 instance (and it’s pre-installed on many of the default AMIs, including both versions of Amazon Linux, Ubuntu, and various versions of Windows Server). SSM Agent makes it possible to update, manage, and configure these resources. The agent processes requests from the Systems Manager service in the AWS Cloud, and then runs them as specified in the request. SSM Agent then sends status and execution information back to the Systems Manager service by using the Amazon Message Delivery Service.
You can learn more about AWS Systems Manager and the breadth and depth of functionality it provides here.
Have you considered using Simple Notifications Service? Each new EC2 instance could subscribe to a topic using e.g. http, and remove previous subscribers.
That way the topic would stay constant regardless of EC2 rotation.
It might be worth noting that SNS supports subscription filters, so it can decide which messages deliver to which endpoint.
To my observation, AWS SWF could be the option here. Since Amazon SWF is to coordinate work across distributed application components and it provides SDKs for various platforms. Refer to the official FAQs for more in-depth understanding. https://aws.amazon.com/swf/faqs/
Not entirely clear what the volume of the monitoring system messages will be.
But the architecture requirements described sounds to me as follows:
The agents on the EC2 instances are (constantly?) polling some centralized service, which is a poll based architecture
The messages being sent are to a specific predetermined EC2 instance, which is a push based architecture.
To support both options without significant filtering of the messages I suggest you try using an intermediate PubSub system such Kafka, which can be managed on AWS by MSK.
Then to differentiate between the instances, create a Kafka topic named by the EC2 instance ID.
This should give you a unique topic that the instance will easily know to access messages for itself on a topic denoted by it's own instance ID.
You can also send/push Producer messages to a specific EC2 instance by sending messages to the topic in the cluster named by it's EC2 instance ID.
Since there are many EC2 instances coming and going you will end up with many topics. To handle the volume of topics, you can trigger and notify CloudWatch on each EC2 termination event and check CloudWatch to see which EC2 instances were terminated and consequently their topic needs deleting.
Alternatively, you can trigger a Lambda directly on the EC2 termination event event and log it by creating a file denoted by the instance ID to an S3 Bucket, which you can watch using an additional Lambda that will delete old EC2 instance topics from the Kafka cluster when their instance ID's appear there.

Audit k8s cluster events on AWS

For an EKS cluster, cloudtrail logs cluster events such as create, update and delete. However we are using kubeadm to provision clusters. How do we log an audit trail of these cluster events? Thanks.
CloudTrail logs API events in AWS, so I don't think you can use it for K8S events. However, you can use log shippers to send custom metrics to CloudWatch. From there you can emit events and create dashboards.
For this you have a couple of options, you can use the CloudWatch agent, An Elastic Beat, Logstash, or maybe use something like Splunk if you don't want to use CloudWatch.
From the K8S documentation, there's an Audit log (possibly at /var/log/kube-audit for your cluster) which...
Kubernetes auditing provides a security-relevant chronological set of records documenting the sequence of activities that have affected system by individual users, administrators or other components of the system. It allows cluster administrator to answer the following questions:
You can ship/parse this log with another service.
If you need more control over the outcome, you can write a custom Beat, based on the libbeat specification. https://github.com/elastic/beats/tree/master/libbeat
Otherwise, I think a lot of people use Filebeat: https://github.com/elastic/beats/tree/master/deploy/kubernetes
K8S also supports custom Audit Policies for further control

If you are using AWS to autoscale spot instances of your application, how do you handle logging?

Looking into adding autoscaling of a portion of our application using AWS simple message queuing which would launch EC2 on-demand or spot instances based on queue backlog.
One question I had, is how do you deal with collecting logs from autoscaled instances? New instances are spun up based on an image, but then they are shut down when complete. Currently, if there is an issue with one of our services, which causes it to crash, we have a system to automatically restart the service, but the logs and core dump files are there to review. If we switch to an autoscaling system, where new instance are spun up, how do you get logs and core dump files when there is a failure? Particularly if the instance is spun down.
Good practice is to ship these logs and aggregate them somewhere else, and there are many services such as DataDog and Rapid7 which will do this for you at a cost.
AWS however provides CloudWatch logs, which gives you a central place to store and view logs. It also allows you then to give users access to logs on the AWS console without them having to ssh onto a server.
Shipping your logs to CloudWatch logs requires the installation of the CloudWatch agent on your server and specifying in the config which logs to ship.
You could install the CloudWatch agent once and create an AMI of that server to use in your autoscaling group, or install and configure the CloudWatch agent in userdata for every time a server is spun up.
All the information you need to get started can be found here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html