Is it possible for an AWS Elastic Load Balancer to forward an incoming request to each of the EC2 instances behind it?
You can accomplish this in several ways, and the answers can get very long, but my first recommendation would be to bring up another EC2 instance running, for example, Apache ZooKeeper. Every other node (the ones you need to "notify") would then run a ZooKeeper client, essentially subscribing to a "log changed" event. Whenever you need to change the log level, you would (manually or automatically) trigger a "log changed" event on your ZooKeeper node. There are plenty of examples, use cases and code samples on the ZooKeeper project page that might help you get started.
The reason why I recommended Zookeeper is because it could serve as a central configuration point (not only log level) for your nodes in the future.
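As a rough illustration of that idea (not part of the original answer), here is a minimal sketch using the kazoo Python client; the ZooKeeper hostname and the /config/log_level znode path are assumptions made for the example:

```python
from kazoo.client import KazooClient
import logging

zk = KazooClient(hosts="zookeeper.internal:2181")  # assumed hostname
zk.start()

# Make sure the configuration znode exists with a sensible default.
zk.ensure_path("/config")
if not zk.exists("/config/log_level"):
    zk.create("/config/log_level", b"INFO")

def apply_log_level(level):
    # Apply the new level to this node's root logger.
    logging.getLogger().setLevel(level)

@zk.DataWatch("/config/log_level")
def on_log_level_change(data, stat):
    # Called once at registration and again every time the znode changes.
    if data is not None:
        apply_log_level(data.decode("utf-8"))
```

Changing the level across the fleet is then a single write, e.g. `zk.set("/config/log_level", b"DEBUG")`, and every subscribed node picks it up.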
For "Command and Control" types of events, you're likely going to want a different mechanism.
You could take the "an SQS queue for each server" approach: whichever server gets the web request pushes the command to each server's queue, and the servers periodically poll their own queues for C&C operations. This gives you guaranteed-delivery semantics, which are quite important for C&C operations.
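A hedged sketch of that queue-per-server fan-out with boto3 follows; the `cc-<instance-id>` queue naming convention and the `handle_command` stub are assumptions for illustration only:

```python
import json
import boto3

sqs = boto3.client("sqs")

def handle_command(command):
    # Placeholder: apply the C&C operation (e.g. change log level) locally.
    print("received command:", command)

def broadcast_command(command, instance_ids):
    # Called by whichever server received the web request.
    body = json.dumps(command)
    for instance_id in instance_ids:
        # Assumes each server created a queue named cc-<instance-id> at startup.
        url = sqs.get_queue_url(QueueName=f"cc-{instance_id}")["QueueUrl"]
        sqs.send_message(QueueUrl=url, MessageBody=body)

def poll_my_queue(queue_url):
    # Each server runs this periodically against its own queue.
    resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        handle_command(json.loads(msg["Body"]))
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```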
Instead of SQS, a database could be used to accomplish (mostly) the same thing. The DB approach is nice as it could also give you audit history which may (or may not) be important.
I need to notify all machines behind a load balancer when something happens.
For example, I have machines behind a load balancer which cache data, and if the data changes I want to notify the machines so they can dump their caches.
I feel as if I'm missing something, as it seems I might be overcomplicating how I talk to all the machines behind my load balancer.
--
Options I've considered
SNS
The problem with this is that each individual machine would need to be publicly accessible over HTTPS.
SNS Straight to Machines
Machines would subscribe themselves, with their EC2 URL, to SNS on startup (a sketch of the subscription call follows this list). To achieve this I'd need to either
open those machines up to http from anywhere (not just the load balancer)
create a security group which lets SNS IP ranges into the machines over HTTPS.
This security group could be static (the IPs don't appear to have changed since ~2014 from what I can gather)
I could create a scheduled lambda which updates this security group from the json file provided by AWS if I wanted to ensure this list was always up to date.
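For reference, the subscribe-on-startup step might look roughly like the boto3 snippet below; the region, topic ARN and /sns-endpoint path are placeholders, not values from the question:

```python
import boto3
import urllib.request

sns = boto3.client("sns", region_name="eu-west-1")  # assumed region

# Public hostname from instance metadata (IMDSv1 shown for brevity).
hostname = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/public-hostname", timeout=2
).read().decode()

sns.subscribe(
    TopicArn="arn:aws:sns:eu-west-1:123456789012:cache-invalidations",  # placeholder
    Protocol="https",
    Endpoint=f"https://{hostname}/sns-endpoint",
)
# SNS then POSTs a SubscriptionConfirmation message to that endpoint, which the
# application must confirm before notifications are delivered.
```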
SNS via LB with fanout
The load balancer URL would be subscribed to SNS. When a notification is received one of the machines would receive it.
The machine would use the AWS API to look up the Auto Scaling group it belongs to, find the other machines attached to the same load balancer, and then send each of them the same message over its internal URL.
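A sketch of that fan-out step, assuming the receiving machine knows its own instance ID and that peers expose an internal HTTP endpoint (the port and path below are invented for the example):

```python
import boto3
import requests

def fan_out(message_body, my_instance_id, region="eu-west-1"):
    asg = boto3.client("autoscaling", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)

    # Find the Auto Scaling group this instance belongs to.
    me = asg.describe_auto_scaling_instances(InstanceIds=[my_instance_id])
    group = me["AutoScalingInstances"][0]["AutoScalingGroupName"]

    # Collect the peer instance IDs, excluding ourselves.
    groups = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group])
    peers = [i["InstanceId"]
             for i in groups["AutoScalingGroups"][0]["Instances"]
             if i["InstanceId"] != my_instance_id]
    if not peers:
        return

    # Resolve private IPs and re-deliver the message over the internal network.
    reservations = ec2.describe_instances(InstanceIds=peers)["Reservations"]
    for r in reservations:
        for inst in r["Instances"]:
            ip = inst.get("PrivateIpAddress")
            if ip:
                requests.post(f"http://{ip}:8080/internal/notify",  # assumed endpoint
                              data=message_body, timeout=5)
```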
SQS with fanout
Each machine would be a queue worker; one would receive the message and forward it on to the other machines in the same way as the SNS fanout described above.
Redis PubSub
I could set up a Redis cluster which each node subscribes to and receives the updates. This seems a costly option given the task at hand (especially given I'm operating in many regions and AZs).
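If you did go the Redis route, the per-node subscriber is only a few lines with redis-py; the host and channel name here are assumptions:

```python
import redis

def clear_local_cache(key):
    # Placeholder: drop the relevant entries from this node's in-memory cache.
    print("invalidating", key)

r = redis.Redis(host="cache.internal", port=6379)  # assumed host
pubsub = r.pubsub()
pubsub.subscribe("cache-invalidations")

for message in pubsub.listen():
    if message["type"] == "message":
        clear_local_cache(message["data"])

# A publisher (whichever process changed the data) would simply call:
#   r.publish("cache-invalidations", "entity:123")
```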
Websocket MQTT Topics
Each node would subscribe to an MQTT topic and receive the update this way. Not every region I use supports IoT Core yet, so I'd need to either host my own broker in each region or have every region connect to its nearest supported (or even a single) region. I'm not sure about the stability of this, but it seems like it might be a good option.
I suppose a 3rd party websocket service like Pusher or something could be used for this purpose.
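For the self-hosted-broker variant, the node-side subscriber could look like this paho-mqtt (1.x-style) sketch; the broker hostname, port and topic are assumptions, and AWS IoT Core would additionally need its own TLS/SigV4 setup:

```python
import paho.mqtt.client as mqtt

def clear_local_cache(key):
    # Placeholder: drop the relevant entries from this node's cache.
    print("invalidating", key)

def on_message(client, userdata, msg):
    clear_local_cache(msg.payload.decode())

client = mqtt.Client()
client.on_message = on_message
client.connect("mqtt-broker.internal", 1883)  # assumed self-hosted broker
client.subscribe("cache/invalidations", qos=1)
client.loop_forever()
```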
Polling for updates
Each node contains x cached items; I would have to poll for each item individually or build some way to determine which items have changed in a single bulk request.
This seems excessive, though. Hypothetically, with 50 items at a polling interval of 10 seconds, that is 6 requests per item per minute, so 6 * 50 * 60 * 24 = 432,000 requests per day to some web service/Lambda etc. That just seems a bad option for this use case when most of those requests will report that nothing has changed. A push/subscription model seems better than a pull/get model.
I could also use long polling perhaps?
Dynamodb streams
The change that would cause a cache clear is made in a global DynamoDB table (not owned by or known to this service), so I could perhaps allow access to read the stream from that table in every region and listen for changes via that route. That couples the two services pretty tightly, though, which I'm not keen on.
I have to set up JBoss on AWS EC2 Windows servers, and this will scale up as required. We are using ELK for infrastructure monitoring, so we will be installing Beats here, which will send the data to an on-prem Logstash. There we onboard the servers with their hostname and IP.
Now the problem is: in the case of autoscaling, how can we achieve this?
Please advise.
Thanks,
Abhishek
You could create one EC2 instance, configure it, and create an AMI from it so that autoscaling launches from that image; that way the config can be part of it.
If by onboarding you mean adding it to the allowed list, you could use Direct Connect or a VPC with a custom CIDR block defined and add that subnet to the allowed list up front.
AFAIK you need to change the Logstash config file on disk to add new hosts, and it should notice the updated config automatically and "just work".
I would suggest a local script that can read/write the config file and that polls an SQS queue, "listening" for autoscaling events. You can have your ASG send SNS messages when it scales and then subscribe an SQS queue to receive them. Messages will be retained for up to 14 days, and there are options to add delays if required. The message you receive from SQS will indicate the region, instance ID and operation (launched or terminated), from which you can look up the IP address/hostname to add to or remove from the config file (and the message should be deleted from the queue when processed successfully). Editing the config file is just simple string operations to locate the right line and insert the new one. This approach only requires outbound HTTPS access for your local script to work, plus some IAM permissions, but there is a (probably trivial) cost implication.
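A rough sketch of such a local script with boto3 is below; the queue URL, region and the two config-editing stubs are assumptions, and the real string editing is left to you:

```python
import json
import boto3

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/asg-events"  # placeholder
sqs = boto3.client("sqs", region_name="eu-west-1")
ec2 = boto3.client("ec2", region_name="eu-west-1")

def add_host_to_config(ip):
    # Placeholder: insert the host line into the Logstash config on disk.
    print("adding", ip)

def remove_host_from_config(instance_id):
    # Placeholder: remove the host line for this instance from the config.
    print("removing", instance_id)

def handle(notification):
    # SNS wraps the Auto Scaling notification, so the body is JSON twice over.
    event = json.loads(notification["Message"])
    instance_id = event["EC2InstanceId"]
    if "LAUNCH" in event["Event"].upper():
        inst = ec2.describe_instances(InstanceIds=[instance_id])
        ip = inst["Reservations"][0]["Instances"][0]["PrivateIpAddress"]
        add_host_to_config(ip)
    elif "TERMINATE" in event["Event"].upper():
        remove_host_from_config(instance_id)

while True:
    # Long poll (WaitTimeSeconds=20) so empty checks are cheap.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        handle(json.loads(msg["Body"]))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```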
Another option is a UserData script that's executed on each instance at startup (part of the Launch Template of your Auto Scaling group). Exactly how it might communicate with your on-prem side depends on your architecture/capabilities - anything's possible. You could write a simple web service to manage the config file and have the instances call it, but that's a lot more effort and somewhat risky in my opinion.
FYI - if you use SQS, look at long polling if you're checking the queues frequently or want the message to propagate as quickly as possible (TL;DR - faster and cheaper than polling any more than twice a minute). It's good practice to use a dead-letter queue with SQS - messages that get retrieved but not removed from the queue end up there. Set up alarms on the queue and dead-letter queue to alert you via email if there are messages failing to be processed or not getting picked up in a sensible time (i.e. your script has crashed, etc.).
Our .net core web app currently accepts websocket connections and pushes out data to clients on certain events (edit, delete, create of some of our entities).
We would like to load balance this application now but foresee a problem in how we handle the socket connections. Basically, if I understand correctly, only the node that handles a specific event will push data out to its clients and none of the clients connected to the other nodes will get the update.
What is a generally accepted way of handling this problem? The best way I can think of is to also send that same event to all nodes in a cluster so that they can also update their clients. Is this possible? How would I know about the other nodes in the cluster?
This will be hosted in AWS.
You need to distribute the event to all nodes in the cluster, so that they can each push the update out to their websocket clients. A common way to do this on AWS is to use SNS to distribute the event to all nodes. You could also use ElastiCache Redis Pub/Sub for this.
As an alternative to SNS or Redis, you could use a Kinesis Stream. But before diving into the Kinesis docs, read about Apache Kafka, because the AWS docs don't do a good job of explaining why you'd use Kinesis for anything other than log ingest.
To summarize: Kinesis is a "persistent transaction log": everything that you write to it is stored for some amount of time (by default a day, but you can pay for up to 7 days) and is readable by any number of consumers.
In your use case, each worker process would start reading at the then-current end of the stream, and continue reading (and distributing events) until shut down.
The main issue that I have with Kinesis is that there's no "long poll" mechanism like there is with SQS. A given read request may or may not return data. What it does tell you is whether you're currently at the end of the stream; if not, you have to keep reading until you are. And, of course, Amazon will throttle you if you read too fast. As a result, your code tends to have sleeps.
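A minimal single-shard consumer loop along those lines might look like this with boto3; the stream name and the push function are placeholders, and a real deployment would also handle multiple shards and resharding:

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "entity-events"  # assumed stream name

def push_to_websocket_clients(data):
    # Placeholder: fan the event out to this node's connected websocket clients.
    print("pushing", data)

shard_id = kinesis.describe_stream(StreamName=STREAM)\
    ["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(StreamName=STREAM, ShardId=shard_id,
                                      ShardIteratorType="LATEST")["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        push_to_websocket_clients(record["Data"])
    iterator = out["NextShardIterator"]
    if not out["Records"]:
        time.sleep(1)  # no long poll in Kinesis, so back off between empty reads
```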
I'm very new to using AWS, and even more so to ECS. Currently, I have developed an application that can take an S3 link, download the data from that link, process the data, and then output some information about that data. I've already packaged this application up in a Docker container and it now resides in the Amazon container registry. What I want to do now is start up a cluster, send an S3 link to each EC2 instance running Docker, have all the container instances crunch the numbers, and return all the results back to a single node. I don't quite understand how I am supposed to change my application at this point. Do I need to make my application running in the Docker container a service? Or should I just send commands to containers via SSH? Then, assuming I get that far, how do I communicate with the cluster to farm out the work for potentially hundreds of S3 links? Ideally, since my application is very compute intensive, I'd like to only run one container per EC2 instance.
Thanks!
Your story is hard to answer since it's a lot of questions without a lot of research done.
My initial thought is to make it completely stateless.
You're on the right track by making them start up and process via S3. You should expand this to use something like an SQS queue. Those SQS messages would contain an S3 link. Your application will start up, grab a message from SQS, process the link it got, and delete the message.
The next thing is to not output to a console of any kind. Output somewhere else, like a different SQS queue, a database, or S3.
This removes the requirement for the boxes to talk to each other. This will speed things up, make it infinitely scalable and remove the strange hackery around making them communicate.
Also, why one container per instance? Two threads at 50% is usually the same as one at 100%. Remove this requirement and you can use ECS + Lambda + CloudWatch to scale based on the number of messages: more than 10,000, scale up; fewer than 100, scale down. This means you can throw millions of messages into SQS and just let ECS scale up to process them and output somewhere else to be consumed.
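A minimal sketch of such a stateless worker, assuming an input queue whose messages are S3 links in bucket/key form, an output queue for results, and a process() stub standing in for the compute-heavy step:

```python
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
IN_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/work-items"   # placeholder
OUT_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/results"     # placeholder

def process(data):
    # Placeholder for the number-crunching step.
    return f"processed {len(data)} bytes"

while True:
    resp = sqs.receive_message(QueueUrl=IN_QUEUE, WaitTimeSeconds=20,
                               MaxNumberOfMessages=1)
    for msg in resp.get("Messages", []):
        bucket, key = msg["Body"].split("/", 1)   # assumes "my-bucket/path/to/object"
        obj = s3.get_object(Bucket=bucket, Key=key)
        result = process(obj["Body"].read())
        sqs.send_message(QueueUrl=OUT_QUEUE, MessageBody=result)
        sqs.delete_message(QueueUrl=IN_QUEUE, ReceiptHandle=msg["ReceiptHandle"])
```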
I agree with Marc Young, you need to make this stateless and decouple the communication layer from the app.
For an application like this I would put the S3 links into a queue (RabbitMQ is a good one; I personally don't care for SQS, but it's also an option). Then have your worker nodes in ECS pull messages off the queue and process them.
It sounds like you have another app that does processing. Depending on the output you could then put the result into another processing queue and use the same model or just stuff it directly in a database of some sort (or as files in S3).
In addition to what Marc said about autoscaling, consider using CloudWatch + Spot Instances to manage the cost of your ECS container instances. Particularly for heavy compute tasks, you can get big discounts that way.
I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?
I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."
Amazon's Route 53 Health Check is the right tool for the job.
Route 53 can monitor the health and performance of your application as well as your web servers and other resources.
You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.
http://eladnava.com/monitoring-http-health-email-alerts-aws/
To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.
When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic and then you'll receive an email for your alarm.
For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html
As answered already, use an ELB to monitor HTTP.
This is the list of available metrics for ELB:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics
To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
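As an illustration of that alarm (not from the original answer), a boto3 call that alarms on average classic-ELB Latency above a threshold and notifies an SNS topic; the load balancer name, threshold and topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="elb-latency-high",
    Namespace="AWS/ELB",
    MetricName="Latency",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "my-load-balancer"}],  # placeholder
    Statistic="Average",
    Period=60,                 # evaluate one-minute averages
    EvaluationPeriods=3,       # must breach for three consecutive periods
    Threshold=2.0,             # the "X seconds" from the question
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],          # placeholder
)
```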
CloudWatch monitoring is just like you have discovered. You will be able to infer that one of your instances is frozen by looking at the metrics, but CloudWatch won't, for example, send you an email when your app is down or too slow.
If you are looking for some sort of notification when your app or instance is down, I suggest you use a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool, like Nagios, which would be my preferred option.
Good practices that always pay off in the long run: using load balancing (Amazon ELB), more than one instance running your app, autoscaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.
My team used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2 and node 2 to node 1. If any request took longer than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc.), application servers (connection pool health, etc.) and so on. It's worth every cent invested in setting it up.
CloudWatch recently added "status check" metrics that will answer one of your questions on whether an instance is down or not. It will not do a request to your Web server but rather a system check. As previous answer suggest, use ELB for HTTP health checks.
You could always have another instance for tools/testing; that instance would try the HTTP request on a schedule and measure the response time, then you could publish that response time to CloudWatch and set an alarm when it goes over a certain threshold.
You could even do that from the instance itself.
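A sketch of that "tools instance" approach, run from cron or a loop; the URL, namespace and metric name are assumptions, and an alarm on the custom metric would provide the notification:

```python
import time
import urllib.request
import boto3

cloudwatch = boto3.client("cloudwatch")

# Time one request to the application.
start = time.time()
urllib.request.urlopen("https://my-app.example.com/health", timeout=10)  # placeholder URL
elapsed = time.time() - start

# Publish the latency as a custom CloudWatch metric.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "HealthCheckLatency",
        "Value": elapsed,
        "Unit": "Seconds",
    }],
)
```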
As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse those metrics (i.e. if you're using AWS OpsWorks), so you're going to have to report a custom metric on your own. However, you can set up an alarm built on a metric that always matches (in an OK state) and have the alarm trigger when the state changes to "INSUFFICIENT DATA"; that state technically means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, a.k.a. your instance is offline.
There are a bunch of ways to get instance health info. Here are a couple.
Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send to Cloudwatch to create an alarm.
Create a simple daemon on the server which writes to DynamoDB every second (better granularity than CloudWatch). Have a second process query the heartbeats and alert when they go missing (a sketch of this follows the list).
Put all instances in a load balancer with a dummy port open that gives a TCP response. Set up TCP health checks on the ELB, and alert on unhealthy instances.
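For the heartbeat idea in the second option, a minimal sketch of the instance-side daemon; the table name, key schema and hard-coded instance ID are assumptions (in practice you'd read the ID from instance metadata):

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("instance-heartbeats")  # assumed table
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder; read from metadata in practice

while True:
    # Overwrite this instance's heartbeat item once a second.
    table.put_item(Item={"instance_id": INSTANCE_ID,
                         "last_seen": int(time.time())})
    time.sleep(1)

# A separate watcher process scans the table and alerts on any item whose
# last_seen timestamp is more than a few seconds old.
```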
Unless you use a product like Blue Matador (which automatically notifies you of production issues), it's actually quite heinous to set something like this up, let alone maintain it. That said, if you're going down that road and want some help getting started with CloudWatch (terminology, alerts, logs, etc.), start with this blog: How to Monitor Amazon EC2 with CloudWatch
You can use a CloudWatch Events rule to get notified whenever any EC2 instance goes down. You can create an event rule from the CloudWatch console as follows:
In the CloudWatch console, choose Events -> Rules
For Event Pattern, in Service Name choose EC2
For Event Type, choose EC2 Instance State-change Notification
For Specific States, choose Stopped
In Targets, choose any previously created SNS topic for sending a notification!
Source: Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule
This is not exactly a CloudWatch alarm, however this serves the purpose of monitoring/notification.
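If you prefer to create the same rule from code rather than the console, a rough boto3 equivalent is below; the rule name and SNS topic ARN are placeholders:

```python
import json
import boto3

events = boto3.client("events")

# Rule matching EC2 instances entering the "stopped" state.
events.put_rule(
    Name="ec2-instance-stopped",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }),
    State="ENABLED",
)

# Send matching events to an existing SNS topic for notification.
events.put_targets(
    Rule="ec2-instance-stopped",
    Targets=[{"Id": "notify-sns",
              "Arn": "arn:aws:sns:us-east-1:123456789012:ops-alerts"}],  # placeholder
)
```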