Can anyone explain to me what's happening with my AWS Load Balancer?
I see the metric Surge Queue Length with two lines growing "together" in a cumulative way.
From the documentation, this metric is the queue of requests on the load balancer waiting to be processed by the backend (the EC2 instance). All the troubleshooting advice I found points to a performance issue on the backend, but in my case the instance is healthy (CPU, memory, disk I/O, etc. are all fine).
This load balancer belongs to an Elastic Beanstalk worker environment with only one instance. And every time I deploy a new version, the Surge Queue Length seems to be purged.
Can anyone explain why this cumulative queue keeps growing even though my backend instance is fine? And why it is purged when I deploy?
Even if the EC2 instance in the back end looks healthy (CPU, memory, disk, etc.), it can still fall behind in processing the requests sent by the ELB. This can happen if (as in my case) the EC2 instance is running under an Elastic Beanstalk environment with Docker, where the instance runs only one Docker container. The container running the app can't keep up with all of the incoming requests, and since it sits inside an isolated environment (the container), it may not be able to use all of the resources available on the EC2 instance.
In my case, I had to scale up my EC2 instances inside the Auto Scaling group (sitting behind the ELB) even though my instances reported only 5% CPU utilization. After scaling up (CPU utilization went down to 1%), my performance issues went away.
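The cumulative growth can be illustrated with a toy model: if requests arrive at the ELB faster than the one container can serve them, the backlog grows steadily even though the host as a whole looks idle. The rates below are invented for illustration only; they are not real ELB numbers.

```python
# Toy model of a growing surge queue. Rates are made-up example numbers.
ARRIVAL_RATE = 120   # requests per second hitting the load balancer
SERVICE_RATE = 100   # requests per second one container can actually process

def surge_queue_length(seconds):
    """Queue length after `seconds` when the backend can't keep up."""
    backlog_per_second = max(0, ARRIVAL_RATE - SERVICE_RATE)
    return backlog_per_second * seconds

# A modest 20 req/s shortfall accumulates into a large backlog:
print(surge_queue_length(60))   # 1200 queued after one minute
print(surge_queue_length(600))  # 12000 queued after ten minutes
```

This is also consistent with the deploy observation: replacing or restarting the backend resets the accumulated backlog, so the metric drops to zero and starts growing again.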
Hope this helps
Related
So basically I have this EC2 Linux instance with 8GB of memory, and I have multiple applications running on it. It doesn't happen often, but when there is huge traffic, 8GB of memory isn't enough and the whole instance stops responding. To get things working normally again, I have to reboot the instance every time.
I have heard that Elastic Load Balancing might scale the memory as required. But how do I achieve that? Is there any other way to solve my problem? Is there a tutorial that'd guide me through this?
I have heard that Elastic Load Balancing might scale the memory
ELB does not scale the memory of your instance. Instead, it can distribute your connections among multiple instances. So instead of having one instance which serves all your traffic, you would have two or more. In this setup, ELB distributes traffic among your instances so that collectively they can serve more connections than a single instance could.
What's more, you usually use ELB together with Auto Scaling, so that the number of instances is adjusted automatically up and down depending on the incoming traffic.
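As a sketch of the distribution idea (this illustrates the concept, not ELB's real internals), a round-robin balancer simply cycles through the registered instances, so N instances collectively absorb N times the connections of one. The instance IDs below are hypothetical.

```python
import itertools

# Hypothetical registered instances; ELB would track real instance IDs.
instances = ["i-aaaa", "i-bbbb", "i-cccc"]
rr = itertools.cycle(instances)

def route(n_requests):
    """Assign each incoming request to the next instance in rotation."""
    counts = {i: 0 for i in instances}
    for _ in range(n_requests):
        counts[next(rr)] += 1
    return counts

print(route(9))  # each of the three instances receives 3 of the 9 requests
```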
Is there a tutorial that'd guide me through this?
You can start with the official AWS tutorial Set up a scaled and load-balanced application.
With an ELB setup, there is a health check timeout, e.g. take a server out of the load balancer after it fails X health checks.
For a truly zero-downtime deployment, I actually want to be able to avoid even those extra 4-5 seconds of downtime.
Is there a simple way to do that on the ops side, or does this need to be handled at the level of the web server itself?
If you're doing continuous deployment, you should deregister the instance you're deploying to from the ELB (say, with aws elb deregister-instances-from-load-balancer), wait for the current connections to drain, deploy your app, and then register the instance with the ELB again.
http://docs.aws.amazon.com/cli/latest/reference/elb/deregister-instances-from-load-balancer.html
http://docs.aws.amazon.com/cli/latest/reference/elb/register-instances-with-load-balancer.html
It is also a common strategy to deploy to another AutoScaling Group, then just switch ASG on the load balancer.
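The deregister / drain / deploy / register sequence can be sketched as a small helper. Here `in_flight` stands in for whatever check you use to count active connections on the just-deregistered instance; the function names and numbers are illustrative, not part of any AWS SDK.

```python
import time

def wait_for_drain(in_flight, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Block until in_flight() reports 0 open connections, or time out.

    in_flight: callable returning the current number of active connections
               on the instance that was just deregistered from the ELB.
    """
    waited = 0.0
    while in_flight() > 0:
        if waited >= timeout:
            raise TimeoutError("connections did not drain in time")
        sleep(interval)
        waited += interval
    return waited

# Fake backend whose connection count drops on each poll (no real sleeping):
pending = [3, 2, 1, 0]
waited = wait_for_drain(lambda: pending.pop(0), sleep=lambda s: None)
print(waited)  # 15.0 -- three polls still saw open connections
```

Only after this returns would you deploy and re-register the instance, so no in-flight request is cut off mid-deploy.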
Sorry, I had a few basic questions. I'm planning to use an AWS EC2 instance.
1) Is an EC2 instance a single virtual machine image, or is it a single physical machine? Amazon's documentation states that it is a "virtual server", but I wanted to clarify whether it is an image running inside one server or a single physical server itself.
2) Is an Elastic Load Balancer a single EC2 instance that handles all requests from all users and simply forwards each request to the least-loaded EC2 instance?
3) When Auto Scaling is enabled for an EC2 instance, does it simply replicate the original EC2 instance exactly when it needs to scale up?
An EC2 instance is a VM that gets some percentage of the underlying physical host's RAM, CPU, disk, and network i/o. That percentage could theoretically be 100% for certain instance types, including bare-metal instances, but is typically some fraction depending on which instance type you choose.
ELB is a service, not a single EC2 instance. It scales on your behalf. It routes TCP by round robin, and routes HTTP and HTTPS to the instance with the fewest outstanding requests.
Auto Scaling "scales out" (it adds new EC2 instances) rather than "scaling up" (resizing an existing instance). It launches new instances from a template called an AMI (Amazon Machine Image).
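The two routing strategies just described can be sketched like this (a toy mimicking the described behavior, not ELB's actual implementation; instance IDs and in-flight counts are invented):

```python
# outstanding[i] = requests currently in flight on instance i (made-up numbers)
outstanding = {"i-aaaa": 7, "i-bbbb": 2, "i-cccc": 5}

def pick_http_target(outstanding):
    """HTTP/HTTPS listeners: route to the instance with fewest outstanding requests."""
    return min(outstanding, key=outstanding.get)

def pick_tcp_target(instances, counter):
    """TCP listeners: plain round robin over the registered instances."""
    return instances[counter % len(instances)]

print(pick_http_target(outstanding))          # i-bbbb (only 2 requests in flight)
print(pick_tcp_target(list(outstanding), 4))  # i-bbbb (request #4, 4 % 3 == 1)
```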
It is a virtual server, a VM, as stated in the documentation.
It's a little more complicated than that, based on the way AWS might scale the load balancer, create a node in each Availability Zone, etc. It also provides more features, such as Auto Scaling integration, health checks, and SSL termination. I suggest you read the documentation.
It uses a machine image that you specify when you create the Auto Scaling group (more precisely, in the Launch Configuration used by the Auto Scaling group). A common practice is to configure a machine image that downloads any updates and launches the latest version of your application on startup.
You might also be interested in Elastic Beanstalk which is a PaaS that manages much of the AWS infrastructure for you. There are also third-party PaaS offerings such as OpenShift and Heroku that also manage AWS resources for you.
I am new to using web services, but we have built a simple web service hosted in IIS on an Amazon EC2 instance, with the database hosted on Amazon RDS. This all works fine as a prototype for our mobile application.
The next stage is to look at scale. I need to know how we can have a cluster of instances handling the web service calls, as we expect a high number of calls and need to scale the number of instances handling them.
I am pretty new to this. At the moment we use an IP address in the call to the web service, which implies it's directed at a specific server. How do we build an architecture on Amazon where a request from a mobile device can be handled by any one of a number of servers, and where we can scale the capacity to handle more web service calls just by adding more servers on Amazon?
Thanks for any help
Steve
You'll want to use load balancing, which AWS conveniently offers as well:
http://aws.amazon.com/elasticloadbalancing/
Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. Customers can enable Elastic Load Balancing within a single Availability Zone or across multiple zones for even more consistent application performance.
In addition to Elastic Load Balancing, you'll want to have an Amazon Machine Image created, so you can launch instances on demand without having to configure each instance manually. The EC2 documentation describes that process.
There's also Auto Scaling, which lets you pick specific metrics to watch and automatically provision more instances. I believe it's throttled, so you don't have to worry about launching far too many, assuming you set reasonable thresholds at which to start and stop adding instances.
Last (for a simple overview), you'll want to consider running in multiple Availability Zones so you're resilient to any potential outages. They aren't frequent, but they do happen, and there's no guarantee you'll stay available if you're only in one AZ.
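The threshold idea behind such a scaling policy can be sketched in a few lines. The metric, thresholds, and sizes below are invented examples; real Auto Scaling policies also involve cooldowns and other details.

```python
def desired_capacity(current, cpu_pct, scale_out_at=70, scale_in_at=25,
                     min_size=1, max_size=10):
    """Decide the next instance count from an average-CPU metric."""
    if cpu_pct > scale_out_at and current < max_size:
        return current + 1          # busy: add an instance
    if cpu_pct < scale_in_at and current > min_size:
        return current - 1          # idle: remove an instance
    return current                  # within thresholds: no change

print(desired_capacity(2, 85))  # 3 -- CPU high, scale out
print(desired_capacity(2, 10))  # 1 -- CPU low, scale in
print(desired_capacity(1, 10))  # 1 -- already at min_size, no change
```

The min/max bounds are what keep the group from running away in either direction, which is the "throttling" mentioned above.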
Is it possible to have most of our server hardware outside of EC2, but use some kind of load balancer to divert traffic to EC2 when there's more load than our servers can handle, or as a backup in case those servers go down?
For example, we have a physical server serving our service (let's ignore database consistency for the moment), but there's a huge spike due to some coolness: can we spin up some EC2 instances and divert traffic to them? This is much like Amazon's own auto scaling.
And also, if our server hardware dies for some reason (gremlins eat the power cables, for example), can we route all our traffic over to EC2 instances?
Thanks
Yes, you can, but you will have to write some code. AWS has command line tools for doing EC2/Auto Scaling/S3 tasks with simple commands in bash, as well as other interfaces and SDKs, like Boto for Python.
You can find them here: http://aws.amazon.com/code/
Each EC2 instance has a public network interface associated with it. Use a DNS CNAME record to "switch" your site traffic to the EC2 instance. If you need to load-balance across multiple machines, you can use round-robin DNS, or start an ELB and put any number of EC2 instances behind it.
EC2 infrastructure is extremely easy to scale. Deploying your application on top of EC2 is a whole other matter: it could be trivial, or insanely complicated.
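The round-robin DNS option mentioned above works by rotating which address comes first in each DNS response, so successive clients land on different servers. A minimal sketch of that rotation, with hypothetical addresses from the documentation range:

```python
from collections import deque

def resolve(records):
    """Round-robin DNS: return the A records, then rotate for the next query."""
    answer = list(records)
    records.rotate(-1)   # the next query sees a different first record
    return answer

# Hypothetical A records behind one hostname:
pool = deque(["203.0.113.10", "203.0.113.11", "203.0.113.12"])
print(resolve(pool)[0])  # 203.0.113.10
print(resolve(pool)[0])  # 203.0.113.11 -- a different server answers first
```

Note that unlike an ELB, plain round-robin DNS has no health checks: a dead server keeps receiving its share of traffic until you remove its record.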