Web Service Large Volume Of Calls On Amazon EC2 - web-services

I am new to web services, but we have built a simple web service hosted in IIS on an Amazon EC2 instance, with an Amazon RDS hosted database server. This all works fine as a prototype for our mobile application.
The next stage is to look at scale. We expect a high number of calls to the web service, so I need to know how to have a cluster of instances handling them and how to scale the number of instances.
I am pretty new to this. At the moment we use an IP address in the call to the web service, which implies the call is directed at a specific server. How do we build an architecture on Amazon where a request from the mobile device can be handled by any one of a number of servers, and where we can scale capacity to handle more web service calls just by adding more servers on Amazon?
Thanks for any help
Steve

You'll want to use load balancing, which AWS conveniently also offers:
http://aws.amazon.com/elasticloadbalancing/
Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. Customers can enable Elastic Load Balancing within a single Availability Zone or across multiple zones for even more consistent application performance.

In addition to Elastic Load Balancing, you'll want to have an Amazon Machine Image created, so you can launch instances on-demand without having to do manual configuration on each instance you launch. The EC2 documentation describes that process.
There's also Auto Scaling, which lets you set specific metrics to watch and automatically provision more instances. I believe it's throttled, so you don't have to worry about creating way too many, assuming you set reasonable thresholds at which to start and stop launching more instances.
Last (for a simple overview), you'll want to consider being in multiple availability zones so you're resilient to any potential outages. They aren't frequent, but they do happen. There's no guarantee you'll be available if you're only in one AZ.
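To make the threshold idea behind Auto Scaling a bit more concrete, here is a minimal Python sketch. It is not the AWS API, just the decision logic as I understand it; the function name, the 70%/30% thresholds and the min/max bounds are all assumptions picked for the example.

```python
def desired_capacity(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                     min_size=2, max_size=10):
    """Return the instance count a simple threshold policy would ask for.

    Hypothetical policy: add one instance above scale_out_at, remove one
    below scale_in_at, and always stay within [min_size, max_size].
    """
    if avg_cpu > scale_out_at:
        target = current + 1
    elif avg_cpu < scale_in_at:
        target = current - 1
    else:
        target = current
    # Clamp so a runaway metric can never launch an unbounded fleet.
    return max(min_size, min(max_size, target))


if __name__ == "__main__":
    print(desired_capacity(current=2, avg_cpu=85.0))   # busy: grows to 3
    print(desired_capacity(current=10, avg_cpu=95.0))  # already at max: stays at 10
    print(desired_capacity(current=2, avg_cpu=5.0))    # idle but at min: stays at 2
```

The clamp at the end is the part that keeps you from creating way too many instances: whatever the metric does, the group stays between the minimum and maximum you set.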

Related

AWS ALB in front of one server

I have a server (Apache/PHP) running the front end of a SaaS platform.
This will not receive high traffic and therefore does not need load balancing.
Does it make sense to add a load balancer and an auto scaling group (with a count of 1 server) for security reasons?
It allows the server to be isolated in the VPC, and it allows services such as WAF that increase security. The extra cost is not a problem.
It does make sense, in the following ways:
It lets you configure health checks for your instance. If your instance fails for some reason, the Auto Scaling group (not the load balancer itself) can launch a replacement EC2 instance for you, minimizing the downtime of your application.
It naturally makes your instance more secure by hiding it in a VPC (as you suggested).
Lastly, it will future-proof your architecture and enable you to quickly scale up your infrastructure if need be.
As you said, you have a single server and do not get much traffic, so simply add a load balancer in front of your server.
You can enable health checks and, by integrating them with SNS, get notified if a health check fails (server unhealthy).
By adding WAF to your Application Load Balancer you can monitor HTTP/S requests and control access to your web application.
What you do with WAF depends on your requirements. For example, you can:
Block or allow traffic to your application from a specific region
Block or allow traffic to your application from a specified IP range
Set a limit on the number of requests to your application within 5 minutes and, if it is exceeded, block or count the requests
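To illustrate the last point, here is a rough Python sketch of a per-IP request limit over a 5-minute window. This is only a toy model of the idea, not how AWS WAF is implemented or configured; the class name, the limit of 3 and the example IP address are assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # the 5-minute window mentioned above


class RateLimiter:
    """Toy per-IP sliding-window counter; not the AWS WAF implementation."""

    def __init__(self, limit, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        """Record a request at time `now` (in seconds) and say whether to allow it."""
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        return len(recent) <= self.limit


if __name__ == "__main__":
    limiter = RateLimiter(limit=3)
    for t in (0, 1, 2, 3):
        print(t, limiter.allow("203.0.113.7", now=t))  # the fourth request is refused
```

Note that in this sketch refused requests still count toward the window, so a client that keeps hammering stays blocked until it slows down.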

CPU and memory utilization discrepancies for ejabberd and Riak clusters on AWS

I'm running a 2-node ejabberd cluster (behind an elastic load balancer) that in turn connects with a 3-node Riak cluster (again, via an ELB) on AWS. When I load-test the platform via Tsung (creating 0.5 million user registrations), I notice that the CPU utilization for the ejabberd nodes differs amongst themselves by around 10%. For the Riak nodes, the CPU and memory utilization amongst nodes differs by around 5%.
The nodes are of identical configuration, so I am wondering what could be leading to these non-trivial differences in utilization. Can anyone shed some light on this?
Is it due to the load balancer? Or a network impact? I expect that once a cluster is formed (either of ejabberd or of Riak KV), the nodes are all identical in behavior, especially for ejabberd where the entire database is replicated across the cluster.
Not that these differences are a problem, but would be good to understand the inner workings of the clusters here...
Many thanks.
Elastic Load Balancing mechanism
DNS server uses DNS round robin to determine which load balancer node in a specific Availability Zone will receive the request
The selected load balancer checks for "sticky session" cookie
The selected load balancer sends the request to the least loaded instance
And in greater details:
Availability Zones (unlikely your case)
By default, the load balancer node routes traffic to back-end instances within the same Availability Zone. To ensure that your back-end instances are able to handle the request load in each Availability Zone, it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a.
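The arithmetic in that example is worth spelling out. A small Python sketch, assuming (as the paragraph above says) that traffic is split evenly between zones and then evenly among the instances inside each zone; the function name is made up for the illustration:

```python
def per_instance_share(instances_per_zone):
    """Fraction of total traffic each instance sees when zones split traffic evenly."""
    zone_share = 1.0 / len(instances_per_zone)
    return {zone: zone_share / count for zone, count in instances_per_zone.items()}


shares = per_instance_share({"us-east-1a": 10, "us-east-1b": 2})
for zone, share in shares.items():
    print(f"{zone}: {share:.1%} of all traffic per instance")
# us-east-1a: 5.0% of all traffic per instance
# us-east-1b: 25.0% of all traffic per instance
```

So each instance in us-east-1b carries five times the load of an instance in us-east-1a, even though all instances are identical.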
Sessions (most likely your case)
By default a load balancer routes each request independently to the server instance with the smallest load. By comparison, a sticky session binds a user's session to a specific server instance so that all requests coming from the user during the session will be sent to the same server instance.
AWS Elastic Beanstalk uses load balancer-generated HTTP cookies when sticky sessions are enabled for an application. The load balancer uses a special load balancer–generated cookie to track the application instance for each request. When the load balancer receives a request, it first checks to see if this cookie is present in the request. If so, the request is sent to the application instance specified in the cookie. If there is no cookie, the load balancer chooses an application instance based on the existing load balancing algorithm. A cookie is inserted into the response for binding subsequent requests from the same user to that application instance. The policy configuration defines a cookie expiry, which establishes the duration of validity for each cookie.
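The cookie flow described above can be sketched in a few lines of Python. This is a toy model only: the cookie name, the instance ids and the round-robin fallback are assumptions standing in for whatever the real load balancer does, and cookie expiry is not modelled.

```python
import itertools


class StickyBalancer:
    """Toy model of cookie-based stickiness; names here are hypothetical."""

    COOKIE = "LB_STICKY"  # made-up cookie name for the example

    def __init__(self, instances):
        self.instances = list(instances)
        # Stand-in for the "existing load balancing algorithm".
        self._fallback = itertools.cycle(self.instances)

    def route(self, cookies):
        """Return (instance, cookies_to_set) for one request."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.instances:
            return pinned, {}          # cookie present: honour it
        chosen = next(self._fallback)  # no usable cookie: pick normally
        return chosen, {self.COOKIE: chosen}


if __name__ == "__main__":
    lb = StickyBalancer(["i-a", "i-b"])
    instance, set_cookies = lb.route({})  # first request carries no cookie
    print(instance, set_cookies)
    print(lb.route(set_cookies))          # same client again: same instance
```

With stickiness on, a load generator that keeps its cookies behaves like one long session, so all of its requests land on the same instance.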
Routing Algorithm (less likely your case)
Load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.
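For what it is worth, the least-connections idea looks roughly like this in Python. It is a sketch of the general technique, not the ELB's actual code; the function names and instance names are made up, and requests are assumed never to finish so the counts are easy to follow.

```python
def pick_least_conns(open_connections):
    """Return the instance with the fewest outstanding connections."""
    return min(open_connections, key=open_connections.get)


def simulate_least_conns(instances, n_requests):
    """Route n_requests that never finish, to show how the counts even out."""
    open_connections = {name: 0 for name in instances}
    for _ in range(n_requests):
        target = pick_least_conns(open_connections)
        open_connections[target] += 1
    return open_connections


if __name__ == "__main__":
    print(pick_least_conns({"a": 5, "b": 1, "c": 3}))  # b
    print(simulate_least_conns(["a", "b", "c"], 7))    # {'a': 3, 'b': 2, 'c': 2}
```

Instances that answer quickly free up connections sooner and so receive more requests, which is one way identical nodes can end up with somewhat different utilization.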
Source: Elastic Load Balancing Terminology And Key Concepts
Hope it helps.

AWS EC2 Instance - Is it a single virtual image or single physical machine?

Sorry, I had a few basic questions. I'm planning to use an AWS EC2 instance.
1) Is an EC2 instance a single virtual machine image or is it a single physical machine? Documentation from Amazon states that it is a "virtual server", but I wanted to clarify whether it is an image embedded inside one server or a single physical server itself.
2) Is an Elastic Load Balancer a single EC2 instance that handles all requests from all users and simply forwards each request to the least loaded EC2 instance?
3) When Auto Scaling is enabled for an EC2 instance, does it simply replicate the original EC2 instance exactly when it needs to scale up?
An EC2 instance is a VM that gets some percentage of the underlying physical host's RAM, CPU, disk, and network I/O. That percentage could theoretically be 100% for certain instance types, including bare-metal instances, but is typically some fraction depending on which instance type you choose.
ELB is a service, not a single EC2 instance. It will scale on your behalf. It routes by round robin for TCP, and routes on fewest outstanding requests for HTTP and HTTPS.
Auto Scaling is "scale out" (it adds new EC2 instances), not "scale up" (resizing an existing EC2 instance). It launches a new instance from a template called an AMI.
It is a virtual server, a VM, as stated in the documentation.
It's a little more complicated than that, based on the way AWS might scale the load balancer, or create a version in each availability zone, etc. It also provides more features, such as auto-scaling integration, health checks, and SSL termination. I suggest you read the documentation.
It uses a machine image that you specify when you create the auto-scaling group (when you create the Launch Configuration used by the Auto-scaling group to be more precise). A common practice is to configure a machine image that will download any updates and launch the latest version of your application on startup.
You might also be interested in Elastic Beanstalk which is a PaaS that manages much of the AWS infrastructure for you. There are also third-party PaaS offerings such as OpenShift and Heroku that also manage AWS resources for you.

AWS ELB doesn't distribute requests to auto scaling group EC2 instances in some cases

I'm trying to do performance testing for my AWS auto scaling group using JMeter.
Firstly, I did scale-in/out testing. I set the threshold to 70% CPU utilization for 2 periods of 2 minutes each. The ELB worked fine: after the system scaled out, requests were distributed to all EC2 instances in the auto scaling group, although not evenly.
Next, I wanted to test whether two instances can handle twice the load of one instance.
I fixed the instance count of the auto scaling group by setting min/max/desired to 2. When I push load from a single JMeter, only one instance ever does any work: its CPU utilization reaches almost 100 percent while the other instance stays at zero. If I push load from a JMeter cluster that contains several slaves, all instances take load.
Somebody suggested that maybe the load is not heavy enough, so the ELB decides one instance can handle it and does not dispatch requests to the other. I don't think so, because when I push load from just one slave of this JMeter cluster, however much I increase the load, still only one instance handles requests.
I found a blog post which says ELB is great for HA but not for load balancing.
https://www.stackdriver.com/elb-affinity-problems
But I don't think this behavior, where only one instance handles requests, is normal.
What the hell is going on in the ELB load balancing mechanism? I'm confused.
Elastic Load Balancing mechanism
DNS server uses DNS round robin to determine which load balancer node in a specific Availability Zone will receive the request
The selected load balancer checks for "sticky session" cookie
The selected load balancer sends the request to the least loaded instance
And in greater details:
Availability Zones (unlikely your case)
By default, the load balancer node routes traffic to back-end instances within the same Availability Zone. To ensure that your back-end instances are able to handle the request load in each Availability Zone, it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a.
Sessions (most likely your case)
By default a load balancer routes each request independently to the server instance with the smallest load. By comparison, a sticky session binds a user's session to a specific server instance so that all requests coming from the user during the session will be sent to the same server instance.
AWS Elastic Beanstalk uses load balancer-generated HTTP cookies when sticky sessions are enabled for an application. The load balancer uses a special load balancer–generated cookie to track the application instance for each request. When the load balancer receives a request, it first checks to see if this cookie is present in the request. If so, the request is sent to the application instance specified in the cookie. If there is no cookie, the load balancer chooses an application instance based on the existing load balancing algorithm. A cookie is inserted into the response for binding subsequent requests from the same user to that application instance. The policy configuration defines a cookie expiry, which establishes the duration of validity for each cookie.
Routing Algorithm (less likely your case)
Load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.
Source: Elastic Load Balancing Terminology And Key Concepts
Hope it helps.
I had this issue with unbalanced ELB traffic when the back-end instances were in different availability zones and the ELB was receiving requests from a small number of clients. In our case we were using an internal ELB within the application tiers. In your case, "push load from single JMeter" likely means a small number of clients as seen by the ELB. The solution was to enable cross-zone load balancing, using the API with something similar to this fragment:
elb-modify-lb-attributes ${ELB} --region ${REGION} --crosszoneloadbalancing "enabled=true"
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-disable-crosszone-lb.html
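To see why a single client can starve one zone, here is a small Python simulation. It is a sketch under stated assumptions, not a model of the real ELB: the client is assumed to have cached the DNS answer for one load balancer node, plain round robin stands in for the real routing, and the zone names and instance ids are made up.

```python
import itertools


def simulate_single_client(instances_by_zone, client_zone, n_requests, cross_zone):
    """Count requests per instance for a single client pinned to one LB node.

    Assumption: the client has cached the DNS answer for the load balancer
    node in `client_zone`, so every request enters through that node.
    """
    if cross_zone:
        # The node may forward to instances in every zone.
        targets = [i for zone in instances_by_zone.values() for i in zone]
    else:
        # The node only forwards to instances in its own zone.
        targets = list(instances_by_zone[client_zone])
    counts = {i: 0 for zone in instances_by_zone.values() for i in zone}
    for target in itertools.islice(itertools.cycle(targets), n_requests):
        counts[target] += 1
    return counts


fleet = {"zone-a": ["i-1"], "zone-b": ["i-2"]}
print(simulate_single_client(fleet, "zone-a", 1000, cross_zone=False))
# {'i-1': 1000, 'i-2': 0}
print(simulate_single_client(fleet, "zone-a", 1000, cross_zone=True))
# {'i-1': 500, 'i-2': 500}
```

The first result matches the symptom in the question: one instance at full load and the other idle. A JMeter cluster with several slaves looks like several clients, which is presumably why it spreads across both instances even without cross-zone load balancing.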

Using EC2 only when under load or in case of failure

Is it possible to have most of our server hardware outside of EC2, but with some kind of load balancer to divert traffic to EC2 when there's load that our servers can't handle, or as a backup in case these servers go down?
For example, have a physical server serving out our service (let's ignore database consistency for the moment), but there's a huge spike due to some coolness - can we spin up some EC2 instances and divert traffic off to it? This is much like Amazon's own auto scaling.
And also, if our server hardware dies for some reason (gremlins eat the power cables for example) - can we route all our traffic over to EC2 instances?
Thanks
Yes, you can, but you will have to write some code. AWS has command line tools for doing EC2/Auto Scaling/S3 tasks with simple commands in bash, as well as other interfaces and SDKs, such as Boto for Python.
You can find it here: http://aws.amazon.com/code/
Each EC2 instance has a public network interface associated with it. Use a DNS CNAME record to "switch" your site traffic to the EC2 instance. If you need to load-balance across multiple machines you can use round-robin DNS, or start an ELB and put any number of EC2 instances behind it.
EC2 infrastructure is extremely easy to scale. Deploying your application on top of EC2 is a whole other matter. It could be trivial -- or insanely complicated.