I'm running a microservice on AWS Elastic Beanstalk which logs its responses internally at 1-4ms, but the AWS dashboard is showing an average of 68ms (not even counting latency to/from AWS). Is this normal? It just seems odd that EB/ELB would add 60ms of latency to every request.
It's configured to use a Docker container, which appears to use nginx. nginx doesn't seem to be configured to log the time to first byte in the access logs, but this is all auto-configured by Amazon.
In testing I tried both a t2.micro and a t2.large instance, and that had no effect on the results. Is there something I can tweak on my end? I really need to get this under 10-20ms (not counting rtt/ping distance) for the service to be useful.
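One way to tell your app's latency apart from anything ELB/nginx adds is to measure time to first byte (TTFB) yourself from a client. Below is a minimal stdlib-only sketch; the local `SlowHandler` server is a stand-in for the real endpoint (an assumption for the demo), so you'd point `measure_ttfb` at your actual EB hostname instead.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    """Stand-in for the real service; simulates 50 ms of server-side work."""
    def do_GET(self):
        time.sleep(0.05)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # silence per-request logging
        pass

def measure_ttfb(host, port, path="/"):
    """Return seconds from sending the request to the first response byte."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    start = time.monotonic()
    conn.request("GET", path)
    resp = conn.getresponse()  # returns once the status line/headers arrive
    ttfb = time.monotonic() - start
    resp.read()
    conn.close()
    return ttfb

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

ttfb = measure_ttfb("127.0.0.1", server.server_address[1])
print(f"TTFB: {ttfb * 1000:.1f} ms")
server.shutdown()
```

From the shell, `curl -s -o /dev/null -w '%{time_starttransfer}\n' <url>` gives the same number; comparing that figure against your app's internal 1-4ms tells you how much the path in front of it is adding.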
Update: it appears to have been a problem on Amazon's side. It was averaging 69ms on Friday; today (Monday morning) it's 3.9ms.
Hi, I am new to Fargate and confused about how it is billed.
How is the 'Average duration' calculated and charged? Is it charged only for the time between a request arriving and the response being returned, or are the pods running continuously and charged 24*7*365?
Also, does Fargate fetch the image from ECR every time a request arrives?
Does Fargate cost money even when there are no requests and nothing is processing?
What is the correct way of filling in the 'Average duration' section?
This can make a huge difference in cost.
You can learn more details from AWS Fargate Pricing and from the AWS Pricing Calculator. The first link explains how duration is calculated and includes three worked examples.
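The arithmetic behind those pricing examples can be sketched in a few lines. The rates below are illustrative us-east-1 Linux/x86 prices from the time of writing (an assumption; always check the current pricing page), and the point is the shape of the formula: you pay for allocated vCPU-hours and GB-hours, not per request.

```python
def fargate_cost(vcpu, memory_gb, hours, vcpu_rate=0.04048, gb_rate=0.004445):
    """Fargate bills the vCPU and memory allocated to a task for its running
    time, not per request. Default rates are illustrative us-east-1 prices;
    check the Fargate pricing page for current values."""
    return hours * (vcpu * vcpu_rate + memory_gb * gb_rate)

# A 0.25 vCPU / 0.5 GB task running 24/7 for a 30-day month:
monthly = fargate_cost(0.25, 0.5, 24 * 30)
print(f"${monthly:.2f} per month")  # prints "$8.89 per month"
```

Notice that the request count never appears in the formula; a task handling zero requests costs the same as one handling thousands.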
How is the 'Average duration' calculated and charged? Is it charged only for the time between a request arriving and the response being returned, or are the pods running continuously and charged 24*7*365?
Fargate is not a request-based service. Fargate runs your pod for the entire time you ask it to run that pod. It doesn't deploy pods when a request comes in, the pods are running 24/7 (or as long as you have it configured to run).
Fargate is "serverless" in the sense that you don't have to manage the EC2 server the container(s) are running on yourself, Amazon manages the EC2 server for you.
Also, does Fargate fetch the image from ECR every time a request arrives?
Fargate pulls from ECR when a pod is deployed. It has to be deployed and running already in order to accept requests. It does not deploy a pod when a request comes in like you are suggesting.
Does Fargate cost money even when there are no requests and nothing is processing?
Fargate charges for the amount of RAM and CPU you have allocated to your pod, regardless of whether it is actively processing requests. Fargate does not care about the number of requests. You could even use Fargate for things like back-end processing services that don't accept requests at all.
If you want an AWS service that only runs (and charges) when a request comes in, then you would have to use AWS Lambda.
You could also look at AWS App Runner, which is in kind of a middle ground between Lambda and Fargate. It works like Fargate, but it suspends your containers when requests aren't coming in, in order to save some money on the CPU charges.
Running AWS "Managed Nodes" for an EKS cluster across 2 AZs.
3 nodes in total. I get random timeouts when attempting to pull container images.
This has been hard to trace because it does work sometimes, so it's not an ACL or a security group blocking things.
When I SSH into the nodes, sometimes I can pull the image manually and sometimes I cannot. When I run curl -I https://hub.docker.com, it sometimes takes 2 minutes to get a response back. I'm guessing this is why the image pulls are timing out.
I don't know of a way to increase the timeout for Kubernetes to pull the image, and I also can't figure out why the latency on the curl request is so bad.
Any suggestions are greatly appreciated.
FYI, worker nodes in Private Subnet, proper routes to NAT Gateway in place. VPC Flow logs are good.
Random is the hardest thing to trace 🤷.
You could move your images to a private ECR registry, or simply run a registry in your cluster, to rule out an issue with your Kubernetes networking. Are you running the AWS CNI?
It could also just be rate limiting from Docker Hub itself. Are you using the same external NAT IP address to pull from multiple nodes/clusters?
Docker will gradually impose download rate limits with an eventual limit of 300 downloads per six hours for anonymous users.
Logged in users will not be affected at this time. Therefore, we recommend that you log into Docker Hub as an authenticated user. For more information, see the following section How do I authenticate pull requests.
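You can check where you stand against the anonymous limit without burning a pull: Docker Hub exposes a dedicated `ratelimitpreview/test` repository, and a HEAD request against its manifest returns `ratelimit-limit` / `ratelimit-remaining` headers in the form `100;w=21600` (100 pulls per 21600-second window). A stdlib-only sketch; the network call is defined but left for you to invoke (from behind the NAT in question, since limits are per source IP for anonymous pulls):

```python
import json
import urllib.request

TOKEN_URL = ("https://auth.docker.io/token"
             "?service=registry.docker.io"
             "&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = ("https://registry-1.docker.io/v2/"
                "ratelimitpreview/test/manifests/latest")

def parse_limit(header):
    """Parse a 'ratelimit-*' header like '100;w=21600' into
    (count, window_seconds)."""
    count, _, window = header.partition(";")
    return int(count), int(window.split("=")[1])

def check_rate_limit():
    """Query Docker Hub's rate-limit test endpoint (network required)."""
    with urllib.request.urlopen(TOKEN_URL) as r:
        token = json.load(r)["token"]
    req = urllib.request.Request(
        MANIFEST_URL, method="HEAD",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as r:
        return (parse_limit(r.headers["ratelimit-limit"]),
                parse_limit(r.headers["ratelimit-remaining"]))

print(parse_limit("100;w=21600"))  # → (100, 21600)
```

If `ratelimit-remaining` is near zero from your nodes but fine from elsewhere, the shared NAT IP is your culprit.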
✌️
I have a web service running on several EC2 instances. Based on the CloudWatch latency metric, I'd like to scale up with additional instances. But given that it takes several minutes to spin up an EC2 instance from an AMI (with startup code to download the latest application JAR and apply OS patches), is there a way to have a "cold" server that could be turned on/off almost instantly?
Not by using Auto Scaling. At least not instantly in the way you describe. You could make it much faster, however, by building your own modified AMI with the JAR and the latest OS patches baked in. These AMIs can be generated as part of your build pipeline. In that case, your only real wait is for the OS and services to start, similar to a "cold" server.
Packer is a tool commonly used for such use cases.
Alternatively, you can manage it yourself by keeping servers switched off and starting them with custom Lambda scripts triggered by CloudWatch alerts. But since stopped servers aren't exactly free either, I would recommend against that for cost reasons.
Before you venture into auto scaling your infrastructure and spend the time and effort, perhaps you should do a bit of analysis on the traffic pattern day over day, week over week and month over month, and see if it's even necessary. Try answering some of these questions.
What was the highest traffic your app has ever handled? How did the servers fare under that load? How was the user response time?
When does your traffic ramp up or hit its peak? Some apps get traffic during business hours, others in the evening.
What is your current throughput? For example, say you handle 1k requests/min and two EC2 hosts average 20% CPU. If requests triple to 3k requests/min, do you see around 60-70% average CPU? That's a good indication that your app's usage is fairly predictable and can scale linearly by adding more hosts. But if you've never seen traffic burst like that, there's no point over-provisioning.
Unless you have a Zynga-like application that can see a large burst of traffic at once, better understanding your traffic pattern and throwing in one additional host as insurance could be enough. I'm making these assumptions as I don't know the nature of your business.
If you do want to auto scale anyway, one solution would be to containerize your application with Docker or create your own AMI like others have suggested. It will still take a few minutes to boot them up. The next option is to keep hosts on standby and add them to your load balancers using scripts (or Lambda functions) that watch metrics you define (I'm assuming your app is running behind load balancers).
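The linear-scaling check from the throughput question above can be turned into a back-of-envelope estimator. The numbers here are the hypothetical ones from that example (1k requests/min on two hosts at 20% CPU); the assumption baked in is that CPU scales linearly with request rate, which is exactly what you should validate against real traffic first.

```python
import math

def required_hosts(current_rate, current_hosts, current_cpu_pct,
                   target_rate, max_cpu_pct=70.0):
    """Assuming CPU scales linearly with request rate, estimate how many
    hosts keep average CPU at or below max_cpu_pct at target_rate.
    Rates can be in any unit (req/min, req/s) as long as they match."""
    # CPU-percent "cost" of one unit of request rate on a single host
    cpu_per_req = current_cpu_pct * current_hosts / current_rate
    hosts = math.ceil(target_rate * cpu_per_req / max_cpu_pct)
    return max(hosts, 1)

# 2 hosts at 20% CPU handling 1,000 req/min; what does 3,000 req/min need?
print(required_hosts(1000, 2, 20.0, 3000))  # → 2 (they'd average ~60% CPU)
```

If the estimate says your existing fleet plus one insurance host covers any burst you've actually observed, auto scaling may not be worth the complexity yet.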
Good luck.
I am using an Amazon EC2 instance with instance type m3.medium and an Amazon RDS database instance.
During working hours the website goes down because CPU utilization reaches 100%; at night (outside working hours) CPU utilization is around 60%.
Please suggest the right solution for this downtime issue. I am not sure why I am experiencing this problem.
At one point I had set a cron job to run every minute, but I removed it because of the slowdown it caused; the site still goes down.
When I run the top command, the screenshots below show the CPU usage. The httpd process consumes the most CPU, so any suggestions for settings to reduce httpd's CPU usage would help.
Without any users on the website (two images below):
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After 5 users access the website simultaneously:
http://screencast.com/t/QZgZsiNgdCUl
If your CPU utilization is reaching 100%, you have two options.
Increase your EC2 Instance Type to large.
Use AutoScaling to launch one more EC2 Instance of same Instance Type.
It looks like you also need scheduled actions, since you do not need full capacity during non-working hours.
The best option is AWS Auto Scaling with scheduled actions.
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS Auto Scaling can launch new EC2 instances based on your CPU utilization (or other metrics such as network load or disk reads/writes). This way you can keep your site alive.
Using scheduled actions you can stop your auto-scaled instances during non-working hours and scale out during working hours according to CPU utilization (or other metrics).
You can even stop your servers entirely if you do not need them at certain times.
If you are not familiar with AWS Auto Scaling, you can follow the documentation, which is precise and easy:
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
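A scheduled action is just a cron expression plus the capacity you want at that time. Here is a boto3 sketch of the working-hours pattern described above; the group name, times and sizes are hypothetical placeholders, and the actual API call needs AWS credentials configured, so it is left guarded at the bottom.

```python
# Hypothetical Auto Scaling group name; replace with your own.
ASG_NAME = "my-web-asg"

# Scale out at 09:00 UTC on weekdays, back down at 19:00 UTC.
scale_up = dict(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="working-hours-start",
    Recurrence="0 9 * * 1-5",   # standard cron, evaluated in UTC
    MinSize=2, MaxSize=6, DesiredCapacity=2,
)
scale_down = dict(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="working-hours-end",
    Recurrence="0 19 * * 1-5",
    MinSize=1, MaxSize=2, DesiredCapacity=1,
)

def apply(actions):
    """Push the scheduled actions to AWS (requires credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)

# apply([scale_up, scale_down])  # uncomment to apply for real
print(scale_up["Recurrence"])
```

Between the scheduled floor/ceiling changes, ordinary CPU-based scaling policies still operate within the MinSize/MaxSize bounds each action sets.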
If CPU utilization reaches 100% because of the number of visitors your site has, you should consider changing the instance type, using Auto Scaling, or putting AWS CloudFront in front to cache as many HTTP requests as possible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 instance, I strongly recommend decoupling that workload via AWS SQS and an AWS Elastic Beanstalk worker environment.
What is the best way to deal with traffic spikes on elastic beanstalk? In my experience this does not seem to scale quickly enough i.e. the new instances take a few minutes to get going.
Should I be doing some more calculations to optimise the scaling process?
Is there a formula for working these things out?
Yeah, it takes 5-10 minutes (depending on the stack you're using; not counting Windows instances) to launch a new Beanstalk instance via CloudFormation, install and configure the environment software, add the instance to the load balanced cluster, deploy your application code, and run any of your .ebextensions. (All of which you can follow along with by watching the event log for the environment, or the log for the CloudFormation template executing in the background.)
There are a couple of ways to handle this:
Use larger instances that won't need to scale-out so quickly.
Tweak your Auto Scaling triggers for your environment (via the AWS Console, web service API, or CLI tools) so that scale-up happens sooner. That way you'll have the extra capacity by the time the existing servers get maxed-out.
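For an Elastic Beanstalk environment, those triggers live in the `aws:autoscaling:trigger` option namespace and can be pushed with `update_environment`. A boto3 sketch follows; the environment name and threshold values are example placeholders, and the API call is left guarded since it needs credentials. Lowering `UpperThreshold` and `BreachDuration` makes scale-out fire earlier, buying back some of the 5-10 minute launch time.

```python
# Example values: scale out once average CPU exceeds 50% for 1 minute.
option_settings = [
    {"Namespace": "aws:autoscaling:trigger",
     "OptionName": "MeasureName", "Value": "CPUUtilization"},
    {"Namespace": "aws:autoscaling:trigger",
     "OptionName": "Unit", "Value": "Percent"},
    {"Namespace": "aws:autoscaling:trigger",
     "OptionName": "UpperThreshold", "Value": "50"},  # scale out earlier
    {"Namespace": "aws:autoscaling:trigger",
     "OptionName": "BreachDuration", "Value": "1"},   # minutes
]

def apply(environment_name):
    """Apply the trigger settings to a Beanstalk environment
    (requires AWS credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    boto3.client("elasticbeanstalk").update_environment(
        EnvironmentName=environment_name,
        OptionSettings=option_settings)

# apply("my-beanstalk-env")  # uncomment to apply for real
upper = next(o["Value"] for o in option_settings
             if o["OptionName"] == "UpperThreshold")
print(upper)  # → 50
```

The same settings can be committed as an `.ebextensions` config file so every environment rebuild keeps the tuned triggers.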