Elastic Beanstalk Auto Scaling - Which metric should I use?


I've got an application built in Node.js that is primarily used to post photos to (up to 25 MB each). The app resizes each photo to thumbnail size and moves both the thumbnail and the full-size image to S3. When the uploads begin, they usually come in bursts of 10-15 pictures; rinse, wash, repeat in 5-minute cycles. I'm seeing a lot of scaling, and the trigger is the default 6 MB NetworkOut trigger. My question is: is moving the photos to S3 considered NetworkOut? Or should I consider a different scaling trigger? So far the app hasn't stuttered, so I'm hesitant to fix what ain't broken, but I am seeing quite a bit of scaling, so I thought I would investigate. Thanks for any help!

The short answer: scale whenever a resource is constrained. For example, if your instances can't keep up with network IO, or CPU is above 80%, then scale. Yes, sending any data from your EC2 instance is NetworkOut traffic. You've got to get that data from point A to B somehow :)
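To put rough numbers on that "yes": a minimal sketch of the arithmetic, assuming every photo is the 25 MB maximum, a hypothetical thumbnail size of about 50 KB, a threshold of 6,000,000 bytes for the default trigger, and that one burst's uploads fall within the same measurement period.

```python
# Rough arithmetic only: bytes leaving the instance for one upload burst.
# Assumptions (hypothetical, adjust to your app): photos are the 25 MB maximum
# and each thumbnail is about 50 KB. Decimal units are used because the
# trigger threshold is expressed in bytes (6,000,000).
DEFAULT_NETWORK_OUT_THRESHOLD = 6_000_000  # bytes, the "6 MB" trigger


def burst_network_out_bytes(photos: int, photo_mb: float = 25, thumb_kb: float = 50) -> int:
    """Bytes sent to S3 for one burst: full-size image plus thumbnail per photo."""
    per_photo = photo_mb * 1_000_000 + thumb_kb * 1_000
    return int(photos * per_photo)


if __name__ == "__main__":
    for n in (10, 15):
        sent = burst_network_out_bytes(n)
        ratio = sent / DEFAULT_NETWORK_OUT_THRESHOLD
        print(f"{n} photos -> {sent:,} bytes out ({ratio:.0f}x the threshold)")
```

Even the smaller burst is dozens of times larger than the threshold, so frequent NetworkOut-triggered scaling is expected for this workload whether or not the instances are actually struggling.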
As you go up in EC2 instance size, you get more memory and CPU along with more network IO. If you don't see issues with transfers, you may want to switch auto scaling over to watch CPU or memory (memory is not reported to CloudWatch by default, so scaling on it requires publishing a custom metric, for example with the CloudWatch agent). In an app I'm working on, users can start jobs that require a bit of CPU, so I have auto scaling set to scale out when CPU is over 80%. But you might have a process that consumes a lot of memory and not much CPU...
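A minimal sketch of switching the Elastic Beanstalk trigger from NetworkOut to CPU. The `aws:autoscaling:trigger` namespace and its option names are documented Elastic Beanstalk settings; the 80%/30% thresholds and the environment name are example values, and applying the change requires boto3 and AWS credentials.

```python
# Build the option settings that move the scaling trigger to average CPU.
# Thresholds are example values; tune them to your workload.
def cpu_trigger_option_settings(upper_percent: int = 80, lower_percent: int = 30):
    namespace = "aws:autoscaling:trigger"
    options = {
        "MeasureName": "CPUUtilization",
        "Statistic": "Average",
        "Unit": "Percent",
        # Thresholds must change together with MeasureName: the NetworkOut
        # defaults are byte counts and make no sense as percentages.
        "UpperThreshold": str(upper_percent),
        "LowerThreshold": str(lower_percent),
    }
    return [
        {"Namespace": namespace, "OptionName": name, "Value": value}
        for name, value in options.items()
    ]


def apply_to_environment(environment_name: str) -> None:
    """Push the settings to an environment (needs boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("elasticbeanstalk")
    client.update_environment(
        EnvironmentName=environment_name,
        OptionSettings=cpu_trigger_option_settings(),
    )
```

For example, `apply_to_environment("my-photo-env")` (a hypothetical environment name) would apply the change; the same option values can also be set from the console or an `.ebextensions` file.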
On a side note, you may want to think about having your uploads go directly to your S3 bucket and using a Lambda function to trigger the resize routine. This has several advantages over your current design. http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
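A minimal sketch of the event-handling half of that design, assuming a hypothetical `thumbnails/` prefix for the output. It only parses the S3 event notification and decides what to resize; the actual download, resize (for example with Pillow) and upload are left out. Writing thumbnails to a separate bucket is the safer way to avoid the function re-triggering itself; the prefix check here is a lighter alternative.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical prefix for resized images


def objects_to_resize(event: Dict) -> List[Tuple[str, str, str]]:
    """Return (bucket, source_key, thumbnail_key) for each new upload in an S3 event."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 event notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(THUMBNAIL_PREFIX):
            # Skip our own output so writing a thumbnail cannot re-trigger work.
            continue
        jobs.append((bucket, key, THUMBNAIL_PREFIX + key))
    return jobs


def lambda_handler(event, context):
    jobs = objects_to_resize(event)
    for bucket, key, thumb_key in jobs:
        # Download, resize and upload thumb_key here; omitted in this sketch.
        print(f"would resize s3://{bucket}/{key} -> s3://{bucket}/{thumb_key}")
    return {"processed": len(jobs)}
```

With this layout the web instances only hand out upload locations, so the 25 MB transfers no longer count against their NetworkOut at all.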

I suggest getting familiar with the instance metrics. You can then recognize your app-specific bottlenecks on the current instance type and count.
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-metrics.html

Related

How do I handle instant spikes in traffic in my APIs

I am using AWS for my cloud infrastructure, with ECS Fargate as my compute. I currently maintain 10-20 APIs that interact with members who have my application downloaded on their phones. One or two of these APIs are my "main" APIs: they are the ones that are really personalised to my users and, honestly, the only ones members really access (by navigating to those screens).
My business team wants to send push notifications to alert members about certain new events, which land them on a screen where these APIs need to be called. Because of this, my application has mini crashes during this time period.
I've thought of a couple of ideas, but since this is obviously an issue across industries and a solved problem, I wanted to know the standard solutions.
The ideas I have:
Sending notifications in batches. This seems like the best solution, though it requires some effort and I'm not sure how much.
Have a serverless platform (AWS Lambda functions) serve the APIs that need to scale instantly. I keep a lot of other APIs in Fargate because I don't want my Lambda function to get too heavy and then take a while to start up.
Scale machines all the time to handle the load I get during push notifications. This seems suboptimal for cost reasons.
Scale machines up just during the periods when I want to send push notifications and then scale them back down. This seems like a decent solution if I can automate the entire process: I could follow a flow for each push notification that first scales the system and then starts sending the notifications.
Is there a better way to do this? This seems like a fairly common problem, but I don't see much information on this topic.
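The batching idea (the first option) can be small in code. A minimal sketch, assuming a hypothetical `send_batch` callable that wraps whatever push provider you use; the batch size and pause are example values to tune against how fast your Fargate services scale.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def send_in_batches(
    recipients: Sequence[str],
    send_batch: Callable[[List[str]], None],  # hypothetical push-provider wrapper
    batch_size: int = 1000,
    pause_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send notifications batch by batch, pausing so the APIs see a ramp, not a spike."""
    sent = 0
    for i, batch in enumerate(batches(recipients, batch_size)):
        if i > 0:
            sleep(pause_seconds)  # give auto scaling time to react between waves
        send_batch(batch)
        sent += len(batch)
    return sent
```

The pause turns one spike into a series of smaller waves; spreading 100,000 members over 100 batches of 1,000 with 30-second gaps takes about 50 minutes, which may or may not be acceptable for a given notification.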

I like your second option best because it's by far the easiest to manage (because you don't have to manage it). After that, I'd go with your last option. I would use Step Functions to manage this, where the first step is to scale up the number of tasks in your Fargate service. Once that has reached the desired level, you send the notifications. Add auto scaling to your Fargate services so they handle scaling back down automatically.
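As a stand-in for the Step Functions workflow, here is the same sequence in plain Python: raise the desired task count, wait until enough tasks are running, then send. `ecs` is assumed to be a boto3 ECS client (`boto3.client("ecs")`) or anything with the same `update_service` and `describe_services` methods; `notify` is a hypothetical callable that starts the push campaign, and the timeouts are example values.

```python
import time
from typing import Callable


def scale_up_then_notify(
    ecs,
    cluster: str,
    service: str,
    desired_count: int,
    notify: Callable[[], None],
    poll_seconds: float = 15.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Scale the Fargate service up, wait for running tasks, then send notifications."""
    ecs.update_service(cluster=cluster, service=service, desiredCount=desired_count)
    waited = 0.0
    while True:
        svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
        if svc["runningCount"] >= desired_count:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"{service} did not reach {desired_count} running tasks")
        sleep(poll_seconds)
        waited += poll_seconds
    notify()  # only start the campaign once capacity is actually there
```

One caveat: if the service has Application Auto Scaling attached, a quiet period before the notifications go out may let it scale the manual increase back in, so temporarily raising the scalable target's minimum capacity is one way to hold the new level until the traffic arrives.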

GCP Autoscale Down Protection

I have a set of long-running tasks that have to run on Compute Engine and have to scale. Each task takes approximately 3 hours. To handle this, I thought about using the architecture described here:
https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks
While it works fine, there is one huge problem: on scale down, I'd really like to avoid removing a VM that is in the middle of a task! I'd potentially lose 3 hours' worth of processing.
Is there a way to ensure that autoscaling doesn't scale down a VM that is running a long task?
EDIT: A few people have asked me to elaborate on my task. It's similar to what's described in the link above: many long-running tasks that need to run on a GPU. There is a chunk of data that needs to be processed (video encoding), which takes about 4 hours; once completed, the output goes to a bucket. Well, it can take anywhere from 1 to 6 hours depending on the length of the video. Just like the architecture above, it would be nice to have the cluster scale up based on queue size. But when scaling down, I'd like to ensure that it doesn't remove VMs with currently running tasks, which is what is happening now. Because the work is GPU-bound, I can't use the CPU metric.

I think you should add more details about what kind of task you are running. However, as @Jhon Hanley suggested, it's worth taking a look at Cloud Tasks, and also at the following documentation that talks about the scaling risks.
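The requirement in the question can be stated as a small decision rule: size the pool from the queue, but only ever remove idle workers. This is a sketch of hypothetical custom scaler logic, not a Compute Engine feature; how each worker reports whether it is busy (for example via instance metadata or a heartbeat) is assumed, and the chosen VMs would then be deleted by your own code, for example through the managed instance group's delete-instances operation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    busy: bool  # True while the VM is encoding a video (reported by the worker itself)


def desired_workers(queue_depth: int, tasks_per_worker: int = 1, max_workers: int = 10) -> int:
    """Queue-based target size: enough workers for the backlog, capped at max_workers."""
    needed = -(-queue_depth // tasks_per_worker)  # ceiling division
    return min(max_workers, needed)


def workers_to_remove(workers: List[Worker], desired: int) -> List[str]:
    """Pick surplus workers to delete, never touching one that is mid-task."""
    surplus = len(workers) - desired
    if surplus <= 0:
        return []
    idle = [w.name for w in workers if not w.busy]
    # If too few workers are idle, the pool temporarily stays above target
    # and shrinks on a later pass, once the busy ones finish.
    return idle[:surplus]
```

The design choice is to accept running above the target size for a while rather than risk killing hours of GPU work.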

Estimate AWS cost

The company I work for is planning to use AWS to host a new website for a client. Their old website had roughly 75,000 sessions and 250,000 page views per year. We haven't used AWS before, and I need to give a rough cost estimate to my project manager.
This new website is going to be mostly content-driven, with a CMS backend (probably WordPress) plus a cost calculator for their services. Can anyone give me a rough idea of the cost to host this kind of website on AWS?
I used the Simple Monthly Calculator with a single Linux t2.small on a 3-year upfront reservation, which gave me around $470.
(forgive my English)

The only way to know the cost is to know the actual services you will consume (Amazon EC2, Amazon EBS, database, etc). It is not possible to give an accurate "guess" of these requirements because it really does depend upon the application and usage patterns.
It is normally recommended that you implement the system and run it for a while before committing to Reserved Instances so that you have a chance to measure performance and test a few different instance types.
Be careful using T2 instances for production workloads. They are burstable instances that perform well while they have CPU credits, but if the CPU credits run out, the amount of CPU is limited to a baseline level.
Bottom line: Implement, measure, test. Then you'll know what is right for your needs.
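The T2 warning is easy to quantify. One CPU credit is one vCPU at 100% for one minute, so a sustained load drains credits at (utilization × vCPUs) minus the earn rate. A minimal sketch; the defaults assume a t2.small (1 vCPU, 12 credits earned per hour, up to 288 banked) as listed in the AWS burstable-instance documentation at the time of writing, so check the current figures for your instance type.

```python
from typing import Optional


def minutes_until_throttled(
    balance: float,
    utilization: float,           # sustained CPU use, 0.0-1.0 per vCPU
    vcpus: int = 1,
    credits_per_hour: float = 12.0,  # assumed t2.small earn rate
) -> Optional[float]:
    """Minutes until the credit balance hits zero, or None if it never does."""
    spend_per_minute = utilization * vcpus
    earn_per_minute = credits_per_hour / 60.0
    drain = spend_per_minute - earn_per_minute
    if drain <= 0:
        return None  # at or below baseline: the balance never runs out
    return balance / drain
```

For example, a t2.small with a full balance of 288 credits pinned at 100% CPU drains 0.8 credits per minute and lasts about 360 minutes (6 hours) before dropping to its baseline, while anything at or below roughly 20% can run indefinitely. That is why measuring real usage matters before committing to a reservation.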

Take Note
When you are new to AWS, you get a 1-year free tier that covers a single t2.micro instance.

Just pulled this out; looking at your requirements, you may not need all of it.
One load balancer and one app server should be fine (just use Route 53 to serve some static pages from S3 while upgrading or scaling).
Email subscriptions and some document processing can be handled with AWS Lambda, SNS and SQS, which may further reduce the cost (you may be able to reduce the server size and do all the heavy lifting in Lambda).
A simple web page with 3,000 requests per month can be handled by a t2.micro, which is almost free for one year, as mentioned in the note above.
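The question's own traffic figures can be turned into request rates with a few lines of arithmetic. A minimal sketch; the peak factor of 20 is a hypothetical allowance for busy hours, and note that one page view usually triggers several HTTP requests (CSS, scripts, images), which a CDN or S3 can absorb.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate(per_year: float) -> dict:
    """Average monthly volume and per-second rate for a yearly total."""
    return {
        "per_month": per_year / 12,
        "per_second": per_year / SECONDS_PER_YEAR,
    }


def peak_rate(per_year: float, peak_factor: float = 20.0) -> float:
    """Per-second rate during busy periods, assuming a hypothetical peak multiplier."""
    return average_rate(per_year)["per_second"] * peak_factor


if __name__ == "__main__":
    rates = average_rate(250_000)
    print(f"{rates['per_month']:,.0f} page views/month, "
          f"{rates['per_second']:.4f}/s average, {peak_rate(250_000):.2f}/s at 20x peak")
```

250,000 page views a year is about 20,800 a month, or well under one page view per second even with a generous peak multiplier, so the sizing question is more about WordPress's memory and CPU needs than raw request volume.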

You don't have a lot of details in your question. AWS has a wide variety of services that you could be using in that scenario. To accurately estimate costs, you should gather these details:
What will the AWS storage be used for? A database, applications, file storage?
How big will the objects be? Each type of storage has different limits on individual file size, so estimate your largest object size.
How long will you store these objects? This will help you determine static, persistent or container storage.
What is the total size of the storage you need? Again, different products have different limits.
How often do you need to do backup snapshots? Where will you store them?
Every cloud vendor has a detailed calculator to help you determine costs. However, to use them effectively you need to have all of these questions answered and you need to understand what each product is used for. If you would like to get a quick estimate of costs, you can use this calculator by NetApp.

Can I improve performance of my GCE small instance?

I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft I'll use that as an example. On the AWS EC2 instances, succeeding world chunks would load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance; and they must disconnect from and re-login to the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.

Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try starting on a larger instance and backing down to the smaller one. Since you only pay for a 10-minute minimum, you could check whether performance is better on larger machine types. If you have consistent performance problems no matter the size of the box, then I'm guessing it has something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but does that depend on how long it's been running? The nature of the workload? Time of day? If performance is sometimes good and sometimes bad, then it's probably, once again, related to the type of your workload and their virtualization strategy.
Something Amazon is famous for is consistency. They work very hard to manage the consistency of performance; it shouldn't spike up or down.
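For the second strategy, a minimal sketch of measuring consistency rather than just speed: time the same unit of work repeatedly and look at the spread, not only the mean. `cpu_workload` is a hypothetical synthetic stand-in for the game server's work; run the script periodically (for example from cron) on both the EC2 and GCE instances and compare the reports over time.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(workload: Callable[[], object], runs: int = 20,
              clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Wall-clock duration of each run of the same workload, in seconds."""
    samples = []
    for _ in range(runs):
        start = clock()
        workload()
        samples.append(clock() - start)
    return samples


def consistency_report(samples: List[float]) -> Dict[str, float]:
    """Mean, standard deviation, coefficient of variation and worst case."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean if mean else 0.0,  # spread relative to the mean
        "worst": max(samples),
    }


def cpu_workload(n: int = 200_000) -> int:
    """Hypothetical synthetic CPU work standing in for the real server load."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(consistency_report(time_runs(cpu_workload)))
```

A low mean with a high coefficient of variation or a bad worst case points at noisy, shared-core behavior rather than a machine that is simply too slow.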

My best guess here, without all the details, is that you are using a very small disk. GCE throttles disk performance based on disk size. You have two options: attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
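To see why disk size matters so much, a minimal sketch of how persistent disk IOPS scale with capacity. The per-GB rates below are assumptions based on figures GCE has published (standard PD around 0.75 read and 1.5 write IOPS per GB, PD-SSD around 30 per GB); they change over time and per-instance caps also apply, so check the disk performance page linked above before relying on them.

```python
# Assumed per-GB rates; verify against current GCE documentation.
PD_RATES = {
    "pd-standard": {"read_iops_per_gb": 0.75, "write_iops_per_gb": 1.5},
    "pd-ssd": {"read_iops_per_gb": 30.0, "write_iops_per_gb": 30.0},
}


def disk_iops(disk_type: str, size_gb: float) -> tuple:
    """(read IOPS, write IOPS) implied by disk size alone, ignoring per-VM caps."""
    rates = PD_RATES[disk_type]
    return (rates["read_iops_per_gb"] * size_gb, rates["write_iops_per_gb"] * size_gb)


if __name__ == "__main__":
    for disk_type, size in [("pd-standard", 10), ("pd-standard", 200), ("pd-ssd", 10)]:
        read, write = disk_iops(disk_type, size)
        print(f"{size} GB {disk_type}: ~{read:g} read / ~{write:g} write IOPS")
```

Under these assumptions a 10 GB standard disk gets only single-digit read IOPS, which is easily exhausted by a game server loading world chunks, while the same size on PD-SSD gets hundreds.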
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team

Data Intensive process in EC2 - any tips?

We are trying to run an ETL process on a High I/O instance on Amazon EC2. The same process run locally on a very well-equipped laptop (with an SSD) takes about one sixth of the time. The process basically transforms data (30 million rows or so) from flat tables into a third-normal-form schema in the same Oracle instance.
Any ideas on what might be slowing us down?

Another option is to simply move off AWS and rent beefy boxes (raw hardware) with SSDs from a provider like Rackspace.
We have moved most of our ETL processes off AWS/EMR. We host most of it on Rackspace and get a lot more CPU, storage and performance for the money. Don't get me wrong, AWS is awesome, but there comes a point where it's not cost-effective. On top of that, you never really know how they are managing/virtualizing the hardware, or how that affects your specific application.
My two cents.
