What does the deploy command do in AWS SageMaker?

I am curious to know what the model.deploy command actually does in the background when run in an AWS SageMaker notebook.
For example:
predictor = sagemaker_model.deploy(initial_instance_count=9, instance_type='ml.c5.xlarge')
Also, what is happening in the background during SageMaker endpoint autoscaling? It is taking too long, almost 10 minutes, to launch a new instance, during which most requests get dropped or go unprocessed, and I also get connection timeouts while load testing through JMeter. Is there any way to get a faster boot-up, or something like a golden AMI, in SageMaker?
Are there any other ways this issue can be solved?

The docs mention what the deploy method does: https://sagemaker.readthedocs.io/en/stable/model.html#sagemaker.model.Model.deploy
You could also take a look at the source code here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L377
Essentially the deploy method hosts your model on a SageMaker Endpoint, launching the number of instances using the instance type that you specify. You can then invoke your model using a predictor: https://sagemaker.readthedocs.io/en/stable/predictors.html
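For example, a minimal sketch of deploying and then invoking an endpoint (the payload variable and the teardown step are assumptions; exact serialization behavior depends on your model and SDK version):

# Deploy the model to a real-time endpoint backed by the requested instances.
predictor = sagemaker_model.deploy(
    initial_instance_count=9,
    instance_type='ml.c5.xlarge',
)

# predict() sends an InvokeEndpoint request to the SageMaker runtime
# and returns the model's response for the given payload.
result = predictor.predict(payload)

# Delete the endpoint when finished to stop incurring instance charges.
predictor.delete_endpoint()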
For autoscaling, you may want to consider lowering your threshold for scaling out so that the additional instances start to be launched earlier. This page offers some good advice on how to determine the RPS your endpoint can handle. Specifically, you may want to have a lower SAFETY_FACTOR to ensure new instances are provisioned in time to handle your expected traffic.
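For reference, a sketch of wiring up autoscaling for an endpoint variant through Application Auto Scaling (the endpoint name, variant name, capacities, and target value here are placeholders, not recommendations):

import boto3

autoscaling = boto3.client('application-autoscaling')
resource_id = 'endpoint/my-endpoint/variant/AllTraffic'  # placeholder names

# Allow the variant to scale between 9 and 20 instances.
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=9,
    MaxCapacity=20,
)

# Target-tracking on invocations per instance; a lower TargetValue
# (i.e. a lower SAFETY_FACTOR) triggers scale-out earlier.
autoscaling.put_scaling_policy(
    PolicyName='invocations-target-tracking',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
    },
)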

Related

AWS ECS app fast auto scaling for video encoding. What is the best way?

I am currently running a video encoding application on ECS but auto scaling is my biggest problem.
Users start live video encoding jobs from a front end. Once a job is placed, this is added as a redis queue (rq) job that runs on an ECS task placed on a c5d.large instance using ffmpeg.
Autoscaling is currently based on alarms. If CPU usage is above a set percentage, a new instance and task are spawned. If CPU usage is low, instances are checked and, if no jobs are running, they are destroyed.
This is not a bad solution but it feels clunky and slow. If a user wants to start two jobs one right after the other, it takes a couple of minutes for the instance to spawn + task to be placed (even using warm groups).
Plus, CloudWatch alarms take a while to refresh and are not a very reliable way of measuring the work being done (a video encoding at 720p will use less CPU than one at 1080p, which throws off all my alarm settings).
Is there a better solution someone can guide me to that allows for fast and precise autoscaling, other than relying on CloudWatch alarms? I am tempted to build my own autoscaling system based on the currently executing jobs/workers and spawn/destroy instances by calling the API directly from my code, but I'm hoping to find a better solution from within AWS itself.
Thanks
I have this exact problem too. AWS already has MediaConvert/Elastic Transcoder, but they're just too expensive, so I decided to build my own. I first tried Lambda with SST.dev (serverless), where each job is a single function invocation, but I ran into the 15-minute function timeout, mostly because I'm not just copying codecs (i.e. I'm re-encoding).
For scaling at this point, I would look at Kubernetes. This is the sort of problem Kubernetes is intended to handle (dynamic resource scaling on demand). Kubernetes is rather non-trivial, but K8s is what the industry has largely settled on, so there are probably a lot of reasons to just go that route. You could start with K3s (which I only just learned about today) and move up to K8s when you are ready.
Since you're trying to find a solution directly within AWS, you could try EKS, but I'm not completely sure what the best option would be.
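One lighter-weight idea within AWS, since CPU is a poor proxy for encoding work: publish the rq queue depth as a custom CloudWatch metric and scale on that instead of CPU alarms. A minimal sketch, with a hypothetical namespace and queue name:

import boto3
from redis import Redis
from rq import Queue

cloudwatch = boto3.client('cloudwatch')
encode_queue = Queue('encode', connection=Redis())  # hypothetical queue name

def publish_queue_depth():
    # Queue length reflects pending work directly, unlike CPU, which
    # varies with the resolution of the video being encoded.
    cloudwatch.put_metric_data(
        Namespace='VideoEncoding',  # hypothetical namespace
        MetricData=[{
            'MetricName': 'PendingEncodeJobs',
            'Value': len(encode_queue),
            'Unit': 'Count',
        }],
    )

Run this on a short schedule and point your scaling policy at the metric; it tracks the actual backlog rather than how hard one encode happens to push the CPU.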

AWS Container (ECS) vs AMI & Spot instances

The core of my question is whether or not there are downsides to using an Amazon Machine Image + Micro Spot instances to run a task, vs using the Elastic Container Service (ECS).
Here's my situation: I have the need to run a task on demand that is triggered by a remote web hook.
There is the possibility this task gets triggered 10 times in a row, or goes weeks without ever executing, so I definitely want a service that only runs (and bills) on demand.
My plan is to point the webhook to a Lambda function, but then the question is what to have the Lambda function do.
Though it doesn't take very long, this task requires several different runtimes (PowerShell Core, Python, PHP, Git) to get its job done, so Lambda isn't really a possibility, as I'd hit the deployment package size limit. But I can use Lambda to kick off the job.
What I started doing was creating an AMI that has all the necessary runtimes and code, then using a spot request to launch an instance that executes the operation via a startup script passed in through user data and shuts itself down when it's done. I'd have to put in some rate-control logic to prevent two from running at once, but that's a solvable problem.
I hesitated halfway through developing this solution when I realized I could probably do this with a Docker container on ECS using Fargate.
I just don't know if there is any benefit to putting in the additional development time to switch to a Docker container when I am not a Docker pro and already have the AMI configured. Plus, ECS/Fargate is actually more expensive than just running a micro instance.
Are there any concerns about spinning up short-lived (<5 min) spot requests (t3a.micro) when there could be a dozen fired off in a single day? Are there rate limits on this? Will I get an angry email from AWS telling me to knock it off? Are there other reasons ECS is the only right answer? Something else entirely?
Your solution using a spot instance and an AMI is a valid one, though I've experienced slow spot-instance acquisition times in the past. You also incur the AMI startup time.
As mentioned in the comments, you will incur a minimum of one hour's charge for the instance, so you should leave the instance up for the full hour before terminating it, in case more requests come in during that hour.
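If you do stick with the spot + AMI route, a minimal sketch of the launch step from Lambda, assuming a one-time spot request and an instance that shuts itself down when finished (the AMI ID and task script are hypothetical):

import boto3

ec2 = boto3.client('ec2')

# Startup script passed via user data; the instance powers off when done.
USER_DATA = """#!/bin/bash
/opt/task/run.sh   # hypothetical task script baked into the AMI
shutdown -h now
"""

def lambda_handler(event, context):
    ec2.run_instances(
        ImageId='ami-0123456789abcdef0',  # hypothetical AMI ID
        InstanceType='t3a.micro',
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,  # boto3 base64-encodes this for run_instances
        InstanceMarketOptions={
            'MarketType': 'spot',
            'SpotOptions': {'SpotInstanceType': 'one-time'},
        },
        # 'terminate' makes the shutdown above destroy the spot instance
        # rather than leaving it stopped.
        InstanceInitiatedShutdownBehavior='terminate',
    )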
IMHO you should build it all with Lambda. By splitting the workload for each runtime into its own Lambda function, you can make it work.
AWS supports Python and PowerShell runtimes, and you can create a custom PHP one. Chain them together with your glue of choice (SNS, SQS, direct invocation, or Step Functions) and you have the most cost-effective solution. You also get the benefit of better, independent maintenance for each function/runtime.
Put the initial Lambda behind API Gateway and you will get rate-limiting capability too.
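For instance, the hand-off between runtimes by direct asynchronous invocation could look roughly like this (the next function's name is hypothetical):

import json
import boto3

lambda_client = boto3.client('lambda')

def handler(event, context):
    # ...do this runtime's share of the job here...

    # Hand off to the next runtime's function without waiting for it.
    lambda_client.invoke(
        FunctionName='task-powershell-step',  # hypothetical next step
        InvocationType='Event',  # asynchronous invocation
        Payload=json.dumps({'job_id': event['job_id']}),
    )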

What's the best method for creating a scheduler for running EC2 instances?

I want to create a web app for my organization where users can schedule in advance at what times they'd like their EC2 instances to start and stop (like creating events in a calendar), and those instances will be automatically started or stopped at those times. I've come across four different options:
AWS Data Pipeline
Cron running on EC2 instance
Scheduled scaling of Auto Scaling Group
AWS Lambda scheduled events
It seems to me that I'll need a database to store the user's scheduled times for autostarting and autostopping an instance, and that I'll have to pull that data from the database regularly (to make sure that's the latest updated schedule). Which would be the best of the four above options for my use case?
Edit: Auto Scaling only seems to be for launching and terminating instances, so I can rule that out.
Simple!
Ask users to add a tag to their instance(s) indicating when they should start and stop (figure out some format so they can easily specify Mon-Fri or Every Day)
Create an AWS Lambda function that scans instances for their tags and starts/stops them based upon the tag content
Create an Amazon CloudWatch Events rule that triggers the Lambda function every 15 minutes (or at whatever resolution you want)
You can probably find some sample code if you search for AWS Stopinator.
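A minimal sketch of such a Lambda, assuming a hypothetical Schedule tag holding zero-padded UTC times like "start=08:00;stop=18:00":

import boto3
from datetime import datetime, timezone

ec2 = boto3.client('ec2')
TAG_KEY = 'Schedule'  # hypothetical tag name

def lambda_handler(event, context):
    now = datetime.now(timezone.utc).strftime('%H:%M')
    reservations = ec2.describe_instances(
        Filters=[{'Name': 'tag-key', 'Values': [TAG_KEY]}]
    )['Reservations']
    for reservation in reservations:
        for instance in reservation['Instances']:
            tags = {t['Key']: t['Value'] for t in instance.get('Tags', [])}
            rules = dict(p.split('=') for p in tags[TAG_KEY].split(';'))
            # Zero-padded HH:MM strings compare correctly as strings.
            should_run = rules['start'] <= now < rules['stop']
            state = instance['State']['Name']
            if should_run and state == 'stopped':
                ec2.start_instances(InstanceIds=[instance['InstanceId']])
            elif not should_run and state == 'running':
                ec2.stop_instances(InstanceIds=[instance['InstanceId']])

Because it compares the current time against the whole start/stop window rather than an exact minute, the check is idempotent and works at whatever trigger resolution you choose.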
Take a look at ParkMyCloud if you're looking for an external SaaS app that can help your users easily schedule (or override that schedule) your EC2, RDS, and ASG instances. It also connects to SSO, provides an API, and shows you all of your resources across regions/accounts/clouds. There's a free trial available if you want to test it out.
Disclosure: I work for ParkMyCloud.

Is it possible to auto-scale with Amazon Web Services, with ever-changing AMIs?

Curious if this is possible:
We have a web application that, at MOST times, works just fine on our single small instance. However, when multiple customers run intense queries simultaneously (we are a cloud scheduling service), our instance bogs down to near 80% CPU load and becomes pretty unresponsive.
Is there a way to have AWS fire up another small instance (or a few), quickly, only for the times it's operating under this intense load? But the real question is: how does this work when we push very frequent programming updates to our application? Do we have to manually create a new image every time we upload a code change?
Thanks
You should never be running anything important on a single EC2 instance. Instances can--and do--go offline randomly. Always use an autoscaling (AS) group that spans multiple availability zones. An AS group will automatically bring new instances online when you hit a certain trigger (in your case, CPU utilization). And then it will scale down the instances when traffic subsides. Autoscaling is the heart and soul of AWS and if you're not using it, you might as well be using a cheaper (and more durable) VPS host.
No, you don't want to create a new AMI for each code release. Ideally you should use a base AMI (like one of Amazon's official ones) and have each instance auto-provision itself at boot. You can use the "user data" field when you launch an instance to bootstrap this process. It can be as simple as a bash script that pulls from your Git repo, or as sophisticated as Puppet or Chef.
The only time I create custom AMIs is when the provisioning process just takes too long. However, that can almost always be solved by storing the needed files in S3.
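To make the bootstrap idea concrete, here is a sketch of a launch template whose user data pulls the latest code at boot, so the AMI never needs rebuilding for a release (the repo path, service name, template name, and AMI ID are all hypothetical):

import base64
import boto3

ec2 = boto3.client('ec2')

# Every instance launched from this template provisions itself at boot.
BOOTSTRAP = """#!/bin/bash
cd /opt/app && git pull origin main
systemctl restart app
"""

ec2.create_launch_template(
    LaunchTemplateName='web-app',  # hypothetical name
    LaunchTemplateData={
        'ImageId': 'ami-0123456789abcdef0',  # a stock base AMI
        'InstanceType': 't3.small',
        # Launch templates require user data to be base64-encoded.
        'UserData': base64.b64encode(BOOTSTRAP.encode()).decode(),
    },
)

Point your Auto Scaling group at this template and every scale-out event picks up the current code without touching the AMI.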

Increasing compute power temporarily on AWS

I have an Amazon EC2 Micro instance running using EBS storage. This more than meets my needs 99.9% of the time, however I need to perform a very intensive database operation as a once off which kills the Micro instance.
Is there a simple way to restart the exact same instance but with a lot more power for a temporary period, and then revert back to the Micro instance when I'm done? This seemed more than possible under Amazon's cloud-based model, but it doesn't appear to simply be a matter of shutting down and restarting with more power, as I first thought it might be.
If you are running the database operation manually, you can create an image of the server, launch a small or high-CPU instance from that image, run the database operation, then create another image and launch it as a Micro instance again. You can also automate this process by writing scripts that use the AWS APIs.
In case you're using an EBS-backed AMI you don't have to create a new image and launch it. Just stop the machine and issue a simple EC2 API command to change the instance type:
ec2-modify-instance-attribute --instance-type <instance_type> <instance_id>
Keep in mind that not all instance types work for every AMI. The applicable instance types depend on the machine itself and the kernel. You can find a list of available instance types here: http://aws.amazon.com/ec2/instance-types/
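The command above is from the legacy EC2 API tools; the same stop/modify/start flow in boto3 might look like this (the instance ID and target type are placeholders):

import boto3

ec2 = boto3.client('ec2')
instance_id = 'i-0123456789abcdef0'  # placeholder

# The instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])

# Switch to a larger type for the heavy operation.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={'Value': 'c5.xlarge'},
)
ec2.start_instances(InstanceIds=[instance_id])

# When the operation is done, repeat the stop/modify/start cycle with
# the original micro instance type to revert.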