The core of my question is whether or not there are downsides to using an Amazon Machine Image + Micro Spot instances to run a task, vs using the Elastic Container Service (ECS).
Here's my situation: I have the need to run a task on demand that is triggered by a remote web hook.
There is the possibility this task can get triggered 10 times in a row, or go weeks w/o ever executing, so I definitely want a service that only runs (and bills) on demand.
My plan is to point the webhook to a Lambda function, but then the question is what to have the Lambda function do.
Though it doesn't take very long, this task requires several different runtimes (PowerShell Core, Python, PHP, Git) to get its job done, so Lambda isn't really a possibility, as I'd hit the deployment package size limit. But I can use Lambda to kick off the job.
What I started doing was creating an AMI that has all the necessary runtimes and code, then using a Spot request to launch an instance, have it execute the operation via a startup script passed in through user data, and then shut itself down when it's done. I'd have to put in some rate-control logic to prevent two from running at once, but that's a solvable problem.
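The Lambda kick-off is roughly this (a trimmed-down sketch, not my actual code; the AMI ID and job script path are placeholders):

import boto3

ec2 = boto3.client("ec2")

# The AMI already has the runtimes and code baked in; user data just runs the
# job and powers the box off when it finishes.
USER_DATA = """#!/bin/bash
/opt/job/run.sh      # placeholder for the actual job entry point
shutdown -h now      # shutting down ends the job; one-time spot instances can't be stopped, only terminated
"""

def handler(event, context):
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t3a.micro",
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
        UserData=USER_DATA,                # boto3 base64-encodes this for run_instances
    )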
I hesitated halfway through developing this solution when I realized I could probably do this with a Docker container on ECS using Fargate.
I just don't know if there is any benefit to putting in the additional development time to switch to a Docker container, when I'm not a Docker pro and already have the AMI configured. Plus, ECS/Fargate is actually more expensive than just running a micro instance.
Are there any concerns about spinning up short-lived (<5 min) spot requests (t3a.micro) where there could be a dozen fired off in a single day? Are there rate limits on this? Will I get an angry email from AWS telling me to knock it off? Are there other reasons ECS is the only right answer? Something else entirely?
Your solution using a spot instance and an AMI is a valid one, though I've experienced slow spot-instance fulfillment times in the past. You also incur the instance startup time.
As mentioned in the comments, you may also incur a minimum charge per launch depending on the AMI (Windows instances are billed by the hour, while Linux instances are billed per second with a one-minute minimum), so it can be worth leaving the instance up briefly in case more requests come in shortly afterwards.
IMHO you should build it all with Lambda. By splitting the workload for each runtime into its own Lambda function, you can make it work.
Lambda supports Python and PowerShell runtimes natively, and you can create a custom PHP one. Chain them together with your glue of choice (SNS, SQS, direct invocation, or Step Functions) and you have the most cost-effective solution. You also get the benefit of better, independent maintenance for each function/runtime.
Put the initial Lambda behind API Gateway and you get rate-limiting capability too.
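For example, the direct-invocation flavour of that glue could be a tiny coordinator function like this (a sketch; the three function names are made up):

import json

import boto3

lam = boto3.client("lambda")

def handler(event, context):
    payload = event
    # Hypothetical per-runtime functions: job-python, job-powershell, job-php
    for fn in ("job-python", "job-powershell", "job-php"):
        resp = lam.invoke(
            FunctionName=fn,
            InvocationType="RequestResponse",   # wait for each step before starting the next
            Payload=json.dumps(payload).encode(),
        )
        payload = json.loads(resp["Payload"].read())  # feed each step's output into the next
    return payload

Step Functions gives you the same chaining with retries and state tracking built in, if you'd rather not write the glue yourself.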
Related
I am currently running a video encoding application on ECS but auto scaling is my biggest problem.
Users start live video encoding jobs from a front end. Once a job is placed, it is added as a Redis Queue (rq) job that runs as an ECS task, placed on a c5d.large instance, using ffmpeg.
Autoscaling is currently based on CloudWatch alarms. If CPU is above a set percentage, a new instance and task are spawned. If CPU is low, instances are checked and, if no jobs are running, they are destroyed.
This is not a bad solution, but it feels clunky and slow. If a user wants to start two jobs one right after the other, it takes a couple of minutes for the instance to spawn and the task to be placed (even using warm groups).
Plus, CloudWatch alarms take a while to refresh and are not a very reliable proxy for the work actually being done (a video encoding at 720p will use less CPU than one at 1080p and thus throw off my alarm settings).
Is there a better solution someone can guide me to that allows for fast and precise autoscaling, other than relying on CloudWatch alarms? I am tempted to build my own autoscaling system based on the currently executing jobs/workers and to spawn/destroy instances by calling the API directly from my code (a rough sketch is in the edit below), but I'm hoping to find a better solution from within AWS.
Thanks
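EDIT: for clarity, the DIY idea I'm tempted by would be roughly this (a sketch; the cluster, service, and Redis host names are placeholders):

import boto3
from redis import Redis
from rq import Queue

ecs = boto3.client("ecs")
queue = Queue("encoding", connection=Redis(host="redis.internal"))  # placeholder Redis host

def reconcile(max_workers: int = 10) -> None:
    # Drive the ECS service's desired count from the rq queue depth instead of CPU alarms,
    # and let a capacity provider (managed scaling) add/remove c5d.large instances to match.
    pending = queue.count
    running = ecs.describe_services(
        cluster="encoding-cluster", services=["encoder"]
    )["services"][0]["runningCount"]
    desired = min(max_workers, pending + running)   # rough heuristic: keep in-flight workers alive
    ecs.update_service(cluster="encoding-cluster", service="encoder", desiredCount=desired)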
I have this exact problem too. AWS already has MediaConvert/Elastic Transcoder, but it's just too expensive, so I decided to build my own, initially on Lambda with SST.dev (serverless), where each job is a single function invocation. But I ran into the 15-minute function timeout, mostly because I'm not copying codecs.
For scaling at this point, I would look at Kubernetes. This is the sort of problem Kubernetes is intended to handle (dynamic resource scaling on demand). Kubernetes is rather non-trivial, but K8s is what the industry has mostly settled on, so there are probably a lot of reasons to just go that route. You could start with K3s (which I only learned about today) and move up to full K8s when you're ready.
Since you're trying to find a solution directly within AWS, you could try EKS, but I'm not completely sure what the best option would be.
I am new to AWS and recently set up a free t3.micro instance. My goal is stable hosting of an Angular application with two Spring Boot services. I got everything working, but after a while the Spring Boot services are no longer reachable. When I redeploy a service, it runs again. The Spring Boot services are packaged as JARs, and after deployment each one is started as a Java process.
I thought AWS guaranteed permanent availability out of the box. Do I need more setup, such as autoscaling, to achieve the desired uptime for the services, or is the t3.micro instance not sufficiently powerful, so that I need to upgrade to a larger instance to avoid the problem?
It depends :)
I think you did the right thing by starting with a small instance type and avoiding over-provisioning in the first place. T3 instance types are generally suited to 'burst' usage scenarios, i.e. your application sporadically needs a compute spike but not a sustained one. T3 instances work on a credit-based system: the instance 'earns' credits while it is idle, and that buffer is available in times of need (but only until it is consumed entirely). Then you need to wait a while for it to earn the credits back.
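A quick way to check whether credit exhaustion is the culprit is to pull the CPUCreditBalance metric from CloudWatch (a sketch; the instance ID is a placeholder):

from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch")

resp = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",          # published for T2/T3 instances
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=6),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])  # a balance stuck near zero means the CPU is being throttled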
For your current problem, a first step can be to get an idea of the current usage via the 'Monitoring' tab on the EC2 instance details page. This will help you understand whether the needs are more compute-related or I/O-related, and then you can choose an appropriate instance type from:
https://aws.amazon.com/ec2/instance-types
A next step could be to profile your application and understand its memory and compute utilisation better. AWS does guarantee the availability/durability of the resources, but how you consume those resources is an application concern, which AWS does not guarantee or control.
As for your ideas around autoscaling and availability, it again depends on what your needs are in terms of cost, tolerance for outages in AWS data centres, etc. For a reliable production setup you could consider them, but they are not the most important thing to start with.
I am curious to know what the model.deploy command actually does in the background when run in an AWS SageMaker notebook,
for example:
predictor = sagemaker_model.deploy(initial_instance_count=9,instance_type='ml.c5.xlarge')
Also, what is happening in the background during SageMaker endpoint autoscaling? It takes too long, almost 10 minutes, to launch new instances, by which time most of the requests get dropped or go unprocessed, and I also get connection timeouts while load testing through JMeter. Is there any way to get a faster boot-up, or a golden-AMI kind of thing, in SageMaker?
Are there any other means by which this issue can be solved?
The docs mention what the deploy method does: https://sagemaker.readthedocs.io/en/stable/model.html#sagemaker.model.Model.deploy
You could also take a look at the source code here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L377
Essentially the deploy method hosts your model on a SageMaker Endpoint, launching the number of instances using the instance type that you specify. You can then invoke your model using a predictor: https://sagemaker.readthedocs.io/en/stable/predictors.html
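For example, once deploy() has returned, invoking the endpoint through the predictor from the question's snippet is just (a sketch; the payload shape depends on your model and its serializer):

payload = [[0.5, 1.2, 3.4]]           # hypothetical feature vector
result = predictor.predict(payload)   # issues an InvokeEndpoint call to the hosted model
print(result)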
For autoscaling, you may want to consider lowering your threshold for scaling out so that the additional instances start to be launched earlier. This page offers some good advice on how to determine the RPS your endpoint can handle. Specifically, you may want to have a lower SAFETY_FACTOR to ensure new instances are provisioned in time to handle your expected traffic.
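Registering the endpoint for target-tracking autoscaling with a lower target looks roughly like this (a sketch; the endpoint/variant name and the numbers are placeholders):

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"   # placeholder endpoint name

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=9,
    MaxCapacity=20,
)

aas.put_scaling_policy(
    PolicyName="invocations-target",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # roughly (max RPS per instance * 60) * SAFETY_FACTOR; a lower factor scales out earlier
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)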
I am able to successfully deploy a Django Celery worker as a Docker container in an AWS ECS service using Fargate as the compute.
But my concern is that the Celery container is running 24/7. If I could run the container only when a task is assigned, I could save a lot of money given Fargate's billing model.
Celery isn't really the right thing to use because it's designed to persist, but the goal should be reasonably easy to achieve.
Architecturally, you probably want to run a script on a Fargate task. The script chews through the queue and then dies. You'd trigger that task somehow:
An API call from your data receiver (e.g. Django)
A lambda function (triggered by what?)
Still some open questions... do you limit yourself to one task at a time, or do you need to manage concurrent requests to the queue? Do you retry? But it's a plausible place to start.
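A minimal version of that trigger, from a Lambda function, is something like this (a sketch; cluster, task definition, subnet, and security group IDs are placeholders):

import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    # Start a one-off Fargate task that drains the queue and then exits.
    ecs.run_task(
        cluster="jobs-cluster",
        launchType="FARGATE",
        taskDefinition="queue-worker",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
    )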
A not-recommended but perhaps easier way to do it would be to run a celery worker in your Django container (e.g. using supervisor) and use Fargate's autoscaling features. You'd always have the one Django container running to receive data. If the celery worker on that container used up all of the available resources, Fargate would scale the service by adding tasks. Once the jobs were done, it'd remove the excess containers. You'd be paying the "overhead" for Django in each container, but it could cost you less than an always-on celery container and would certainly be simpler -- leverage your celery experience and avoid the extra layer of event handling.
EDIT: Another disadvantage of this version is that you need to run Redis somewhere and I've found the minimum cost for this to be relatively high.
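If you went that route, wiring up the service autoscaling is roughly this (a sketch; cluster/service names and thresholds are placeholders):

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "service/my-cluster/django-celery"   # placeholder cluster/service

aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=5,
)

aas.put_scaling_policy(
    PolicyName="cpu-target",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 75.0,   # add tasks when average service CPU stays above ~75%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)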
Based on my growing AWS experience, here's what you probably should do...
Use AWS API Gateway as an always-on receiver of events/requests. You only pay for requests, the free tier includes a million per month, and beyond that the next 300M are roughly $1 per million (pricing), so this is likely to be free.
While you have many options for responding to the request, an AWS Lambda function (which can be written in python) should have the least overhead.
If your queue will run longer than a Lambda function allows (15 minutes), you'll need to have that Lambda function delegate the processing to e.g. a Fargate task.
(Optional) If you want to use a Docker Hub container for your Fargate task, be aware that we experienced a bunch of issues with Tasks and Services failing to start due to rate limits at Docker Hub. We ended up wrapping our Fargate task in a Step Function that checked for this error specifically and retried.
(Optional) If you need to limit concurrency, this SO answer suggests having your Lambda function check for an existing execution (of a Step Function or Fargate task). I was hoping there was something native on Fargate Tasks or Step Functions but I don't see anything.
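The concurrency guard can be as simple as this inside the Lambda function (a sketch; the cluster and task family names are placeholders):

import boto3

ecs = boto3.client("ecs")

def start_if_idle() -> bool:
    # Only kick off a new worker if none is already chewing through the queue.
    running = ecs.list_tasks(
        cluster="jobs-cluster",
        family="queue-worker",
        desiredStatus="RUNNING",
    )["taskArns"]
    if running:
        return False
    # ...ecs.run_task(...) as in the earlier sketch...
    return True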
I imagine this would represent a huge operating cost savings over the always-on Fargate task and Elasticache Redis queue, but the up-front cost/hassle could exceed the savings.
Have you thought of using AWS Lambda instead of the celery worker? You would then pay per task execution, where cost is driven by execution time and memory usage. If you have an application which is mostly idle then paying per request, skipping the idle cost, would make the most sense.
Curious if this is possible:
We have a web application that, most of the time, works just fine on our single small instance. However, when we get multiple customers running intense queries simultaneously (we are a cloud scheduling service), our instance bogs way down to near 80% CPU load and becomes pretty unresponsive.
Is there a way to have AWS fire up another small instance (or a few), quickly, only for the times when it's operating under this intense load? But the real question is: how does this work when we push very frequent programming updates to our application? Do we have to manually create a new image every time we upload a code change?
Thanks
You should never be running anything important on a single EC2 instance. Instances can--and do--go offline randomly. Always use an autoscaling (AS) group that spans multiple availability zones. An AS group will automatically bring new instances online when you hit a certain trigger (in your case, CPU utilization). And then it will scale down the instances when traffic subsides. Autoscaling is the heart and soul of AWS and if you're not using it, you might as well be using a cheaper (and more durable) VPS host.
No, you don't want to be creating a new AMI for each code release. Ideally you should use a base AMI (like one of Amazon's official ones) and have it auto-provision at boot. You can use the "user data" field when you launch an instance to bootstrap this process. It can be anything from a simple bash script that pulls from your Git repo to something as sophisticated as Puppet or Chef.
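As a rough sketch of that bootstrap idea (the AMI ID, repo URL, and start script are placeholders), the launch template your autoscaling group uses could pull the latest code on every boot:

import base64

import boto3

ec2 = boto3.client("ec2")

user_data = """#!/bin/bash
yum install -y git
git clone https://github.com/example/app.git /opt/app   # placeholder repo
/opt/app/start.sh                                        # placeholder start script
"""

ec2.create_launch_template(
    LaunchTemplateName="web-app",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",   # a stock Amazon Linux AMI (placeholder)
        "InstanceType": "t3.small",
        # Launch templates expect user data to be base64-encoded by the caller.
        "UserData": base64.b64encode(user_data.encode()).decode(),
    },
)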
The only time I create custom AMIs is when the provisioning process just takes too long. However, that can almost always be mitigated by storing the needed files in S3.