I want AWS Spot pricing for a long-running job. Is a spot request of one instance the best way to achieve this? - amazon-web-services

I have a multi-day analysis problem that I am running on a 72 cpu c5n EC2 instance. To get spot pricing, I made my code interruption-resilient and am launching a spot request of one instance. It works great, but this seems like overkill given that Spot can handle thousands of instances. Is this the correct way to solve my problem or am I using a sledgehammer to squash a fly?
I've tried normal EC2 launching, which works great, except that it is four times the price. I don't know of any other way to approach this except for these two ways. I thought about Fargate or containers or something, but I am running a 72 cpu c5n node, and those other options won't let me use that kind of horsepower (that I know of, hence my question).
Thanks!

Amazon EC2 Spot Instances are an excellent way to get cheaper compute (up to 90% discount). The only downside is that the instances might be stopped/terminated (your choice) if there is insufficient capacity.
Some strategies to improve your chance of obtaining spot instances:
Use instances across different Instance Types and Availability Zones because they each have different availability pools (EC2 Spot Fleet can assist with this)
Use resources on weekends and in evenings (even in different regions!) because these tend to be times of lower usage
Use Spot Instances with a specified duration (also known as Spot blocks), but this is at a higher price and a maximum duration of 6 hours
If your software permits it, you could split your load between multiple instances to get the job done faster and to be more resilient against any stoppages of your Spot instances.
Hopefully your application is taking advantage of all the CPUs, otherwise you'd be better-off with smaller instances.
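To make the "split your load between multiple instances" idea concrete, here is a minimal sketch in plain Python. It assumes the job can be cut into independent chunks and that you track which chunks have finished; the names assign_chunks and reassign_after_interruption are made up for illustration, not part of any AWS API.

```python
from collections import defaultdict


def assign_chunks(chunk_ids, workers):
    """Round-robin the given chunks over the currently running workers."""
    if not workers:
        raise ValueError("need at least one running worker")
    plan = defaultdict(list)
    for i, chunk in enumerate(chunk_ids):
        plan[workers[i % len(workers)]].append(chunk)
    return dict(plan)


def reassign_after_interruption(plan, done, lost_worker):
    """Hand the lost worker's unfinished chunks to the surviving workers.

    plan: {worker: [chunk, ...]} as returned by assign_chunks
    done: set of chunk ids that have already completed (your checkpoint)
    """
    survivors = [w for w in plan if w != lost_worker]
    pending = [c for c in plan.get(lost_worker, []) if c not in done]
    # Survivors keep whatever they have not finished yet.
    new_plan = {w: [c for c in plan[w] if c not in done] for w in survivors}
    if pending:
        for worker, chunks in assign_chunks(pending, survivors).items():
            new_plan[worker].extend(chunks)
    return new_plan
```

With three Spot instances and six chunks, losing one instance only costs you that instance's unfinished chunks; the other two pick them up instead of the whole job restarting.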

Related

Optimal bidding price for AWS EC2 spot block instances

I need to attach a fixed number of spot block instances as core nodes to the EMR cluster at my job. The reason we're going with spot block instances is that our Spark jobs are pretty much deterministic in terms of execution time. I'm using the boto3 EMR client APIs for spawning and killing EMR clusters. The only unknown part for me is how the bidding happens for spot blocks. AWS docs have a price chart here for those instance types, but I can't find any information or APIs for accessing the bidding prices, similar to the ones available for normal spot instances.
The end goal is to find the optimal bidding price, but I don't have any info other than the static price chart. For the time being, I've set the bidding price to 70% of the on-demand price using BidPriceAsPercentageOfOnDemandPrice. Any help is appreciated.
I haven't seen any data feeds for Spot Blocks. I suspect they are heavily dependent upon the current workloads and probably aren't used as much as normal Spot instances. The price would also vary based upon duration.
Please note that, at the end of a Spot Block period, the instances are terminated.
An alternative is to use normal Spot instances, but include a mix of instance types to reduce the likelihood of losing all of them.
These days, Spot instances can be terminated if capacity is reduced, even if the Spot price doesn't increase. This has resulted in a much smoother Spot price, but there is no guarantee of Spot capacity even at the current Spot price.
Since Spot Blocks are more expensive than normal Spot instances, I'd suggest simply going for normal Spot on a couple of different instance types.
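As a rough sketch of what "normal Spot on a couple of different instance types" could look like for an EMR core fleet, the function below only builds the dictionary you would pass under Instances={"InstanceFleets": [...]} to the boto3 EMR run_job_flow call. The fleet name, the instance types, the 20-minute timeout and the on-demand price in the usage note are placeholder assumptions, not recommendations.

```python
def max_spot_price(on_demand_price, bid_pct):
    """Price ceiling implied by a bid expressed as a percentage of on-demand."""
    return on_demand_price * bid_pct / 100


def core_fleet_config(instance_types, target_spot_capacity,
                      bid_pct=70.0, timeout_minutes=20):
    """Build a CORE instance fleet that bids on several instance types.

    No BlockDurationMinutes is set, so these are ordinary Spot instances
    rather than Spot blocks.
    """
    return {
        "Name": "core-spot",  # hypothetical name
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": target_spot_capacity,
        "InstanceTypeConfigs": [
            {
                "InstanceType": itype,
                "WeightedCapacity": 1,
                "BidPriceAsPercentageOfOnDemandPrice": bid_pct,
            }
            for itype in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If Spot capacity is not fulfilled in time, fall back
                # to on-demand instead of failing the cluster.
                "TimeoutDurationMinutes": timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }
```

For example, core_fleet_config(["m5.xlarge", "m4.xlarge"], 4) asks for 4 units of core capacity from either type, and with a made-up on-demand price of 1.00 per hour, max_spot_price(1.00, 70) caps the bid at about 0.70.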

Would it be best to scale fewer larger instances, or more smaller instances?

Where performance is concerned, which is the better option: a smaller number of large instances or a larger number of small instances? CloudWatch (with load balancing and auto scaling) will be used if traffic floods the servers.
AWS is all about ELASTICITY
There is no need to provision large instances when they are not needed and burn money.
There can be many cases where the CPU on one instance runs high while the next large instance you created remains under-utilized.
You should use small to medium instances of the type you require (memory-, CPU-, or network-intensive) and scale those instances with properly written policies.
As long as the user data and AMI are stable, you can spawn many instances within minutes, making sure you are not spending too much and saving every penny.
SCALE WHEN NEEDED HORIZONTALLY
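To show what a scaling policy boils down to, here is a small sketch of a proportional rule: grow or shrink the group so the average metric lands near a target. This is an illustrative assumption, not the exact algorithm AWS Auto Scaling uses, and the default limits are arbitrary.

```python
import math


def desired_capacity(current, observed, target, minimum=1, maximum=20):
    """Instance count that would bring the average metric back to target.

    current:  number of instances running now
    observed: current average metric (e.g. CPU %) across those instances
    target:   the average you want to hold (must be positive)
    """
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(current * observed / target)
    # Never go below the floor or above the ceiling of the group.
    return max(minimum, min(maximum, wanted))
```

With 4 instances at 90% CPU and a 60% target, this asks for 6 instances; at 30% CPU it shrinks the group to 2.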
This is heavily dependent on your application.
I agree with Faisal Nizam's intuition of favoring horizontal scaling. However, there are many applications that will not run very well on small instances.
For example, Elastic recommends running Elasticsearch cluster nodes with 64GB of RAM. Similar reasoning can be applied to many other data-related applications, where it can be beneficial if a single instance is able to keep large data chunks in memory.
I would recommend finding the ideal instance size for your application, and scaling horizontally from there.
Each EC2 instance also has some overhead, so you need to find a balance between a few large, costly instances and many small instances that each carry that overhead.
(As of today) To vertically scale up/scale down an EC2 server, it needs to be shut down and spun back up - something to keep in mind before deciding to go for it.

multiple micro vs. one large ec2 instance

Our website is getting slow and we are in need of an upgrade.
We are currently on AWS and have 1 micro EC2 instance that proved effective while our website had less traffic. Now that we get more traffic, our site is getting slower.
We can't seem to settle an argument.
Which would be better:
Adding multiple additional micro/small instances and have them managed either by nginx or amazon cloud computing
OR
Upgrading our micro instance into a large/xlarge instance.
which would be more effective considering the tasks to be performed by the server are simple, and considering the total amount of ram and processing power is similar. 1 big, or many small?
Thanks
Tough to say -
Option #2 is going to be the easiest to do: turn your server off, resize it, turn it back on, and you get more capacity just by paying more money. Easy to do, but maybe not the best long-term solution. What will you do when traffic continues to increase (either constantly or at certain times) and there are no more gains to be had simply by picking a bigger box?
Option #1 is going to be more work, but ultimately maybe a better strategy.
First of all, you didn't say if you have a constant need for more throughput, or if the capacity is only needed at certain times of the day/week/month/year. If it is the latter, a cost-effective option is multiple EC2 instances in auto scaling groups set up to respond to demand, turning on additional instances as demand increases and turning them off as it decreases.
In addition, having multiple instances running, preferably in different availability zones, gives you fault tolerance. When your big instance in #2 goes down, your website is down. If you have many small instances running across 2 or 3 availability zones, you can continue to function if one or more of your instances goes down, and even if an AWS availability zone goes offline (rare, but it happens).
Besides the options above, without knowing anything about your application - other things you can do - move some static assets to S3 and/or use AWS cloudfront (or other CDN) to offload some of the work - this is often a cheap and easy way to get more out of an existing box.
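The fault-tolerance argument can be put into numbers. The sketch below assumes instances are interchangeable and that the worst case is losing the fullest availability zone; the function names are made up for illustration.

```python
import math


def per_az_count(needed, az_count):
    """Instances to run in each zone so that `needed` survive losing one zone."""
    if az_count < 2:
        raise ValueError("need at least two zones to survive losing one")
    return math.ceil(needed / (az_count - 1))


def surviving_fraction(instances_per_az):
    """Share of capacity left after the fullest zone goes offline."""
    total = sum(instances_per_az)
    if total == 0:
        return 0.0
    return (total - max(instances_per_az)) / total
```

One big instance in one zone gives surviving_fraction([1]) of 0.0, so the site is down. Two small instances in each of three zones gives about two thirds, and per_az_count(4, 3) says 2 per zone is enough to keep 4 running through a zone outage.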

Alternative for built-in autoscaling groups for spot instances on AWS

I am currently using spot instances managed with auto-scaling groups. However, ASG has a number of shortcomings for use with spot instances. For example, it cannot launch instances of a different instance type if the current type is experiencing a price spike across all availability zones. It can't even re-distribute the number of running instances across zones (if one zone has a price spike, you're down 30% in the number of running instances.)
Are there any software solutions that I could run which would replace built-in AWS Auto-Scaling Groups? I've heard of SpotInst and Batchly, but I do not trust them. Basically, I think their business plan involves being bought out and killed by Amazon, like what happened to ClusterK. The evidence for this is the bizarre pricing policies and other red flags. I need something that I can self-host and depend on.
AWS recently released Auto Scaling for Spot Fleets which seems to fit your use case pretty well. You can define the cluster capacity in terms of vCPU that you need, choose the instance types you'd like to use and their weights and let AWS manage the rest.
They will provision spot instances at their current market price up to a limit you can define per instance type (as before), but integrating Auto Scaling capabilities.
You can find more information here.
https://aws.amazon.com/blogs/aws/new-auto-scaling-for-ec2-spot-fleets/
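To illustrate how capacity "in terms of vCPU" and per-type weights fit together, here is a small arithmetic sketch. It assumes each type's weight is simply its vCPU count and that the instance count is rounded up, so the fleet may slightly overshoot the target; the instance types are only examples.

```python
import math


def instances_needed(target_vcpus, weights):
    """For each instance type, how many instances would cover the target alone."""
    return {
        itype: math.ceil(target_vcpus / weight)
        for itype, weight in weights.items()
    }


# Example weights: vCPU count per instance type.
weights = {"c5.xlarge": 4, "c5.2xlarge": 8, "c5.4xlarge": 16}
plan = instances_needed(40, weights)
```

For a 40 vCPU target, plan works out to 10 x c5.xlarge, 5 x c5.2xlarge or 3 x c5.4xlarge; the last option delivers 48 vCPUs because of the rounding.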
It's unlikely that you're going to find something that takes into account everything you want. But because everything in Amazon is an API, you can write that yourself. There are lots of ways to do that.
For example, you could write a small script (bash, ruby, python etc) that shells out to the AWS CLI to get the price, then shells out again to launch boxes. For bonus points, use the native AWS SDK library instead of shelling out. (That will make it slightly easier to handle errors, etc.) For even more bonus points, open source it, and hope that other people improve on it!
This script can run on your home computer, or on a t1.micro for $5/month. Or you could write it in node.js, and run it on Lambda for pennies per month.
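A minimal sketch of such a script using the Python SDK (boto3) might look like this. The selection logic is plain Python; fetch_prices and launch_spot wrap the EC2 describe_spot_price_history and run_instances calls, ignore pagination and error handling, and need real credentials, a region and an AMI id that you supply.

```python
from datetime import datetime, timezone
from decimal import Decimal


def cheapest_pool(price_records, max_price=None):
    """Pick the cheapest (instance type, zone) from Spot price records.

    price_records: dicts shaped like the SpotPriceHistory entries returned
    by describe_spot_price_history (SpotPrice is a string).
    Returns (instance_type, availability_zone, price) or None.
    """
    latest = {}
    for rec in price_records:
        key = (rec["InstanceType"], rec["AvailabilityZone"])
        if key not in latest or rec["Timestamp"] > latest[key]["Timestamp"]:
            latest[key] = rec  # keep only the newest price per pool
    candidates = [(Decimal(r["SpotPrice"]), k) for k, r in latest.items()]
    if max_price is not None:
        candidates = [c for c in candidates if c[0] <= Decimal(max_price)]
    if not candidates:
        return None
    price, (itype, zone) = min(candidates)
    return itype, zone, price


def fetch_prices(ec2, instance_types):
    """Current Linux Spot prices; ec2 is a boto3 EC2 client."""
    resp = ec2.describe_spot_price_history(
        InstanceTypes=instance_types,
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),
    )
    return resp["SpotPriceHistory"]


def launch_spot(ec2, image_id, instance_type, zone):
    """Launch one Spot instance in the chosen pool."""
    return ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
        InstanceMarketOptions={"MarketType": "spot"},
    )
```

You would create the client with boto3.client("ec2", region_name=...), pass fetch_prices(...) into cheapest_pool, and call launch_spot only if it returns a pool under your price limit.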
Here at Spotinst, these are exactly the problems we built Elastigroup to solve.
Elastigroup enables running simultaneously on as many instance types and availability zones (within a region) as you’d like. This is coupled with several things to maintain production availability:
Our algorithm makes live choices for the best Spot markets in terms of price and availability.
When an interruption happens, we predict it about 15 minutes in advance and take all the necessary steps to ensure (and insure) the capacity of your group.
In the extreme case that none of the markets have Spot availability, we simply fall back to an on-demand instance.
We have a great relationship with AWS and work closely with both their technical and business teams to provide our joint customers with the best experience possible. As we manage resources inside your own AWS account, I wouldn’t consider the relationship between us a concern to begin with.

AWS EC2 Instances with Load Balancing during very high traffic

Our website is available throughout the year, and can handle traffic quite well with an AWS EC2 medium instance type. Every now and then (once a month), we get really heavy traffic though, and might need several extra large instances. We know when this will occur, and so we can start up instances in advance.
I have just noticed that we would save quite a bit of money when pre-purchasing a medium reserved instance, compared to our current on-demand instance. The problem is that such a reserved instance would mean that our master will be fixed at a medium instance type.
My question is this: Would there be any issues having such a small master, when we need to start-up new x-large slaves? What advantages would there be to keeping the master as an on-demand instance?
Reserved instances are there to save money on the instances you run regularly. I would suggest using a reserved instance for your 'master'. The only advantage of keeping that one on-demand is that you could scale it up or down as soon as your constant flow of traffic changes. Make sure you choose the right utilization level for your reserved instance; an 'always-on' reserved instance should be a heavy-use reserved purchase. Those 'peak instances' do best as on-demand instances.
Just because you have a reservation does not mean you need to be running it all the time. The reservation will apply to any instance matching its parameters.
There are different reservation options as well, depending on how heavily you use a given instance type. You can take advantage of reserved instances and continue to do what you are doing; you don't need to switch anything.