Alternative to built-in Auto Scaling Groups for spot instances on AWS

I am currently using spot instances managed with auto-scaling groups. However, ASG has a number of shortcomings for use with spot instances. For example, it cannot launch instances of a different instance type if the current type is experiencing a price spike across all availability zones. It can't even re-distribute the number of running instances across zones (if one zone has a price spike, you're down 30% in the number of running instances).
Are there any software solutions that I could run which would replace built-in AWS Auto Scaling Groups? I've heard of SpotInst and Batchly, but I do not trust them. Basically, I think their business plan involves being bought out and shut down by Amazon, like what happened to ClusterK; the evidence for this is their bizarre pricing policies and other red flags. I need something that I can self-host and depend on.

AWS recently released Auto Scaling for Spot Fleets, which seems to fit your use case pretty well. You can define the cluster capacity in terms of the vCPUs you need, choose the instance types you'd like to use and their weights, and let AWS manage the rest.
They will provision spot instances at their current market price, up to a limit you can define per instance type (as before), while integrating Auto Scaling capabilities.
You can find more information here: https://aws.amazon.com/blogs/aws/new-auto-scaling-for-ec2-spot-fleets/
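If you want to see the moving parts, here is a minimal boto3 sketch of a weighted Spot Fleet with target-tracking scaling attached. The AMI, subnet IDs, IAM role ARN and capacity numbers are all placeholder assumptions:

```python
import boto3

ec2 = boto3.client("ec2")

# Request a fleet whose capacity is counted in vCPUs, spread across two
# instance types and two subnets (one per Availability Zone).
response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
        "TargetCapacity": 16,  # in weighted units (vCPUs here)
        "AllocationStrategy": "lowestPrice",
        "LaunchSpecifications": [
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "c4.xlarge",
             "WeightedCapacity": 4.0, "SubnetId": "subnet-aaaa1111"},
            {"ImageId": "ami-0123456789abcdef0", "InstanceType": "c4.2xlarge",
             "WeightedCapacity": 8.0, "SubnetId": "subnet-bbbb2222"},
        ],
    }
)
fleet_id = response["SpotFleetRequestId"]

# Attach target-tracking scaling so the fleet grows and shrinks with CPU load.
scaling = boto3.client("application-autoscaling")
scaling.register_scalable_target(
    ServiceNamespace="ec2",
    ResourceId=f"spot-fleet-request/{fleet_id}",
    ScalableDimension="ec2:spot-fleet-request:TargetCapacity",
    MinCapacity=8,
    MaxCapacity=64,
)
scaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ec2",
    ResourceId=f"spot-fleet-request/{fleet_id}",
    ScalableDimension="ec2:spot-fleet-request:TargetCapacity",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # aim for 60% average CPU across the fleet
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "EC2SpotFleetRequestAverageCPUUtilization"
        },
    },
)
```

Because the capacity is expressed in weighted units rather than instance counts, the scaler is free to mix instance types and zones, which addresses exactly the redistribution problem you describe.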

It's unlikely that you're going to find something that takes into account everything you want. But because everything in Amazon is an API, you can write that yourself. There are lots of ways to do that.
For example, you could write a small script (bash, ruby, python, etc.) that shells out to the AWS CLI to get the price, then shells out to launch boxes. For bonus points, use the native AWS SDK library instead of shelling out (that makes it slightly easier to handle errors, etc.). For even more bonus points, open source it and hope that other people improve on it!
This script can run on your home computer, or on a t1.micro for $5/month. Or you could write it in node.js, and run it on Lambda for pennies per month.
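As a concrete starting point, a rough boto3 sketch of that script might look like this; the instance type, zones, AMI and price ceiling are made-up assumptions:

```python
import boto3

ec2 = boto3.client("ec2")

def current_spot_price(instance_type, az):
    # Most recent spot price for one type in one Availability Zone.
    history = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        AvailabilityZone=az,
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=1,
    )
    return float(history["SpotPriceHistory"][0]["SpotPrice"])

BID_CEILING = 0.10  # made-up ceiling in USD/hour

# If our usual zone has spiked, request the capacity in another zone instead.
if current_spot_price("c4.xlarge", "us-east-1a") > BID_CEILING:
    ec2.request_spot_instances(
        SpotPrice=str(BID_CEILING),
        InstanceCount=2,
        LaunchSpecification={
            "ImageId": "ami-0123456789abcdef0",
            "InstanceType": "c4.xlarge",
            "Placement": {"AvailabilityZone": "us-east-1b"},
        },
    )
```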

Here at Spotinst, these are exactly the problems we built Elastigroup to solve.
Elastigroup can run your workload simultaneously across as many instance types and Availability Zones (within a region) as you'd like. This is coupled with several mechanisms to maintain production availability:
Our algorithm makes live choices for the best Spot markets in terms of price and availability.
When an interruption happens, we predict it about 15 minutes in advance and take all the necessary steps to ensure (and insure) the capacity of your group.
In the extreme case that none of the markets have Spot availability, we simply fall back to an on-demand instance.
We have a great relationship with AWS and work closely with both their technical and business teams to provide our joint customers with the best experience possible. As we manage resources inside your own AWS account, I wouldn't consider the relationship between us a concern to begin with.

Related

How do you implement cloud solutions without incurring costs during development?

I am completely new to the implementation of cloud solutions. I've just started taking AWS training courses.
But I already have a very fundamental question about the flow of development in cloud projects:
How do you go about developing solutions without incurring costs? I know that there are free tiers, but in practice you need a lot of elements that aren't free. Especially when working with infrastructure-as-code approaches (e.g. CloudFormation), costs can be incurred every time you try out the templates.
Is there maybe something like a sandbox mode or how else do you go about it in practice?
Outside of the AWS Free Tier you will be billed for creating services.
The best way to keep costs as low as possible is to combine the lowest-priced settings (such as instance class) with removing resources you're not using once you're done. This will still cost something; however, many resources are now moving to per-second billing (where you normally have to pay for at least the first minute), so the cost is kept low.
Additionally, when dealing with some services (such as EC2, ECS and Fargate) you can make use of spot instances, sometimes paying as little as 10% of the on-demand cost, which helps keep these resources cheap.
To ensure you can recreate resources when you want them, use infrastructure as code to roll them out again as needed (CloudFormation and Terraform are great offerings for this).
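As a minimal sketch of that create-use-delete cycle, assuming boto3 and an existing template.yaml (the stack name is a placeholder):

```python
import boto3

cfn = boto3.client("cloudformation")

with open("template.yaml") as f:
    template_body = f.read()

# Stand the stack up only while you need it...
cfn.create_stack(StackName="dev-sandbox", TemplateBody=template_body)
cfn.get_waiter("stack_create_complete").wait(StackName="dev-sandbox")

# ...do your experimenting, then tear it down so nothing bills overnight.
cfn.delete_stack(StackName="dev-sandbox")
cfn.get_waiter("stack_delete_complete").wait(StackName="dev-sandbox")
```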
Finally, be on the lookout for AWS conferences; they are a great way to pick up AWS credits for attending, which will offset your bill for most AWS services.

I want AWS Spot pricing for a long-running job. Is a spot request of one instance the best way to achieve this?

I have a multi-day analysis problem that I am running on a 72-vCPU c5n EC2 instance. To get spot pricing, I made my code interruption-resilient and am launching a spot request for a single instance. It works great, but this seems like overkill given that Spot can handle thousands of instances. Is this the correct way to solve my problem, or am I using a sledgehammer to squash a fly?
I've tried normal EC2 launching, which works great, except that it is four times the price. I don't know of any other way to approach this except for these two ways. I thought about Fargate or containers or something, but I am running a 72-vCPU c5n node, and those other options won't let me use that kind of horsepower (that I know of, hence my question).
Thanks!
Amazon EC2 Spot Instances are an excellent way to get cheaper compute (up to 90% discount). The only downside is that the instances might be stopped/terminated (your choice) if there is insufficient capacity.
Some strategies to improve your chance of obtaining spot instances:
Use instances across different Instance Types and Availability Zones because they each have different availability pools (EC2 Spot Fleet can assist with this)
Use resources on weekends and in evenings (even in different regions!) because these tend to be times of lower usage
Use Spot Instances with a specified duration (also known as Spot blocks), but this is at a higher price and a maximum duration of 6 hours
If your software permits it, you could split your load between multiple instances to get the job done faster and to be more resilient against any stoppages of your Spot instances.
Hopefully your application is taking advantage of all the CPUs; otherwise you'd be better off with smaller instances.
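Since your code is already interruption-resilient, one useful building block is polling the instance metadata endpoint for the two-minute interruption notice. A minimal sketch, assuming IMDSv1 is enabled (checkpoint() is a hypothetical stand-in for your own save logic):

```python
import time
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def checkpoint():
    # Placeholder: persist job state to S3/EBS so a replacement instance can resume.
    pass

def interruption_pending():
    # The endpoint returns 404 until an interruption is scheduled.
    try:
        with urllib.request.urlopen(METADATA_URL, timeout=2) as resp:
            return resp.getcode() == 200
    except urllib.error.URLError:
        return False

while True:
    if interruption_pending():
        checkpoint()
        break
    time.sleep(5)
```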

Choosing the right EC2 instance type?

I'm trying to determine if it makes sense to switch our hosting to EC2 from a dedicated dreamhost server, and if so, what EC2 instance type I should choose to get a good idea of the cost prior to switching. I would like to go low and then bump up if need be.
Current Usage:
dedicated server with 4 GB RAM and 4 CPUs
average disk usage: 783 MB
average bandwidth: 8.5 GB
This is really all the info I get from our dreamhost control panel, so hopefully it's enough to provide some recommendations on where to start.
Using the calculator located here, I'm leaning towards a t2.xlarge. Is that too much? Not enough?
It is not possible for anyone to recommend the 'correct' instance type. This is because it depends on the operation of your particular application. It might be CPU-intensive, RAM-intensive, network-heavy, highly parallel, etc.
Some applications might need to handle occasional spikes of traffic, whereas other applications might be relatively consistent in their load.
The correct way to determine your 'best' instance type is to run tests that simulate the expected application load. If you can create an automated test, then you could run it against many different instance types and compare the performance vs cost.
Also, many applications are designed to be able to run across multiple instances, so it would be better to test various quantities of servers as well as their instance type.
You might also consider using Amazon EC2 Auto Scaling, which gives the ability to automatically add/remove servers based upon workload. This means that you could use much more powerful instances, but automatically turn some of them off during less-used periods. This affects the cost calculation because the more-powerful instances are more expensive, but you won't be using them all the time.
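As a sketch of that pattern, here is schedule-based scaling with boto3; the group name, times and sizes are placeholder assumptions:

```python
import boto3

asg = boto3.client("autoscaling")

# Scale up at 08:00 UTC on weekdays...
asg.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="business-hours-up",
    Recurrence="0 8 * * 1-5",
    MinSize=2, MaxSize=8, DesiredCapacity=4,
)

# ...and back down at 20:00 UTC every day.
asg.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="overnight-down",
    Recurrence="0 20 * * *",
    MinSize=1, MaxSize=2, DesiredCapacity=1,
)
```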
Then, you could also consider using Amazon EC2 Spot Instances, which can be up to 90% less cost but might be terminated when the demand for such instances is higher. You can also combine On-Demand and Spot Instances to give additional capacity at a lower cost.
(Spot and Auto Scaling are only really applicable if you are using more than one instance to host your application.)
And finally, if your application only requires one instance, you could also consider Amazon Lightsail, which combines the price of the instance type and network bandwidth to make the cost more predictable.
Bottom line: It depends!
One final word: most companies consider switching to AWS not purely on a cost basis ("if it makes sense to switch our hosting to EC2 from a dedicated dreamhost server"), but rather for the breadth of features that AWS offers that are not available from a traditional server hosting service. If all you need is "a server", it's probably easiest to consider Amazon Lightsail or keep whatever is currently working for you. The cost saving with AWS won't be dramatic (it might not even be cheaper!), but it will offer you a lot more capabilities if you ever grow beyond just requiring "a server".

AWS vs GCP Cost Model

I need to make a cost model for AWS vs GCP. Currently, our organization is using AWS. Our biggest services used are:
EC2
RDS
Lambda
AWS API Gateway
S3
Elasticache
Cloudfront
Kinesis
I have very limited knowledge of cloud platforms. However, I have access to:
AWS Simple Monthly Calculator
Google Cloud Platform Pricing Calculator
MAP AWS services to GCP products
I also have access to CloudHealth so that I can get a breakdown of costs per services within our organization.
Of the 8 major services listed above, our main usage and costs go to EC2, S3, and RDS.
Our director of engineering mentioned that I should be most concerned with vCPU and memory.
I would appreciate any insight (big or small) that people have into how I can go about creating this model, any other factors I should consider, which functionalities of the two providers for the services are considered historically "better" or cheaper, etc.
Thanks in advance, and any questions people may have, I am more than happy to answer.
-M
You should certainly cost-optimize your resources. It's so easy to create cloud resources that people don't always think about turning things off or right-sizing them.
Looking at your Top 5...
Amazon EC2
The simplest way to save money with Amazon EC2 is to turn off unused resources. You can even stop instances overnight and on the weekend. If they are only used 8 hours per workday, then that is only 40 out of 168 hours, so you can save about 75% by turning them off when unused! Dev and Test instances are prime candidates. People have written various types of automated utilities to turn instances on and off based on tags; try searching the Internet for "AWS Stopinator".
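A bare-bones version of such a utility, assuming boto3 and a hypothetical AutoOff tag, might look like this (run it from cron or a scheduled Lambda, paired with a matching start script):

```python
import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged for automatic shutdown.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:AutoOff", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```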
Another way to save money on Amazon EC2 is to use spot instances. They are a fraction of the price, but have a risk that they might be turned off when demand increases. They are great where it is okay for systems to be terminated sometimes, such as automated testing systems. They are also a great way to supplement existing capacity at a fraction of the price.
If you definitely need the Amazon EC2 instances to keep running all the time, purchase Amazon EC2 Reserved Instances, which also offer a price saving.
Chat with your AWS Account Manager for help with the above options.
Amazon Relational Database Service (RDS)
Again, Amazon RDS instances can be stopped overnight/on weekends and turned on again when needed. You only pay while the instance is running (plus storage costs).
Examine the CloudWatch metrics for your RDS instances and determine whether they can be downsized without impacting applications. You can even resize them when they are used less (eg over weekends). Everything can be scripted, so you could trigger such downsizing and upsizing on a schedule.
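The resize itself is a single API call, so a hedged boto3 sketch of scheduled downsizing might look like this (the identifier and classes are placeholders, and the modification causes a brief outage):

```python
import boto3

rds = boto3.client("rds")

def resize(db_identifier, instance_class):
    # Triggers a short outage while the instance class changes.
    rds.modify_db_instance(
        DBInstanceIdentifier=db_identifier,
        DBInstanceClass=instance_class,
        ApplyImmediately=True,
    )

# Schedule something like this for Friday evening and Monday morning:
resize("reporting-db", "db.t3.small")    # downsize for the weekend
# resize("reporting-db", "db.m5.large")  # restore for the working week
```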
Also look at the Engine used with RDS. Commercial offerings such as Oracle and Microsoft SQL Server are more expensive than open-source offerings like MySQL and PostgreSQL. Yes, your applications might need some changes, but the cost savings can be significant.
AWS Lambda
It is most unusual that Lambda is #3 in your list. In fact, some customers never get a charge for Lambda because it falls in the monthly free usage tier. Having high charges means you're making good use of Lambda (which is saving you EC2 costs), but take a look at which applications are using it the most and see whether they are using it wisely.
When correctly used, a Lambda function should only ever run for a few seconds, so check whether any applications seem to be using it outside this pattern.
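One quick way to check is to pull each function's Duration metric from CloudWatch; a small boto3 sketch (the one-week window is an arbitrary choice):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
lambda_client = boto3.client("lambda")

for fn in lambda_client.list_functions()["Functions"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": fn["FunctionName"]}],
        StartTime=datetime.utcnow() - timedelta(days=7),
        EndTime=datetime.utcnow(),
        Period=7 * 24 * 3600,  # one datapoint covering the whole week
        Statistics=["Average", "Maximum"],
    )
    for point in stats["Datapoints"]:
        # Durations are reported in milliseconds.
        print(fn["FunctionName"], point["Average"], point["Maximum"])
```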
AWS API Gateway
Once again, these costs tend to be low ($3.50/million calls) so again I'd recommend trying to figure out how this is being used. If you really need that many calls, it would also explain the high Lambda costs. It would probably be more expensive if you were providing such functionality via Amazon EC2.
Amazon S3
Consider using different Storage Classes to reduce your costs (a lifecycle-rule sketch follows this list). Costs can be reduced by:
Moving infrequently-accessed data to a different storage class
Moving data to One-Zone (if you have a copy of the data elsewhere, so don't need the redundancy)
Archiving infrequently-accessed data to Amazon Glacier, which offers much cheaper storage but does not have instant access
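A minimal boto3 sketch of a lifecycle rule covering the first and third points; the bucket name, prefix and day counts are placeholder assumptions:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    # Infrequent Access after a month, Glacier after a quarter.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```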
With GCP, you can benefit by receiving discounts such as the Committed Use Discount and the Sustained Use Discount.
With a Committed Use Discount, you can receive a discount of up to 70% if your usage is predictable.
With the Sustained Use Discount, there is an incremental discount if you reach certain usage thresholds.
On your concern with vCPU and memory, you may use predefined machine types. They are cheaper than custom machine types.
Lastly, you can also test the charges by trying out the Google Cloud Platform Free Tier.

Finding best deployment locations in AWS regions

Given that we are on the AWS platform, we need to subscribe to different sources of data located around the world. How can we efficiently determine which region has the lowest latency to some target IP (not our browser)?
There is a service called cloudping which pings from your current browser to AWS regions, but this is not useful here for obvious reasons.
Is there any tool similar to cloudping where we could specify which IP we want to ping?
And a secondary question: I suppose it is possible to spawn instances using the AWS console or API; does Amazon charge significant fees if I have a script that spawns a compute instance, does some short work, terminates it, and repeats this for every single region?
Worst case, we could spawn instances in all regions for a short amount of time and ping all the destinations we are interested in, but that would be a lot of work for something rather simple... My assumption is that even within one region some instances might end up with significantly better latency than others; a script could spawn instances until the best one is found and terminate the others...
UPDATE
It seems it is rather easy to spawn instances and execute commands in them; it shouldn't be hard to terminate them as well. Here is a good tool for this. Now the question is: will AWS punish me with bills, and isn't there already a solution for this?
You can certainly launch and terminate Amazon EC2 instances in any region you wish. Amazon will not "punish" you -- the system will simply charge the normal cost for the resources you use.
If you launch an Amazon EC2 instance with the Amazon Linux AMI, the instance will be billed per second, so the cost will be very low. For example, you could use a t2.micro instance for a few cents per hour (charged per second).
You could then run your own timing test from each region. However, you could probably predict the best performance simply based upon the location of the region (US East, US West, Frankfurt, Sydney, etc).
Also, please note that Ping is not a reliable measure for how your actual application would perform. To obtain the best measure, you should run an application in each region that connects to the 'source of data' you are trying to use. Measure performance as it would be used by your actual application. You might find that the remote service has higher latency than the network, meaning that location would only have a minor impact on performance.
If you use somebody else's timing or somebody else's tool, it will not be as accurate as measuring your actual application doing "real" work.
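If you do roll your own, a minimal timing sketch to run on an instance in each candidate region might measure TCP connection setup to the data source rather than ICMP ping; the hostname, port and sample count below are placeholders:

```python
import socket
import time

HOST, PORT, SAMPLES = "feed.example.com", 443, 20  # placeholders

latencies = []
for _ in range(SAMPLES):
    start = time.monotonic()
    # Time the TCP handshake to the actual service you will consume.
    with socket.create_connection((HOST, PORT), timeout=5):
        latencies.append((time.monotonic() - start) * 1000)
    time.sleep(0.5)

latencies.sort()
print(f"median {latencies[len(latencies) // 2]:.1f} ms, "
      f"min {latencies[0]:.1f} ms, max {latencies[-1]:.1f} ms")
```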