I'm about to launch my new cloud application, which needs to run on multiple EC2 instances. How should I decide which EC2 instances I need to deploy? How much does it depend on the workload? Thanks

If you are automating the deployment of your infrastructure, you should be able to set up test infrastructure and run load tests against it to see what happens under your "expected" production load. This can help identify potential bottlenecks: memory, CPU, or I/O. Something will be the limiting factor on the performance of a single instance.
Then, if you're just about to launch a new application, overprovision. How much, and how you accomplish that, will depend on how critical the application is, how much traffic you expect, what you think the limiting factor on performance might be, and probably a few other variables. If you determined that CPU might be the limiting factor, launch with C-family instances; if memory, try the R family; and if it's I/O, consider EBS-optimized instances or Provisioned IOPS volumes.
After you have a few days of stats, you can make more reasonable adjustments. Depending on the size of your infrastructure, ensuring you have enough performance at launch probably won't cost you more than a few bucks extra.
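As a rough sketch of the reasoning above, the limiting factor can be read off the peak utilization figures from a load test and mapped to the options named in this answer. The helper names and the sample numbers are hypothetical, not an AWS API or real measurements.

```python
# Hypothetical helper: the mapping mirrors the suggestions in the answer above.
SUGGESTIONS = {
    "cpu": "compute-optimized (C family)",
    "memory": "memory-optimized (R family)",
    "io": "EBS-optimized instances or Provisioned IOPS volumes",
}


def limiting_factor(peak_utilization):
    """Return the resource with the highest peak utilization (0.0 to 1.0)."""
    return max(peak_utilization, key=peak_utilization.get)


def suggest(peak_utilization):
    """Map the limiting resource to the option named in the answer."""
    return SUGGESTIONS[limiting_factor(peak_utilization)]


# Made-up peak utilization observed on a single instance during a load test.
load_test = {"cpu": 0.92, "memory": 0.55, "io": 0.30}
print(limiting_factor(load_test), "->", suggest(load_test))
```

This only picks a starting point; the stats from the first few days in production should still drive the real adjustment.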

It all depends on your workload.
Start small (or take your best guess), automate everything, monitor load, and then scale up and down as needed.


ECS: clarification on resources

I'm having trouble understanding the config definitions of a task.
I want to understand the resources. There are a few options (if we talk only about memory):
There are a few things I'm not sure about.
First of all, the docs say that when the hard limit is exceeded, the container will stop running. Isn't the goal of a container orchestration service to keep the service alive?
Root-level memory must be greater than the memory of all containers combined. In theory, I would imagine that once there aren't enough containers deployed, new containers are created from the image. I don't want to use more resources than I need, but if I reserve memory at the root level, first, I reserve much more than needed, and second, if my application receives a huge load, will the whole cluster shut down once the memory limit is exceeded?
I want to implement a system that auto-scales, and I would imagine that this way I don't have to define resources allocated, it just uses the amount needed, and deploys/kills new containers if the load increases/decreases.
For me there is a lot of confusion around ECS and Fargate, how they work, and how they scale, and the more I read about it, the more confusing it gets.
I would like to set the minimum amount of resources per container, at how much load to create a new container, and at how much load to kill one (because it's not needed anymore).
P.S. I'm not experienced in DevOps in general. I used Kubernetes at my company, and there are things I'm not clear about; I'm just learning this ECS world.
I would say the goal of a container orchestration service is to deploy your containers, and restart them if they fail for some reason. A container orchestration service can't magically add RAM to a server as needed.
No, you always have to define the amount of RAM and CPU that you want to reserve for each of your Fargate tasks. Amazon charges you by the amount of RAM and CPU you reserve for your Fargate tasks, regardless of what your application actually uses, because Amazon is having to allocate physical hardware resources to your ECS Fargate task to ensure that much RAM and CPU are always available to your task.
Amazon can't add extra RAM or CPU to a running Fargate task just because it suddenly needs more. There will be other processes, of other AWS customers, running on the same physical server, and there is no guarantee that extra RAM or CPU are available on that server when you need it. That is why you have to allocate/reserve all the CPU and RAM resources your task will need at the time it is deployed.
You can configure autoscaling to trigger on the amount of RAM your tasks are using, to start more instances of your task, thus spreading the load across more tasks, which should hopefully reduce the amount of RAM being used by each individual task. You have to realize that each of those new Fargate task instances created by autoscaling is spinning up on a potentially different physical server, and each one reserves a specific amount of RAM on the server it is on.
You need to allocate the maximum amount of resources all the containers in your task will need, not the minimum, because more physical resources can't be allocated to a single task at run time.
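A minimal sketch of that sizing rule, assuming you have measured the peak memory of each container under load: add up the peaks, add some headroom, and round up to the next size you are allowed to reserve. The function name, the 20% headroom, and the example list of sizes are all hypothetical; check the sizes your platform actually accepts.

```python
import math


def task_memory_mib(container_peaks_mib, headroom_percent=20, allowed_sizes_mib=None):
    """Size task memory for the maximum its containers need, plus headroom.

    container_peaks_mib: measured peak memory per container, in MiB.
    allowed_sizes_mib: optional list of sizes that may be reserved (assumption).
    """
    total_peak = sum(container_peaks_mib.values())
    needed = math.ceil(total_peak * (100 + headroom_percent) / 100)
    if allowed_sizes_mib is None:
        return needed
    for size in sorted(allowed_sizes_mib):
        if size >= needed:
            return size
    raise ValueError(f"no allowed size can hold {needed} MiB")


# Made-up peaks for a two-container task.
peaks = {"web": 700, "sidecar": 100}
print(task_memory_mib(peaks))
print(task_memory_mib(peaks, allowed_sizes_mib=[512, 1024, 2048]))
```

Sizing for the peak rather than the average is the point: a task that hits its limit is stopped, not given more memory.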
You would configure autoscaling with a target value of, for example, 60% RAM usage. It would then automatically add more task instances if the average across the current instances exceeds 60%, and automatically start removing instances if that average is well below 60%.
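As a hedged sketch, a 60% memory target like this could be expressed as the two request payloads that Application Auto Scaling expects for an ECS service. The cluster name, service name, policy name, and task limits below are placeholders; the function only builds dictionaries and does not call AWS.

```python
def memory_target_tracking(cluster, service, target_percent=60.0,
                           min_tasks=1, max_tasks=10):
    """Build request payloads for a memory-based target tracking policy.

    cluster, service, min_tasks and max_tasks are illustrative placeholders.
    """
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # Bounds on how many task instances autoscaling may run.
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    # Keep average memory utilization across tasks near the target value.
    policy = {
        **common,
        "PolicyName": f"{service}-memory-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageMemoryUtilization",
            },
        },
    }
    return scalable_target, policy


scalable_target, policy = memory_target_tracking("my-cluster", "my-service")
print(policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
```

With boto3 installed and credentials configured, these could be passed to the "application-autoscaling" client as register_scalable_target(**scalable_target) followed by put_scaling_policy(**policy).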

Would it be best to scale fewer larger instances, or more smaller instances?

Which is the better option when performance is the concern: a smaller number of large instances or a larger number of small instances? CloudWatch (with load balancing and scaling) will be used if traffic floods the servers.
AWS is all about ELASTICITY
There is no need to provision large instances when they are not needed and burn money.
There can be many cases where the CPU on one instance runs high while the next large instance you created remains under-utilized.
You should use small to medium instances of the type your tier requires (memory-intensive, CPU-intensive, or network-intensive) and scale those instances with properly written policies.
As long as the user data and AMI are stable, you can spawn many instances within minutes while making sure you are not overspending and are saving every penny.
This is heavily dependent on your application.
I agree with Faisal Nizam's intuition of favoring horizontal scaling. However, there are many applications that will not run very well on small instances.
For example, Elastic recommends Elasticsearch cluster nodes with 64GB of RAM. Similar reasoning can be applied to many other data-related applications, where it can be beneficial if a single instance is able to keep large chunks of data in memory.
I would recommend finding the ideal instance size for your application and scaling horizontally from there.
Each EC2 instance also has some overhead, so you need to find a balance between a few large, costly instances and many small instances that each carry that overhead.
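To make that trade-off concrete, here is a small arithmetic sketch. It assumes a fixed per-instance memory overhead (OS, agents) and computes how many instances of a given size are needed for a workload and how much memory is provisioned in total. All numbers are made up for illustration.

```python
import math


def fleet_plan(workload_gib, instance_gib, overhead_gib):
    """Return (instance_count, total_provisioned_gib) for one instance size.

    overhead_gib is memory each instance loses to the OS and agents (assumption).
    """
    usable = instance_gib - overhead_gib
    if usable <= 0:
        raise ValueError("overhead leaves no usable memory on this instance size")
    count = math.ceil(workload_gib / usable)
    return count, count * instance_gib


# Hypothetical 100 GiB workload with 1 GiB of overhead per instance.
print("small:", fleet_plan(100, instance_gib=8, overhead_gib=1))
print("large:", fleet_plan(100, instance_gib=64, overhead_gib=1))
```

With these inputs the small fleet pays the overhead fifteen times, while the large fleet pays it twice but scales in much coarser steps, so neither choice wins automatically.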
(As of today) To vertically scale up/scale down an EC2 server, it needs to be shut down and spun back up - something to keep in mind before deciding to go for it.

Choosing the right EC2 instance type?

I'm trying to determine if it makes sense to switch our hosting to EC2 from a dedicated dreamhost server, and if so, what EC2 instance type I should choose to get a good idea of the cost prior to switching. I would like to go low and then bump up if need be.
Current Usage:
dedicated server with 4 GB RAM and 4 CPUs
average disk usage: 783 MB
average bandwidth: 8.5 GB
This is really all the info I get from our dreamhost control panel, so hopefully it's enough to provide some recommendations on where to start.
Using the calculator located here, I'm leaning towards a t2.xlarge. Is that too much? Not enough?
It is not possible for anyone to recommend the 'correct' instance type. This is because it depends on the operation of your particular application. It might be CPU-intensive, RAM-intensive, network-heavy, highly parallel, etc.
Some applications might need to handle occasional spikes of traffic, whereas other applications might be relatively consistent in their load.
The correct way to determine your 'best' instance type is to run tests that simulate the expected application load. If you can create an automated test, then you could run it against many different instance types and compare the performance vs cost.
Also, many applications are designed to be able to run across multiple instances, so it would be better to test various quantities of servers as well as their instance type.
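One way to compare the results of such tests is to normalize them to cost per unit of work. The sketch below assumes you have measured sustained throughput for each candidate; the labels, prices, and throughput figures are placeholders, not real AWS prices or benchmarks.

```python
def cost_per_million_requests(hourly_price, requests_per_second):
    """Cost of serving one million requests at the measured throughput."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1_000_000


def cheapest(results):
    """Pick the candidate with the lowest cost per million requests.

    results maps a label to (hourly_price, requests_per_second).
    """
    return min(results, key=lambda name: cost_per_million_requests(*results[name]))


# Placeholder load-test results: label -> (price per hour, sustained requests/s).
results = {
    "1 x large": (0.40, 900),
    "2 x medium": (0.42, 1100),
    "4 x small": (0.44, 1000),
}
for name, (price, rps) in results.items():
    print(name, round(cost_per_million_requests(price, rps), 4))
print("cheapest per request:", cheapest(results))
```

Cost per request is only one axis; latency under load and behaviour during traffic spikes should be compared from the same test runs.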
You might also consider using Amazon EC2 Auto Scaling, which gives the ability to automatically add/remove servers based upon workload. This means that you could use much more powerful instances, but automatically turn some of them off during less-used periods. This affects the cost calculation because the more-powerful instances are more expensive, but you won't be using them all the time.
Then, you could also consider using Amazon EC2 Spot Instances, which can cost up to 90% less but might be terminated when demand for such instances is higher. You can also combine On-Demand and Spot Instances to get additional capacity at a lower cost.
(Spot and Auto Scaling are only really applicable if you are using more than one instance to host your application.)
And finally, if your application only requires one instance, you could also consider using Amazon Lightsail, which combines the price for instance type and network bandwidth to make the price more predictable.
Bottom line: It depends!
One final word: Most companies consider switching to AWS not purely on a cost basis ("if it makes sense to switch our hosting to EC2 from a dedicated dreamhost server"), but rather for the breadth of features that AWS offers that are not available in a traditional server hosting service. If all you need is "a server", it's probably easiest to consider Amazon Lightsail or keep whatever is currently working for you. The cost saving with AWS won't be dramatic (or it might not even be cheaper!), but it will offer you a lot more capabilities if you ever grow beyond just requiring "a server".

How to get consistent CPU utilization on AWS

I've now learnt that when I start a new EC2 instance, it has a certain number of CPU credits, due to which its performance is high when it starts processing but gradually reduces over time as the credits run out. Past that point, the instance runs at what appears to be the baseline CPU utilisation rate. To illustrate: when I started the EC2 instance (t2.nano), CloudWatch reported around 80% CPU utilisation, gradually decreasing to 5%.
Now I'm happy to use one of the better instance types pending the instance limit request. But whilst that is in progress, I'd like to know whether the issue of reducing performance over time will still hold even with the better instance type?
Would I require a dedicated host setup if I wish to ensure I get consistent CPU utilisation? The only problem I can see here is that I'm running an SQS worker queue, and Elastic Beanstalk allows us to easily set up a worker environment which reads messages from the queue. From what I've read and from looking at the configuration options available in Elastic Beanstalk, I don't think I'll be able to launch instances into a dedicated host directly. Most of my reading has led me to believe that I'll have to learn how to use a VPC. Would that be correct?
So I guess my questions are: would simply moving to a more powerful instance type guarantee consistent CPU performance, or is a dedicated host required? If one is required, is it possible to set it up with Elastic Beanstalk, or would it have to be set up manually? And if it is set up manually, can it be configured to work with an SQS queue automatically?
If you want consistent CPU performance, you should avoid the burstable performance instances (the T families, such as T2). Fixed-performance instance families (M5, C5, etc.) have consistent CPU performance over time. You can use any instance family with Elastic Beanstalk. No need for a dedicated host.
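The drop from about 80% to 5% described in the question follows from the CPU credit model, where one credit lets one vCPU run at 100% for one minute and credits are earned at the baseline rate. The sketch below estimates how long a single-vCPU burstable instance can hold a given load before it is limited to baseline. The 30 starting credits are an assumption for illustration; the 80% demand and 5% baseline come from the question.

```python
def minutes_until_baseline(initial_credits, demand, baseline):
    """Estimate minutes of sustained load before CPU credits run out.

    demand and baseline are CPU fractions (0.0 to 1.0) for a single vCPU.
    Each minute the instance spends `demand` credits and earns `baseline`.
    Returns None when demand does not exceed baseline (credits never run out).
    """
    net_spend_per_minute = demand - baseline
    if net_spend_per_minute <= 0:
        return None
    return initial_credits / net_spend_per_minute


# Assumed 30 starting credits, 80% demand, 5% baseline (as seen on the t2.nano).
print(minutes_until_baseline(30, demand=0.80, baseline=0.05))
```

Under these assumptions the instance sustains the load for roughly 40 minutes and then settles at baseline, which matches the pattern seen in CloudWatch.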

Which AWS instance type is optimal for improving Spark shuffle performance?

For my Spark application I'm trying to determine whether I should be using 10 r3.8xlarge or 40 r3.2xlarge. I'm mostly concerned with shuffle performance of the application.
If I go with r3.8xlarge I will need to configure 4 worker instances per machine to keep the JVM size down. The worker instances will likely contend with each other for network and disk I/O if they are on the same machine. If I go with 40 r3.2xlarge I will be able to allocate a single worker instance per box, allowing each worker instance to have its own dedicated network and disk I/O.
Since shuffle performance is heavily impacted by disk and network throughput, it seems like going with 40 r3.2xlarge would be the better configuration between the two. Is my analysis correct? Are there other tradeoffs that I'm not taking into account? Does Spark bypass the network transfer and read straight from local disk if worker instances are on the same machine?
It seems you already have the answer: going with 40 r3.2xlarge would be the better configuration of the two.
I recommend going through the AWS Well-Architected Framework.
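A quick arithmetic sketch of the two layouts from the question shows why the comparison comes down to I/O sharing rather than worker size. The vCPU and memory figures (32 vCPUs and 244 GiB for r3.8xlarge, 8 vCPUs and 61 GiB for r3.2xlarge) are assumptions based on the published R3 specifications; verify them before relying on the numbers.

```python
def layout(instances, vcpus, memory_gib, workers_per_instance):
    """Summarize a cluster layout in per-worker terms."""
    return {
        "total_workers": instances * workers_per_instance,
        "cores_per_worker": vcpus // workers_per_instance,
        "memory_gib_per_worker": memory_gib / workers_per_instance,
        # Workers on the same machine share its network interface and disks.
        "workers_sharing_io": workers_per_instance,
    }


# Instance specs are assumptions; worker counts come from the question.
big = layout(instances=10, vcpus=32, memory_gib=244, workers_per_instance=4)
small = layout(instances=40, vcpus=8, memory_gib=61, workers_per_instance=1)
print("10 x r3.8xlarge:", big)
print("40 x r3.2xlarge:", small)
```

Under these assumptions both layouts give 40 workers with the same cores and memory each, so the remaining difference is that four workers share one machine's network and disk in the first layout and none do in the second.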
