What is the recommended EC2 instance for the Istio bookinfo sample application?

I have an EKS cluster on AWS with Istio installed. The first time I installed Istio I used one m3.large EC2 instance, and some Istio services stayed pending; the ingress-gateway pod's status was showing Pending.
I described the pod and saw an insufficient CPU error. I moved up to an m5.large instance and every pod started running.
We are still on staging and this is not live yet, and we are already spending almost three times our initial cost.
Can someone please recommend an EC2 instance that can comfortably get Istio up and running? Let's take the bookinfo sample application as the reference.
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m33s (x60 over 12m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
It seems provisioning 2 m5.large instances worked perfectly, but this is incurring more cost. Each m5.large costs 0.107 USD/hour, which is about 77 USD/month.
Having two m5.large instances will incur even more cost just to run 15 pods (5 of them custom pods).
Non-terminated Pods: (15 in total)

The deployment is made up of a number of different components. Some of them, such as Pilot, have a large impact in terms of memory and CPU, so it is recommended to have around 8 GB of memory and 4 CPUs free in your cluster. Obviously, all components have requested resources defined, so if you don't have enough capacity you will see pods not starting.
You are using an m5.large, whose spec is:
m5.large: 2 vCPU, 8 GiB memory, EBS-only
so on the basis of the requirement above, you need at least:
m5.xlarge: 4 vCPU, 16 GiB memory, EBS-only
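If you manage your worker nodes with eksctl (as the getting-started-istio-eks post linked at the end of this answer does), a minimal node group sketch for that size might look like the following; the cluster name, region, and capacity are placeholders to adjust to your setup:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: staging-cluster        # hypothetical cluster name
  region: us-east-1            # pick your region
nodeGroups:
  - name: istio-workers
    instanceType: m5.xlarge    # 4 vCPU / 16 GiB, per the sizing above
    desiredCapacity: 1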
If your application needs heavy computation, then you may try a compute-optimized instance.
Compute optimized instances are ideal for compute-bound applications
that benefit from high-performance processors. They are well suited
for the following applications:
Batch processing workloads
Media transcoding
High-performance web servers
High-performance computing (HPC)
Scientific modeling
Dedicated gaming servers and ad serving engines
Machine learning inference and other compute-intensive applications
The compute-optimized-instances documentation and the deploying-istio recommendations for AWS and Azure might help you:
https://aws.amazon.com/blogs/opensource/getting-started-istio-eks/

If you look at the AWS instance types listing, an m5.large instance is pretty small: it only has 2 CPU cores. On the other hand, if you look at the kubectl get pods --all-namespaces listing, you can see there are quite a few pods involved in running the core Kubernetes system (and several of those are replicated on each node in a multi-node installation).
If 2 cores isn't enough, you can try picking larger instance sizes; if 2x m5.large works then 1x m5.2xlarge will be slightly better and the same cost. If you're just running demo applications like this then the "c" family has half the memory (2 GiB per core) and is slightly cheaper so you might try a c5.2xlarge.
For medium-sized workloads, I'd suggest figuring out your total cluster requirements (based on either pods' resource requests or actual statistics from a tool like Prometheus); dividing that across some number of worker nodes, such that losing one won't be a significant problem (maybe 7 or 9); then selecting the instance size that fits that. It will be easier to run on fewer, larger nodes than more, smaller nodes (there are more places to fit that one pod that requires 8 GB of RAM).
(I routinely need to allocate 4-8 GB of memory for desktop environments like Docker Desktop for Mac or kind and still find it cramped; CPU isn't usually my limitation but I could easily believe that 2 cores and 8 GiB of RAM isn't enough.)
(And yes, AWS is pretty expensive for personal projects without an obvious revenue stream attached to them. You could get that m5.large instance for about $500/year if you were willing to pay that amount up front but that can still be a lot of money to just play around with things.)

TL;DR: for many workloads the default requests in Istio are extremely greedy. You need to override these with your own values.yaml (assuming you're using Helm) and monitor how much resource Istio is actually using. Using bigger and bigger instance types is a bad solution (unless you really do consume the default requests, or you like spraying money against a wall).
The problem is that Istio, when using the default profiles, makes some very large Requests. This means that even if you've got plenty of available resources, Kubernetes will refuse to schedule many of the Istio control plane components.
[I'm assuming you're familiar with Kubernetes requests. If not, these are declarations in the pod YAML that say "this pod needs x CPU and y memory to run comfortably". The Kubernetes scheduler will then ensure that the pod is scheduled to a node that has sufficient resource. The problem is, many people stick their finger in the air and put massive values in "to be sure". But this means that huge chunks of your available resource are being wasted if the pod doesn't actually need that much to be comfortable.]
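As a minimal sketch (the pod name, image, and numbers here are made up purely for illustration), a request looks like this in a pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: example-app              # hypothetical pod
spec:
  containers:
    - name: app
      image: example/app:latest  # placeholder image
      resources:
        requests:
          cpu: 100m              # "I need a tenth of a core to be comfortable"
          memory: 128Mi          # and 128Mi of memory
The scheduler will only place this pod on a node that still has at least that much unreserved CPU and memory, whether or not the pod ever uses it.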
In addition, each sidecar makes a sizeable Request as well, piling on the pressure.
This will be why you're seeing pods stuck in pending.
I'm not 100% convinced that the default requests set by the Istio team are actually that reasonable [edit: for bookinfo, they're certainly not. I suspect the defaults are aimed at multi-thousand-node estates]. I would recommend that before boosting your instance sizes (and therefore your costs), you look into reducing the requests made by the Istio control and data plane.
If you then find your Istio components are being evicted often, then you've gone too far.
Example: using the supplied Helm values.yaml file here, we have for each sidecar:
requests:
  cpu: 100m
  memory: 128Mi
(Lines 155-157).
More worryingly, the default memory request for Pilot is 2 GB! That means you're going to be giving away a massive chunk (or maybe the whole) of a node. That's just for Pilot - the same story is true for Galley, Citadel, Telemetry, etc.
You need to monitor a running cluster to determine whether these values can be reduced. For example, I have a reasonably busy cluster (way more complicated than the wretched bookinfo), and metrics-server is telling me Pilot's CPU usage is 8 millicores (!) and its memory 62Mi. So if I'd blindly stuck with the defaults, which most people do, I'd be wasting nearly 2 GB of memory and half a CPU.
See my output here: I stress this is from a long running, production standard cluster:
[ec2-user@ip-172-31-33-8 ~]$ kubectl top pod -n istio-system
NAME CPU(cores) MEMORY(bytes)
istio-citadel-546575dc4b-crnlj 1m 14Mi
istio-galley-6679f66459-4rlrk 19m 17Mi
istio-ingressgateway-b9f65784b-k64th 1m 22Mi
istio-pilot-67bfb94df4-j7vld 8m 62Mi
istio-policy-598b768ddc-cvs2b 5m 39Mi
istio-sidecar-injector-578bc4cc74-n5v6w 11m 7Mi
istio-telemetry-cd6fddc4b-lt8rl 27m 57Mi
prometheus-6ccfbc78c-w4dd6 25m 497Mi
A more readable guide to the defaults is here. Run through the requests for the whole of the control plane and add up the required CPU and memory; it's a lot of resource.
This is hard work, but you need to sit down and work out what each component really needs, set up your own values.yaml, and generate your own YAML for Istio. The demo YAMLs provided by Istio are not reasonable, especially for Mickey Mouse apps like bookinfo, which should be taken out the back door and put out of its misery. Bear in mind that Istio was originally developed alongside massive multi-thousand-node clusters.
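As a sketch of what those overrides might look like with the 1.x Helm charts (the exact keys depend on your chart version, and the numbers below are illustrations to be checked against your own metrics, not recommendations):
# values.yaml overrides - verify the keys against your Istio chart version
global:
  proxy:
    resources:
      requests:
        cpu: 10m           # chart default is 100m per sidecar
        memory: 64Mi       # chart default is 128Mi per sidecar
pilot:
  resources:
    requests:
      cpu: 100m            # chart default is 500m (the half a CPU mentioned above)
      memory: 256Mi        # chart default is 2Gi
Then keep watching kubectl top, as above, to confirm the components stay comfortable and are not being evicted or throttled.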

Related

What will happen if my virtual machine is too slow?

I have a newbie question here: I'm new to clouds and Linux, I'm using Google Cloud now, and I'm wondering, when choosing a machine config:
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The image below shows the last 6 hours of a Python script I'm running and its CPU usage. It's obviously running at less than 2% of the CPU for most of that time, but there's a small spike. Should I care about the spike? Also, how high should my CPU usage be, at most, before I upgrade? If a script I'm running uses 50-60% of the CPU most of the time, I assume I'm safe, but what's the maximum before you upgrade?
What if my machine is too slow? Will it make the app crash, or just slow it down?
It depends.
Some applications will just respond slower. Some will fail if they have timeout restrictions. Some applications will begin to thrash, which means that all of a sudden the app becomes very, very slow.
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use a 50% rule so that my service can handle burst traffic or denial-of-service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc) then it is time to upgrade that resource.
Also, consider that there are other services that you might want to add. Examples are load balancers, Cloud Storage, CDNs, firewalls such as Cloud Armor, etc. Those types of services tend to offload requirements from your service and make your service more resilient, available and performant. The biggest plus is your service is usually faster for the end user. Some of those services are so cheap, that I almost always deploy them.
You should choose machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern you should create a managed instance group that automatically scales based on CPU usage. Usually 80-85% is a good value for a max CPU value. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
You should also consider the availability needed for your workload to keep costs efficient. See below link for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option

AWS EC2 Performance explanation

I have a REST API web server, built in .NET Core, that has data-heavy APIs.
This is hosted on AWS EC2. I have noticed that the average response time for certain APIs is ~4 seconds, and if I turn up the EC2 specs, the response time goes down to a few milliseconds. I guess this is expected; what I don't understand is that even when I load test the APIs on a lower-end instance, the server never crosses 50% utilization of memory/CPU. So what is the correct technical explanation for the APIs performing faster, if the lower-end instance never reaches 100% utilization of memory/CPU?
There is no simple answer; there are so many EC2 variations that you need to first figure out what is slowing down your API.
When you 'turn up' your EC2 instance, you are getting some combination of more memory, faster CPU, faster disk, and more network bandwidth - and we can't tell which of those 'more' features is improving your performance. Different instance classes are optimized for different problems.
It could be as simple as the better network bandwidth, or it could be that your application is disk-bound and the better instance you chose is optimized for I/O performance.
Knowing which feature your instance is lacking would help you decide which type of instance to upgrade to - or, as you have found out, you can just upgrade to something 'bigger' and be happy with the performance (at the trade-off of it being more expensive).

How many threads/processes to create in an ECS task

A c5.2xlarge instance has 8 vCPU. If I run os.cpu_count() (Python) or std::thread::hardware_concurrency() (C++) they each report 8 on this instance. I assume the underlying hardware is probably a much bigger machine, but they are telling me what I have available to me, and that seems useful and correct.
However, if my ECS task requests only 2048 CPU (2 vCPU), then it will still get 8 from the above queries on a c5.2xlarge machine. My understanding is Docker is going to limit my task to only using "2 vCPU worth" of CPU, if other busy tasks are running. But it's letting me see the whole instance.
It seems like this would lead to tasks creating too many threads/processes.
For example, if I'm running 2048 CPU tasks on a c5.18xlarge instance, each task will think it has 72 cores available. They will all create way too many threads/processes overall; it will work but be inefficient.
What is the best practice here? Should programs somehow know their ECS task reservation? And create threads/processes according to that? That seems good except then you might be under-using an instance if it's not full of busy tasks. So I'm just not sure what's optimal there.
I guess the root issue is Docker is going to throttle the total amount of CPU used. But it cannot adjust the number of threads/processes you are using. And using too many or too few threads/processes is inefficient.
See discussion of cpu usage in ECS docs.
See also this long blog post: https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
There is a huge difference between virtualization technologies and containers, and having a clear understanding of these technologies will help. That being said, an application should be configurable if you want to deploy it in different environments.
I would suggest adding an optional config value which tells the application that it can only use a certain number of CPU cores. If that value is not provided, it falls back to auto-detection.
Once you have this option, you can provide that value when defining the ECS task, which will fix the problem you are facing.
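As a rough CloudFormation-style sketch of that idea (WORKER_THREADS is a hypothetical variable; your application would have to read it itself and fall back to auto-detection when it is unset):
WorkerTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: worker
    Cpu: "2048"                        # reserve 2 vCPU for the task
    Memory: "4096"
    ContainerDefinitions:
      - Name: worker
        Image: example/worker:latest   # placeholder image
        Environment:
          - Name: WORKER_THREADS       # hypothetical config the app reads
            Value: "2"                 # match the 2 vCPU reservation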

AWS EC2 ECS - How many tasks should I place on a single instance?

At the moment, I have a single c4.large (3.75 GB RAM, 2 vCPU) instance in my workers cluster, currently running 21 tasks for 16 services. These tasks range from image processing to data transformation, with most sending HTTP requests too. As you can see, the instance is quite well utilised.
My question is, how do I know how many tasks to place on an instance? I am placing up to 8 tasks for a service, but I'm unsure as to whether this results in a speed increase, given they are using the same underlying instance. How do I find the optimal placement?
Should I put many chefs in my kitchen, or will just two get the food out to customers faster?
We typically run lots of smaller-sized servers in our clusters - say 4-6 t2.smalls for our workers, with 6-7 tasks placed on each. The main reason for this is not to speed up processing but to reduce the blast radius when servers go down.
We've seen it happen quite often that a server simply fails an instance health check and AWS takes it down. Having the workers spread out reduces the effect on the system.
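For what it's worth, the "spread out" part can be made explicit with a task placement strategy on the service. A CloudFormation-style sketch (all names are placeholders):
WorkerService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: workers                             # placeholder cluster name
    TaskDefinition: worker-task                  # placeholder task definition
    DesiredCount: 6
    PlacementStrategies:
      - Type: spread
        Field: attribute:ecs.availability-zone   # spread across AZs first
      - Type: spread
        Field: instanceId                        # then evenly across instances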
I agree with the other answers' 80% rule. But you never want a single host for any kind of critical application; if that goes down you're screwed. I also think it's better to use larger servers because of their increased network performance. You should look into a host with enhanced networking, especially because you say you have a lot of HTTP work.
Another thing to consider is disk I/O. If you are piling too many tasks onto a host and there is a failure, ECS is going to try to schedule them all somewhere else. I have had servers crash because too many tasks were scheduled onto them and they burned through their disk credits.

Running RabbitMQ+Celery in the same server as production environment

I'm running a Django app in an EC2 instance, which uses RabbitMQ + Celery for task queuing. Are there any drawbacks to running my RabbitMQ node from the same EC2 instance as my production app?
The answer to this question really depends on the context of your application.
When you're faced with scenarios like this, you should always consider a few things.
Separation of concerns
Here, we want to make sure that one system is not responsible for the running of the other systems. This includes things like:
If the EC2 instance running everything goes down, will the remaining tasks in the queue continue running?
If my RAM is full, will all systems remain functioning?
Can I scale just one segment of my app without having to redesign the infrastructure?
By having RabbitMQ and Django (served by something like a WSGI server - gunicorn, waitress, etc.) all on one box, you lose a lot of resource contingency.
Although RAM and CPU may be abundant, there is a limit to I/O: disk writes, network writes, etc. This means that if for some reason you have a write-heavy function, all other systems may suffer as a result. If you have a function that writes heavily to RAM, the same applies.
So really, the downsides of keeping things in one system, as far as I can see from your question and my own experience, are as follows:
Multiple points of failure. If your one instance of RabbitMQ fails, your queues and tasks stop working.
If your app starts generating big traffic, other systems start to contend for resources.
If any component goes down, that could mean downtime for other services.
System downtime means complete downtime of all components.
Lots of headaches when your application demands more resources with minimal downtime.
Lots of web traffic will slow down task running
Lots of task running will slow down web requests
Lots of IO will slow down all the things
The rule of thumb that I usually follow is to keep single points of failure far away from each other - that way you only have to manage those components individually. A good use case for this would be to use one EC2 instance for your app, another for your workers, and another for your RabbitMQ. That way you can apply smaller/bigger instances to just those components if you need to. You can even create AMIs and autoscaling groups - if that fits your use case.
Here are some articles for reference
Separation of concerns
Modern design architectures
Single points of failure
TL;DR: if you can run on one EC2 instance you should, but design it so that it is easy to scale from day one.
Both Joshnidhin and Giannis covered the RAM, IO and CPU aspects.
I have run production apps on single instances with containerization and slept with peace of mind, knowing that if lots of people suddenly want what I have built tomorrow, I can scale pretty quickly by deploying those containers on different instances instead of one single instance.
Docker allows you to put a limit on the CPU consumption and memory usage of each container, so you can also be sure that they will not step on each other.
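A minimal Compose-style sketch of that (version 2.x syntax; the limits are arbitrary examples, and the image and command names are placeholders, not recommendations):
version: "2.4"
services:
  web:
    image: example/django-app:latest     # placeholder image
    cpus: 1.0
    mem_limit: 1g
  worker:
    image: example/django-app:latest     # same image, running the Celery worker
    command: celery -A myproject worker  # placeholder project name
    cpus: 0.5
    mem_limit: 512m
  rabbitmq:
    image: rabbitmq:3
    cpus: 0.5
    mem_limit: 512m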
If we take the EC2 instance out of this question, it becomes:
Are there any drawbacks to running a RabbitMQ node on the same server as my production app?
I would say it depends on various things: the kind of workloads and their composition, the complexity of the workload, whether you expect growth in usage, etc.
If your workload is well behaved and the server is big enough for both (app + task queue), then why not - there will be only one server to manage. Just make sure to protect the two processes from each other by limiting their system resource usage.
If your traffic is not well behaved, then you might want more than one server. In that case dedicated servers are better (separation of concerns), even though you will have to manage more than one server.
Now back to EC2: all of the above still applies. EC2 makes horizontal scaling of applications easier, so if you have them on separate instances you can scale them individually and cost-effectively. If not, you will waste resources when you scale.