Which AWS RDS instance to upgrade given following usage pattern


I have been using a t2.medium RDS instance, which regularly exhausts its CPU credit balance. Following is the graph of the CPU credit balance over an interval of 6 weeks.
Since the next available instance, t2.large, offers the same vCPU and ECU, does it provide any improvement in processing capability (such as increased CPU credits)? What is the best course of action in this scenario, in terms of the RDS instance and other measures (apart from optimizing queries, which I will do, but I need a quick solution so that users don't suffer slow speeds)?

CPU credits are not the only thing to consider; also look at CPU utilisation, memory utilisation, queue depth and so on. It looks like you are running CPU-intensive queries. A credit balance dropping to zero is a serious concern that should be resolved.
With t2 instances, you do NOT get 100% of the CPU all the time; once the credit balance is exhausted, the instance is throttled to its baseline performance.
As recommended by Krishna, I agree that you should try moving to m4.large instead of t2.large.
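
To see why a t2.large could still behave differently from a t2.medium despite the same vCPU count, here is a rough sketch of the credit accounting. The earn rates and maximum balances (24 credits/hour and 576 for t2.medium, 36 credits/hour and 864 for t2.large) are quoted from the AWS T2 documentation as I remember it, so please check them against the current docs. The 30% load is a made-up example, and the hourly model ignores launch credits and the finer-grained accounting AWS actually uses.

```python
def simulate_credit_balance(earn_per_hour, max_balance, vcpus,
                            hourly_util_pct, start_balance):
    """Return the CPU credit balance after each simulated hour.

    One CPU credit is one vCPU at 100% for one minute, so an hour at
    `util` percent average utilisation across `vcpus` vCPUs spends
    util / 100 * vcpus * 60 credits.
    """
    balance = float(start_balance)
    history = []
    for util in hourly_util_pct:
        spent = util * vcpus * 60 / 100
        # The balance can neither go below zero nor exceed the cap.
        balance = min(max_balance, max(0.0, balance + earn_per_hour - spent))
        history.append(balance)
    return history


# Assumed instance figures (verify against the current AWS documentation).
T2_MEDIUM = {"earn_per_hour": 24, "max_balance": 576, "vcpus": 2}
T2_LARGE = {"earn_per_hour": 36, "max_balance": 864, "vcpus": 2}

# Hypothetical load: 30% average utilisation on both vCPUs for 48 hours.
load = [30] * 48
medium = simulate_credit_balance(hourly_util_pct=load, start_balance=576, **T2_MEDIUM)
large = simulate_credit_balance(hourly_util_pct=load, start_balance=576, **T2_LARGE)
print(medium[-1], large[-1])
```

Under these assumptions the t2.medium drains its balance while the t2.large holds steady, because the larger size earns credits faster. A sustained load above the t2.large baseline would still exhaust it, which is the argument for a fixed-performance type such as m4.large.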

Related

Why not get 2 nano instances instead of 1 micro instance on AWS?

I'm choosing instances to run microservices on an AWS EKS cluster.
From reading this article and taking a look at the AWS docs, it seems that choosing many small instances instead of one larger instance results in a better deal.
There seems to be no downside to taking, for instance, 2 t3.nano (2 vCPU / 0.5GiB each) vs 1 t3.micro (2 vCPU / 1GiB). The price and the total memory are the same, but the total CPU provided grows considerably the more instances you get.
I assume there are some processes running on each machine by default, but I found nothing mentioning their impact on the machine's resources or usage. Is it negligible? Is there any advantage to taking one big instance instead?

The issue is whether or not your computing task can be completed on the smaller instances. There is also an overhead involved in instance-to-instance communication that isn't present in intra-instance communication.
So, it is all about fitting your solution onto the instances and your requirements.

There is no right answer to this question. The answer depends on your specific workload, and you have to try out both approaches to find out what works best for your case. There are advantages and disadvantages to both approaches.
For example, if the OS takes 200 MB on each instance, you will be left with only about 600 MB across both nano instances combined vs about 800 MB on the single micro instance.
When the cluster scales out, initializing 2 nano instances might roughly take twice as much time as initializing one micro instance to provide the same additional capacity to handle the extra load.
Also, as noted by Cargo23, inter-instance communication might increase the latency of your application.
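
The memory arithmetic above can be written out as a tiny sketch. The 200 MB per-instance OS overhead is the assumed figure from the example, not a measurement, and the instance sizes are taken as 512 MiB and 1024 MiB.

```python
def usable_memory_mib(instance_count, memory_per_instance_mib, os_overhead_mib):
    """Memory left for workloads after every instance pays the OS overhead."""
    return instance_count * (memory_per_instance_mib - os_overhead_mib)


OS_OVERHEAD_MIB = 200  # assumed figure from the example, not a measurement

two_nanos = usable_memory_mib(2, 512, OS_OVERHEAD_MIB)
one_micro = usable_memory_mib(1, 1024, OS_OVERHEAD_MIB)
print(two_nanos, one_micro)  # 624 824
```

The fixed overhead is paid once per instance, so the gap widens as the same total memory is split across more, smaller instances.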

How to adjust and measure network performance on AWS

Lately, I have been struggling to understand what my network speed (downlink) is between nodes on AWS (in a multi-homed cluster, with computers in different regions).
I see a lot of fluctuation when I measure it with a script I have written (based on this link and SCP) or with Iperf.
I believe it depends on network use, which changes rapidly (mostly between regions), but I still don't understand from the AWS documentation what performance I am paying for, for example a minimum and a maximum downlink rate (aws instances).
At first I tried the T2 type, and since it has burst CPU performance, I thought that maybe the NIC performance is also bursty. So I moved to the M4 type, but I got the same problems with M4.
Is there any way to know my NIC downlink rate based on the type and flavor?
*I have asked a similar question on the AWS forum, but I haven't got a response (https://forums.aws.amazon.com/thread.jspa?threadID=296389).

There is no way to get a better indication than your measuring. AWS does not publish anything indicating this performance, unless we are talking about the larger instances, where network performance is specifically given, e.g. m5.12xlarge having 10 Gbps. Most likely network performance does have a burst component for smaller instance types.
There are pages with other people's benchmarks, but you won't find an official answer for any of this.
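
Since measuring is the only indication available, it may help to summarise many repeated measurements instead of relying on a single run. This is only a sketch: the sample values are hypothetical, and how you collect them (Iperf, SCP timing, or your own script) is up to you.

```python
import statistics


def summarize_throughput(samples_mbps):
    """Summarise repeated throughput measurements given in Mbit/s."""
    ordered = sorted(samples_mbps)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        # Population standard deviation as a rough measure of fluctuation.
        "stdev": statistics.pstdev(ordered),
    }


# Hypothetical samples from repeated runs between the same two nodes.
samples = [410.0, 95.0, 388.0, 402.0, 120.0, 395.0]
summary = summarize_throughput(samples)
print(summary)
```

A large gap between the minimum and the median across runs at different times of day would be consistent with a burst component, but it cannot prove one.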

How to use two AWS EC2 instances (1 GPU and 1 CPU instance) with one storage (to run code, store/share files) and reduce cost

My team uses a GPU instance to run TensorFlow-based machine learning, YOLO and computer vision applications, and also to train machine learning models. It costs $7 an hour and has 8 GPUs. I am trying to reduce its cost. We need 8 GPUs for faster training, and sometimes several people use different GPUs at the same time.
In our use case, there are at least 1-2 weeks each month when we do not use the GPUs at all. A need for the GPUs may or may not come up during that time. So I wanted to know whether there is a way to edit the code so that all CPU-intensive operations run on a low-cost CPU instance when the GPU is not needed, and to turn on the GPU instance only when needed, use it, and stop it when the work is done.
I thought of using EFS to put the code on a shared file system and run it from there, but I read an article ( https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs ) which says that you should never run code from network-based drives because it can become really slow. So I don't know whether it is a good idea to run machine learning applications from an EFS file system. I was thinking of creating virtual environments in folders on EFS, but I don't think that is a good idea.
Could anyone suggest good ways of achieving this and reducing costs? If you are going to suggest an instance with fewer GPUs: I have considered that, but we sometimes need 8 GPUs for faster training. The problem is that we don't use the GPUs at all for 1-2 weeks, yet the costs are still incurred.
Please suggest how to achieve a low cost for this use case without using spot or reserved instances.
Thanks in advance

A few thoughts:
GPU instances now allow hibernation, so when launching your GPU instance select the new Stop Instance behavior 'hibernate', which will let you turn it off for 2 weeks but spin it up quickly if necessary.
If you only have one instance, look into using EBS for data storage with a high volume of provisioned IOPS to move data on/off your instance quickly.
Alternatively, move your model to SageMaker to ensure you are only charged for GPU use when you are actively training your model.
If you are applying your model (inferencing), move that workload to a cheap instance. A trained YOLO model can run inferencing on very small CPU instances; there is no need for a GPU for that part of the workload at all.
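
To put a number on what stopping or hibernating the instance during the idle weeks might save, here is a back-of-the-envelope sketch. It uses the $7/hour rate and the 1-2 idle weeks from the question, and it ignores the storage charges that continue while an instance is stopped.

```python
HOURLY_RATE_USD = 7.0  # on-demand rate quoted in the question


def idle_cost_usd(idle_days, hourly_rate=HOURLY_RATE_USD):
    """Compute cost of leaving the instance running for `idle_days` unused days."""
    return idle_days * 24 * hourly_rate


print(idle_cost_usd(7), idle_cost_usd(14))  # 1176.0 2352.0
```

So, under these assumptions, keeping the instance running through the idle period costs roughly $1,176 to $2,352 per month in compute alone.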

To reduce inference costs, you can use Elastic Inference which supports pay-per-use functionality:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html

Can I improve performance of my GCE small instance?

I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft I'll use that as an example. On the AWS EC2 instances, succeeding world chunks would load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance; and they must disconnect from and re-login to the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.

Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try backing into the smaller instance. Since you only pay for 10 minutes, you could see if the performance is better on higher level machines. If you have consistent performance problems no matter what the size of the box, then I'm guessing it's something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable based on how long it's been running? The nature of the workload? Time of day? If the performance is sometimes good, but sometimes bad, then it's probably once again related to the type of your workload and their virtualization strategy.
Something Amazon is famous for is consistency. They work very hard to manage the consistency of the performance. It shouldn't spike up or down.

My best guess here without all the details is you are using a very small disk. GCE throttles disk performance based on the size. You have two options ... attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team

EC2 server, lots of micro instances or fewer larger instances?

I was wondering which would be better: to host a site on EC2 with many micro instances, or with fewer larger instances such as m1.large. All will sit behind one or a few larger instances acting as load balancers. I will say what my understanding is, and anybody who knows better can add to it or correct me if I'm wrong.
The main reason for choosing micro instances is cost. A single micro instance gives around 0.35 ECU on average for $0.02/hour, while one small instance gives 1 ECU for $0.085/hour. If you do the math of $/ECU/hour, a micro instance works out to $0.057/ECU/hour, whereas a small instance is $0.085/ECU/hour. So for the same average computing power, 100 micro instances would be cheaper than 35 small instances.
The main problem with micro instances is their more fluctuating performance, but I'm not sure whether this is less of a problem when you have many instances.
So does anybody have experience benchmarking such setups and seeing the benefits and drawbacks? I'm trying to choose which way to go.
PS: an article on the subject, http://huanliu.wordpress.com/2010/09/10/amazon-ec2-micro-instances-deeper-dive/
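
The $/ECU/hour comparison in the question can be reproduced with a few lines. The prices and ECU figures are the ones quoted in the question, not current AWS pricing.

```python
def cost_per_ecu_hour(price_per_hour, ecu):
    """Price paid per ECU per hour for one instance."""
    return price_per_hour / ecu


# Figures quoted in the question (not current AWS pricing).
micro = cost_per_ecu_hour(0.02, 0.35)
small = cost_per_ecu_hour(0.085, 1.0)

# Two fleets with roughly the same average compute (about 35 ECU).
fleet_micro = 100 * 0.02
fleet_small = 35 * 0.085

print(round(micro, 3), round(small, 3))              # 0.057 0.085
print(round(fleet_micro, 3), round(fleet_small, 3))  # 2.0 2.975
```

This only compares average compute per dollar. It says nothing about how that compute is delivered over time, which is where micro instances differ.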

Beware of micro-instances, they may bite you. We have our test environment all on micro-instances. Since it is just a functional test environment, it works smoothly. However, we happened to update an application (well, Jetty 7.5.3) that has a known bug of spinning at high CPU usage. This rendered those instances useless, as Amazon throttles the available CPU to 2%.
Also, micro instances are EBS backed. EBS is not advisable (over instance-store) for high IO operations like the ones required for Cassandra or the like.
If you want to save money and your software is architected to handle interruptions, you may opt for spot instances. They usually cost less than on-demand ones.
If none of these is an issue for you, I would say micro-instances are the way to go! :)
Basics questions about micro instances performance
CPU pattern for micro
Stolen CPU on micro

I would say it depends on what kind of architecture your app will have and how reliable it needs to be:
AWS load balancers do not provide instant (maybe real-time is a better word?) auto-scaling, which is different from the fail-over concept. It works with periodic health checks and has a small delay because they are done via HTTP requests (more overhead if you choose HTTPS).
You will have more points of failure if you choose more instances, depending on the architecture. To avoid this, your app will need to be async between instances.
You must benchmark and test your application more if you choose more instances, to guarantee those bursts won't affect your app too much.
That's my point of view, and it would be a very pleasant discussion between experienced people.
