I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft I'll use that as an example. On the AWS EC2 instances, succeeding world chunks would load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance; and they must disconnect from and re-login to the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.
Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try backing into the smaller instance. Since you only pay for 10 minutes, you could see if the performance is better on higher level machines. If you have consistent performance problems no matter what the size of the box, then I'm guessing it's something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable based on how long it's been running? The nature of the workload? Time of day? If the performance is sometimes good, but sometimes bad, then it's probably once again related to the type of your work load and their virtualization strategy.
Something Amazon is famous for is consistency. They work very had to manage the consistency of the performance. it shouldn't spike up or down.
My best guess here without all the details is you are using a very small disk. GCE throttles disk performance based on the size. You have two options ... attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team
Related
Lately, I have been struggling to understand what is my network speed (downlink) between nodes on AWS (in a multi-homed cluster, computers in different regions).
I have a lot of fluctuations when I measure it with a script which I have written (based on this link and SCP) or with Iperf.
I believe it is based on network use which changes rapidly (mostly between regions), but I still don't understand AWS documentation about what is the performance I am paying for, a minimum and a maximum downlink rate for example (aws instances).
At first, I have tried the T2 type, and as I saw it had burst CPU performance, I thought that maybe the NIC performance is also bursty so I have moved to M4 type, but I have got the same problems with M4.
Is there any way to know my NIC downlink rate based on the type and flavor?
*I have asked a similar question on the AWS forum, but I haven't got a response (https://forums.aws.amazon.com/thread.jspa?threadID=296389).
There is no way to get a better indication that your measuring. AWS does not publish anything indicating this performance, and unless we are talking the larger instance where network performance is actually specifically given. I.e. m5.12xlarge having 10 gbps. Most likely network performance does have a burst component for smaller instance types.
There are pages with other peoples benchmarks, but you won't find any official answer for any of this.
My team is using a gpu instance to run machine learning tensorflow based, yolo,computer vision applications and use it for training machine learning models also.. It costs 7$ an hour and has 8 gpu's. Was trying to reduce costs on it. We need 8 gpu's for faster training and sometimes many people can use different gpu's at the same time.
For our use case we are not using sometimes the gpu's(8 gpus) at all for atleast 1-2 weeks of a month. But a use of the gpu may arrive during that time but maynot also. So i wanted to know is there a way to edit the code and do all cpu intensive operations when gpu not needed through a low cost cpu instance. And turn on the gpu instance only when needed use it and then stop it when work done.
I thought of using efs for putting code on the shared file system and then running from there but i read an article( https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs ) where its written that i should never run code from network based drives because the speed can become really slow. So i dont know if its good to run machine learning application from efs file system. I was thinking of making virtual environments on folders in efs but i dont think that is a good idea.
Could anyone suggest good ways of achieving this and reduce costs. And if you are suggesting to use an instance with lower number of gpu's that i have considered but we sometimes need 8 gpu's for faster training but we dont use the gpus at all for 1-2 weeks but the costs are still incurred.
Please suggest a way on how to achieve a low cost for this use case without using spot or reserved instances.
Thanks in advance
A few thoughts:
GPU instances now allow hibernation, so when launching your GPU select the new Stop Instance behavior 'hibernate' which will let you turn it off for 2 weeks but spin it up quickly if necessary
If you only have one instance, look into using EBS for data storage with a high volume of provisioned iops to move data on/off your instance quickly
Alternately, move your model to Sagemaker to ensure you are only charged for GPU use when you are actively training your model
If you are applying your model (inferencing) move that workload to a cheap instance. A trained yolo model can run inferencing on very small CPU instances, no need for a GPU for that part of the workload at all.
To reduce inference costs, you can use Elastic Inference which supports pay-per-use functionality:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html
I created an Amazon EC2 Auto Scaling group, where it should have at least 1 server all the time.
Add up 2 servers when CPU utilization passes beyond 80%
Terminate 2 servers when CPU utilization comes down less then 30%
Challenge here is, How should I increase/decrease CPU utilization? I cannot connect to any instance or use CLI since I am in Office system / restricted AWS access.
Is there a way to test this despite of these restrictions?
There is a way to stress test an instance or container (assuming it is Linux based) using Stress, a package that is designed to crank up the CPU for a specified amount of time and then bring the CPU percentages down after the specified amount of time. It has other parameters to customize the testing.
My personal favorite tool for testing system response and DR is to use Netflix's ChaosMonkey. It is an open source project, backed by Netflix that is designed to test fault tolerance. Using it in production comes down to personal preference, but it is a tool for testing systems.
If you want to test the "real" situation, then you will need a way to generate load on the system.
This could be artificial load (eg triggering a program that does calculations, just to spin the CPU) or a real-world simulation of actual activities that you system will perform.
There is no need to test whether Amazon EC2's Auto Scaling actually works — there would be issues shown on the AWS Status Page if that were the case — so I presume you just wish to test your own configuration. In this case, you should really be testing a real world scenario, such as simulating a quantity of simultaneous users doing typical activities that users would perform.
If you do any other form of testing (such as fake increasing of CPU load), you're not really testing the real situation in which you want Auto Scaling to perform, so the results of your test won't actually be useful.
For example, it might be that your application runs into memory issues or single-threading issues way before it hits any CPU limits. That would be something you'd really like to know before throwing real users at your system.
This question has a conceptual and practical parts.
Conceptually I'd like to know if using the autoscaling functionality is equivalent to simply increasing the compute power by a factor of the number of added instances?
Practically ... how does this work? I have one running instance, its database sitting on an LVM composed of multiple EBS volumes, similarly with all website data. Judging from the load on the instance I either need to upgrade to a more powerful instance or introduce this autoscaling. Is it a copy of the running server? If so, how is the database (etc) kept consistent?
I've read through the AWS documentation, and still haven't got the picture yet - I could set one autoscaling group up which would probably clear my doubts, but I am very leery to do this with a production server.
Any nudges in the right direction would be welcome.
Normally if you have a solution that also uses a database, and several machines in the solution, the database is typically not on any of the machines but is instead hosted seperately with each worker machine pointing to the same database - if you are on AWS platform already, then DynamoDB or RDS are both good solutions for this.
In theory, for some applications, upgrading the size of the single machine will give you the same power as adding several smaller machines, but increasing the size of the single machine, while usually these easiest thing to do at first, should not be considered autoscaling and has its own drawbacks. Here are some things to consider:
Using multiple machines instead of one big one gives you some fault tolerance. One or more machines can go down and if your solution is properly designed new machines will spin up to replace them.
Increasing the size of a single machine solution means you are probably paying too much. If you size that single machine big enough to handle peak workloads, that means at other times (maybe most of the time), you are paying for a bigger machine than you need. If you setup your autoscaling solution properly more machines come on line in response to increasing demand, and then they terminate when that demand decreases - you only pay for the power you need when you need it.
When your solution is designed in this manner, you need to think of all of the worker machines as ephermal - likely to disappear at any time, so you need to build your solution differently. Besides using a hosted database (like on DynamoDB or AWS RDS), you also should not store any data on the machines in your auto-scaling group that doesn't also live somewhere else. For example, if part of your app allows users to upload images, you don't store them on the instances, you store them in S3. Same would apply to any other new data that comes in.
You need to be able to figuratively 'pull the plug' at any instant on any of the machines in your ASG without losing data.
Ultimately a properly setup auto-scaling solution will likely serve you better, but without doubt it is simpler to just 'buy a bigger machine' and the extra money you spend on running that bigger machine may be more than offset by the time and effort you don't have to spend re-architecting your solution to properly run in an autoscaling environment. The unique requirements of your solution will ultimately decide which approach is better.
I was wondering which would be better, to host a site on EC2 with many micro instances, or fewer larger instances such as m1.large. All will sit behind one or a few larger instances as load balancers. I will say what my understanding is, and anybody who knows better can add or correct me if I'm wrong
Main reason for choosing micro instances is cost. A single micro instance on average will give around 0.35ECU for $0.02/hour, while one small instance will give 1ECU for $0.085. If you do the math of $/ECU/hour, a micro instance works out to be $0.057/ECU/hour, whereas for a small instance it's $0.085/ECU/hour. So for the same average computing power, choosing 100 micro instances would be cheaper than 35 small instances.
Main problem with micro instances is more fluctuating performance, but I'm not sure if this will be less of a problem when you have many instances.
So does anybody have experience benching such setups and see the benefits and drawbacks? I'm trying to choose which way to go.
PS: an article on the subject, http://huanliu.wordpress.com/2010/09/10/amazon-ec2-micro-instances-deeper-dive/
Beware of micro-instances, they may bite you. We have out test environment all on micro-instances. Since they are just functional test environment, it works smoothly. However, we happened to have update some application (well, Jetty 7.5.3) that has known bug of spinning higher CPU usage. This rendered those instances useless as Amazon throttles the available CPU to 2%.
Also, micro instances are EBS backed. EBS is not advisable (over instance-store) for high IO operations like the ones require for Cassandra or the likes.
If you want to save money and your software is architected to handle interruptions, you may opt for spot instances. They usually cost less than on-demand ones.
If all these are not a issue to you, I would say, micro-instances is the way to go! :)
Basics questions about micro instances performance
CPU pattern for micro
Stolen CPU on micro
I would say: depends on what kind of architecture your app will have and how reliable it will need to be:
AWS Load Balancers does not provide instant (maybe real-time is a better word?)
auto-scale which is different of fail-over concept. It works with
health checks from time to time and have its small delay because it
is done via http requests (more overhead if you choose https).
You will have more points of failure if you choose more instances depending on architecture. To avoid it, your app will need to be async between instances.
You must benchmark and test more your application if you choose more
instances, to guarantee those bursts won't affect your app too much.
That's my point of view and it would be a very pleasant discussion between experienced people.