Difference between Compute optimized instances and Accelerated computing instances - amazon-web-services

I am new to AWS and came across these two instance categories, but I don't understand the need for the distinction between them; they seem the same. Can anyone explain this?
Accelerated computing instances
Accelerated computing instances use hardware accelerators, or coprocessors, to perform some functions more efficiently than is possible in software running on CPUs. Examples of these functions include floating-point number calculations, graphics processing, and data pattern matching.
In computing, a hardware accelerator is a component that can expedite data processing. Accelerated computing instances are ideal for workloads such as graphics applications, game streaming, and application streaming.
Compute-optimized instances
Compute-optimized instances are ideal for compute-bound applications that benefit from high-performance processors. Like general-purpose instances, you can use compute-optimized instances for workloads such as web, application, and gaming servers.
However, the difference is that compute-optimized instances are ideal for high-performance web servers, compute-intensive application servers, and dedicated gaming servers. You can also use compute-optimized instances for batch processing workloads that require processing many transactions in a single group.

The main difference between them is the hardware. Accelerated computing instances have GPUs, which act as accelerators, whereas compute-optimized instances have high-performance CPUs. Both provide high compute power, but they provide it in different ways.
https://aws.amazon.com/ec2/instance-types/
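In practice, the choice comes down to the instance type you request at launch. Here is a minimal, hedged sketch with boto3 (the AMI ID and the instance types are placeholders you would replace with your own):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Compute optimized (e.g. the C5 family) and accelerated computing
# (e.g. the P3/G4 families) are requested through the same API call;
# only InstanceType changes.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="c5.xlarge",          # e.g. "p3.2xlarge" or "g4dn.xlarge" for accelerated computing
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```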

Compute Optimized instances are designed for applications that benefit from high compute power. These include compute-intensive applications such as high-performance web servers, high-performance computing (HPC), distributed analytics, scientific modelling, and ML inference.
To learn more about Compute Optimized instances, you can refer to this link.
If you require high processing capability, you'll benefit from using accelerated computing instances, which provide access to hardware-based compute accelerators such as Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), or AWS Inferentia.
To learn more about Accelerated Computing instances, you can refer to this link.

Related

Sagemaker Endpoint throttling exception

I have created an endpoint using Sagemaker, and designed my system so that it is called about 100 times simultaneously. This seemed to cause 'Model error' and take too much time. Do I need to create an endpoint for each event, and make one call per endpoint, instead?
You can go to the CloudWatch logs to diagnose your model failure.
Real-time inference traffic scaling can be addressed by working on 3 independent dimensions:
hardware: choosing larger machines or more machines. For example, you can load test your model endpoint with bigger and bigger machines and see which hardware size gives you acceptable latency. The Autoscaling feature of SageMaker helps you address this automatically. If deploying a deep neural net, you can also consider using appropriate accelerators, e.g. GPUs (EC2 P3, EC2 G4) or an Amazon Elastic Inference accelerator, to make each prediction much faster.
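For example, the SageMaker endpoint autoscaling mentioned above is configured through the Application Auto Scaling API. A minimal sketch, assuming a deployed endpoint named my-endpoint with a variant named AllTraffic (both placeholders) and a purely illustrative invocations target:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint/variant names -- replace with your own.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

# Allow the endpoint to scale between 1 and 4 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target-tracking policy on invocations per instance (the value is illustrative).
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```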
software: you have 2 levers to tune here:
choosing a serving stack that is lean and fast. Different servers will handle load at different levels of performance. One common trick is to batch the load: for example, instead of hitting your server 100 times, can you hit it only once with a batch of 100 records? If clients cannot batch their requests, can you use micro-asynchrony so that you do the batching yourself after they have issued requests? You can usually configure such micro-batching in advanced deep learning servers such as TF Serving or MXNet Model Server (both can be used in SageMaker), but otherwise you can also do it yourself by having a queue (SQS) in front of your server.
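To illustrate the queue-in-front-of-the-server idea, here is a hedged sketch that drains an SQS queue and sends the collected records to a SageMaker endpoint as one batch (the queue URL, endpoint name, and CSV payload format are assumptions; your model container may expect a different content type):

```python
import boto3

sqs = boto3.client("sqs")
runtime = boto3.client("sagemaker-runtime")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/inference-requests"  # placeholder
ENDPOINT_NAME = "my-endpoint"  # placeholder

def drain_and_batch(max_batch=100):
    """Collect up to `max_batch` queued records and send them as one request."""
    records = []
    while len(records) < max_batch:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            break
        records.extend(m["Body"] for m in messages)
        # Delete what we consumed so it is not re-delivered.
        sqs.delete_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[{"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]} for m in messages],
        )
    if records:
        # One invocation carrying many records instead of one call per record.
        result = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="text/csv",        # assumed; depends on the model container
            Body="\n".join(records),
        )
        return result["Body"].read()
    return None
```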
model compilation - optimizing the model graph and its runtime. This is a very smart concept that leverages the fact that when you know where you're going to deploy (e.g. NVIDIA, Intel, ARM, etc.), you have an insider edge and can refine your model artifact and create a bespoke runtime application that is tailor-made for that specific target platform. This can reduce memory consumption and latency by double-digit percentages, and it is an active area of ML research. In the SageMaker ecosystem, such a compilation task can be performed with SageMaker Neo, but the open source ecosystem is developing fast, notably with treelite (paper, doc) for decision tree compilation and TVM (paper, doc) for arbitrary neural net compilation. Both are dependencies of Neo, by the way.
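For reference, a Neo compilation job submitted through boto3 looks roughly like the sketch below (the role ARN, S3 paths, framework, input name/shape, and target device are all placeholders to adapt to your model):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_compilation_job(
    CompilationJobName="my-model-neo",                       # placeholder name
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    InputConfig={
        "S3Uri": "s3://my-bucket/model/model.tar.gz",        # placeholder model artifact
        "DataInputConfig": '{"data": [1, 3, 224, 224]}',     # assumed input name and shape
        "Framework": "TENSORFLOW",                           # or MXNET, PYTORCH, XGBOOST, ...
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled/",      # placeholder output path
        "TargetDevice": "ml_c5",                             # the platform you will deploy to
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```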
science: some models are slower or heavier than others. If speed and concurrency are your priorities over accuracy, and if you have already exploited all possible tricks at levels (1) and (2) above, consider using fast-throughput models, e.g. linear models and logistic regression for structured data, MobileNet or SqueezeNet instead of large ResNets for classification (nice benchmark here), YOLOv3 instead of Faster R-CNN for detection (nice benchmark here), etc. But be aware that unlike levels (1) and (2), changing the model science will alter accuracy.
As mentioned above, those 3 areas of improvement really are about real-time inference; if you can afford to pre-compute all possible model inputs, then the ultimate low-latency, high-throughput solution is to pre-compute a variety of input-prediction pairs of interest offline and serve them on demand from a fast database or local read-only store.
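If pre-computation fits your case, the serving side can be as simple as a key-value lookup. A hedged sketch using DynamoDB, where the table name and key schema are assumptions:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("precomputed-predictions")  # assumed table keyed on "input_id"

def load_precomputed(pairs):
    """Bulk-load (input_id, prediction) pairs computed offline."""
    with table.batch_writer() as writer:
        for input_id, prediction in pairs:
            writer.put_item(Item={"input_id": input_id, "prediction": prediction})

def predict(input_id):
    """Serve a prediction with a single low-latency read, no model in the loop."""
    item = table.get_item(Key={"input_id": input_id}).get("Item")
    return item["prediction"] if item else None
```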

How to use two AWS EC2 instances (1 GPU and 1 CPU instance) with one storage to run code and store/share files, and reduce cost

My team is using a GPU instance to run machine learning (TensorFlow-based, YOLO, computer vision) applications and also uses it for training machine learning models. It costs $7 an hour and has 8 GPUs. I was trying to reduce its cost. We need 8 GPUs for faster training, and sometimes several people use different GPUs at the same time.
For our use case, we sometimes don't use the GPUs (all 8) at all for at least 1-2 weeks of a month, though a need for them may or may not come up during that time. So I wanted to know: is there a way to edit the code and run all the CPU-intensive operations on a low-cost CPU instance when the GPUs are not needed, and turn on the GPU instance only when it is needed, use it, and then stop it when the work is done?
I thought of using EFS to put the code on a shared file system and run it from there, but I read an article (https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs) which says you should never run code from network-based drives because it can become really slow. So I don't know if it is good to run a machine learning application from an EFS file system. I was also thinking of creating virtual environments in folders on EFS, but I don't think that is a good idea.
Could anyone suggest good ways of achieving this and reducing costs? If you are going to suggest an instance with fewer GPUs, I have considered that, but we sometimes need 8 GPUs for faster training, and yet for the 1-2 weeks when we don't use the GPUs at all the costs are still incurred.
Please suggest a way to achieve low cost for this use case without using Spot or Reserved Instances.
Thanks in advance
A few thoughts:
GPU instances now allow hibernation, so when launching your GPU instance select the new stop-instance behavior 'hibernate', which will let you turn it off for 2 weeks but spin it up quickly when necessary (see the sketch after this list)
If you only have one instance, look into using EBS for data storage with a high volume of provisioned IOPS to move data on/off your instance quickly
Alternatively, move your model to SageMaker to ensure you are only charged for GPU use when you are actively training your model
If you are applying your model (inferencing), move that workload to a cheap instance. A trained YOLO model can run inference on very small CPU instances; there is no need for a GPU for that part of the workload at all.
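As a minimal sketch of the stop-when-idle idea from the first point above (the instance ID is a placeholder, and hibernation has its own prerequisites such as a supported instance type and an encrypted root volume, so the Hibernate flag may not apply to every setup):

```python
import boto3

ec2 = boto3.client("ec2")
GPU_INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

def pause_gpu_box(hibernate=True):
    """Stop (or hibernate) the GPU instance so you stop paying for compute."""
    ec2.stop_instances(InstanceIds=[GPU_INSTANCE_ID], Hibernate=hibernate)

def resume_gpu_box():
    """Bring the GPU instance back when a training job shows up."""
    ec2.start_instances(InstanceIds=[GPU_INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[GPU_INSTANCE_ID])
```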
To reduce inference costs, you can use Elastic Inference which supports pay-per-use functionality:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html

How to do or perform

How would you do the below?
Please share your thoughts.
There are many ways to perform a stress test on an AWS architecture; some of the tools are JMeter, BlazeMeter, etc. Regarding restrictions, you have to let AWS Support know beforehand about the stress test or penetration test you are going to perform on the AWS infrastructure you have created. Check this link for more details.
https://aws.amazon.com/security/penetration-testing/
Since you pay for bytes into and out of the Amazon infrastructure, to maintain low costs keep your load generators in the same data center as your application under test. There are some drawbacks to this, but the primary one is that your network will lack the complexity of impairment that real users will experience. If you are using a tool which includes modeling of network impairment with the virtual users then this drawback is reduced.
No matter what tool you use, if you have the load generators running on virtual machines in AWS, you will face the issue of clock float on the virtual machine clock. Periodically this virtual clock needs to be resynced to the system clock on the hypervisor host, which results in a clock jump. This can happen while you have a timing record open - it is unavoidable. The net impact is that you will see higher averages, percentile values, standard deviations, and maximums than if you were running on physical hardware.

Can I improve performance of my GCE small instance?

I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft, I'll use that as an example. On the AWS EC2 instances, successive world chunks load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance, and they must disconnect from and log back into the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.
Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try backing into the smaller instance. Since you only pay per 10 minutes, you could check whether the performance is better on higher-level machines. If you have consistent performance problems no matter what the size of the box, then I'm guessing it's something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable based on how long it's been running? The nature of the workload? The time of day? If the performance is sometimes good but sometimes bad, then it's probably once again related to the type of your workload and their virtualization strategy.
Something Amazon is famous for is consistency. They work very hard to manage the consistency of the performance; it shouldn't spike up or down.
My best guess here, without all the details, is that you are using a very small disk. GCE throttles disk performance based on its size. You have two options: attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
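For example, creating and attaching a PD-SSD through the Compute Engine API from Python might look roughly like this (a hedged sketch: the project, zone, names, and size are placeholders, and the google-api-python-client library with application default credentials is assumed):

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

PROJECT, ZONE = "my-project", "us-central1-a"      # placeholders
INSTANCE, DISK = "game-server-1", "game-data-ssd"  # placeholders

# Create a 200 GB PD-SSD; larger disks also get higher throughput/IOPS caps.
compute.disks().insert(
    project=PROJECT,
    zone=ZONE,
    body={
        "name": DISK,
        "sizeGb": "200",
        "type": f"zones/{ZONE}/diskTypes/pd-ssd",
    },
).execute()

# In practice, wait for the disk creation operation to finish before attaching.
# The disk still needs to be formatted and mounted inside the guest afterwards.
compute.instances().attachDisk(
    project=PROJECT,
    zone=ZONE,
    instance=INSTANCE,
    body={"source": f"projects/{PROJECT}/zones/{ZONE}/disks/{DISK}"},
).execute()
```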
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team

How to measure the power consumption in a virtual machine on a cloud environment?

I'm running a few Blockchain related containers in a cloud environment (Google Compute Engine). I want to measure the power/energy consumption of the containers or the instance that I'm running.
There are tools like PowerAPI which can do this on real infrastructure, where they have access to real CPU counters. It should also be possible to do an estimation based on CPU, memory, and network usage; there is one such model proposed in the literature.
Is it theoretically possible to do this? If so, are there already existing tools for this task?
A generalized answer is "No, it is impossible". A program running in a virtual machine deals with virtual hardware. The goal of virtualization is to abstract running programs from the physical hardware, while access to the physical equipment is required to measure energy consumption. For instance, without processor affinity enabled, it's unlikely that PowerAPI will be able to collect useful statistics because of virtual CPU migrations. Time-slicing, which allows multiple vCPUs to run on one physical CPU, is another factor to keep in mind. The same goes for the energy consumed by RAM, I/O controllers, storage devices, etc.
A substantive answer is "No, it is impossible". The authors of the manuscript use PowerAPI libraries to collect CPU statistics, and a monitoring agent to count bytes transmitted through the network. The HWPC sensor that PowerAPI relies on has a distinct requirement:
"monitored node(s) run a Linux distribution that is not hosted in a virtual environment"
Also, the authors emphasize that they couldn't measure absolute values of power consumption, but rather used percentages to compare certain Java classes and methods in order to infer the origin of energy leaks in CPU-intensive workloads.
As for the network I/O, the number of bytes used to estimate power consumption in their model differs significantly between the network interface exposed into the virtual guest and the host hardware port on the network with the SDN stack in between.
Measuring in cloud instances is more complicated, yes. If you do have access to the underlying hardware, then it would be possible. There is a fairly recent project that allows you to get power consumption metrics for a QEMU/KVM hypervisor and share the metrics relevant to each virtual machine through another agent in the VM: github.com/hubblo-org/scaphandre/
They are also working (very early) on a "degraded mode" to estimate* the power consumption of the instance/VM when such communication between the hypervisor and the VM is not possible, as when working on a public cloud instance.
*: based on the resource (CPU/RAM/GPU...) consumption in the VM, the characteristics of the instance, and a power consumption "profile" of the underlying hardware.
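To make the estimation idea concrete, here is a hedged sketch of the kind of resource-based linear model such a degraded mode implies (the wattage figures are purely illustrative assumptions, not measured values, and psutil is assumed for the resource metrics):

```python
import psutil

# Purely illustrative "profile" of the underlying hardware -- in reality these
# numbers would come from measurements or the provider, not be invented.
IDLE_WATTS = 50.0        # assumed baseline draw of the host share
MAX_CPU_WATTS = 150.0    # assumed extra draw at 100% CPU
RAM_WATTS_PER_GB = 0.4   # assumed per-GB draw of resident memory

def estimate_power_watts():
    """Rough linear estimate: idle power + CPU share + memory share."""
    cpu_fraction = psutil.cpu_percent(interval=1.0) / 100.0
    ram_gb_used = psutil.virtual_memory().used / 1e9
    return IDLE_WATTS + cpu_fraction * MAX_CPU_WATTS + ram_gb_used * RAM_WATTS_PER_GB

if __name__ == "__main__":
    print(f"Estimated draw: {estimate_power_watts():.1f} W (illustrative model only)")
```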
Can we conclude that it is impossible to measure the power/energy consumption of a process running on a virtual machine?