I was working on this project that aim to make VM assignment more efficient.
By VM assigment, what I mean is as soon as a request for new virtual machine creation comes to the Openstack platform how the platform handles the request. In the openstack framework, nova-scheduler does this part. I was looking to add more features/filters into the nova scheduler.
I wanted to implement some special kinds of filters, in the Nova scheduler. That would have some special rules or which would maintain averaged load across the whole system and saving energy. Generally, A system with medium load consumes less energy than a system running at maximum load. I was thinking of filters that would allocate Virtual machines close, ie on a same rack. When a request for making a cluster of Vm is recieved. I would like what you think of feasiblity of any such filters. And How effective they can be ?
Any Help would be highly appriciated.
By default, openstack assign vms to bare metal whose memory is larger.
scheduler_driver=nova.scheduler.multi.MultiScheduler
scheduler_driver_task_period=60
compute_scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
http://docs.openstack.org/trunk/config-reference/content/section_compute-scheduler.html
Related
I have developed Django API which accepts images from livefeed camera using in the form of base64 as request. Then, In API this image is converted into numpy arrays to pass to machine learning model i.e object detection using tensorflow object API. Response is simple text of detected objects.
I need GPU based cloud instance where i can deploy this application for fast processing to achieve real time results. I have searched a lot but no such resource found. I believe google cloud console (instances) can be connected to live API but I am not sure how exactly.
Thanks
I assume that you're using GPU locally or wherever your Django application is hosted.
First thing is to make sure that you are using tensorflow-gpu and all the necessary setup for Cuda is done.
You can start your GPU instance easily on Google Cloud Platform (GCP). There are multiple ways to do this.
Quick option
Search for notebooks and start a new instance with the required GPU and
RAM.
Instead of the notebook instance, you can set up the instance separately if you need some specific OS and more flexibility on choosing the machine.
To access the instance with ssh simply add your ssh public key
to Metadata which can be seen when you open the instance details.
Setup Django as you would do on the server. To test it simply just debug run it on host 0 or 0.0.0.0 and preferred port.
You can access the APIs with the external IP of the machine which can be found out in the instance details page.
Some suggestions
While the first option is quick and dirty, it's not recommended to use that in production.
It is better to use some deployment services such as tensorflow-serving along with Kubeflow.
If you think that you're handling the inference properly itself, then make sure that you load balance the server properly. Use NGINX or any other good server along with gunicorn/uwsgi.
You can use redis for queue management. When someone calls the API, it is not necessary that GPU is available for the inference. It is fine not to use this when you have very less number of hits on the API per second. But when we think of scaling up, think of 50 requests per second which a single GPU can't handle at a time, we can use a queue system.
All the requests should directly go to redis first and the GPU takes the jobs required to be done from the queue. If required, you can always scale the GPU.
Google Cloud actually offers Cloud GPUs. If you are looking to perform higher level computations with your applications that require real-time capabilities I would suggest your look into the following link for more information.
https://cloud.google.com/gpu/
Compute Engine also provides GPUs that can be added to your virtual machine instances. Use GPUs to accelerate specific workloads on your instances such as Machine Learning and data processing.
https://cloud.google.com/compute/docs/gpus/
However, if your application requires a lot of resources you’ll need to increase your quota to ensure you have enough GPUs available in your project. Make sure to pick a zone where GPUs are available. If this requires much more computing power you would need to submit a request for an increase of your quota. https://cloud.google.com/compute/docs/gpus/add-gpus#create-new-gpu-instance
Since you would be using the Tensorflow API for your application on ML Engine I would advise you to take a look at this link below. It provides instructions for creating a Deep Learning VM instance with TensorFlow and other tools pre-installed.
https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance
This question has a conceptual and practical parts.
Conceptually I'd like to know if using the autoscaling functionality is equivalent to simply increasing the compute power by a factor of the number of added instances?
Practically ... how does this work? I have one running instance, its database sitting on an LVM composed of multiple EBS volumes, similarly with all website data. Judging from the load on the instance I either need to upgrade to a more powerful instance or introduce this autoscaling. Is it a copy of the running server? If so, how is the database (etc) kept consistent?
I've read through the AWS documentation, and still haven't got the picture yet - I could set one autoscaling group up which would probably clear my doubts, but I am very leery to do this with a production server.
Any nudges in the right direction would be welcome.
Normally if you have a solution that also uses a database, and several machines in the solution, the database is typically not on any of the machines but is instead hosted seperately with each worker machine pointing to the same database - if you are on AWS platform already, then DynamoDB or RDS are both good solutions for this.
In theory, for some applications, upgrading the size of the single machine will give you the same power as adding several smaller machines, but increasing the size of the single machine, while usually these easiest thing to do at first, should not be considered autoscaling and has its own drawbacks. Here are some things to consider:
Using multiple machines instead of one big one gives you some fault tolerance. One or more machines can go down and if your solution is properly designed new machines will spin up to replace them.
Increasing the size of a single machine solution means you are probably paying too much. If you size that single machine big enough to handle peak workloads, that means at other times (maybe most of the time), you are paying for a bigger machine than you need. If you setup your autoscaling solution properly more machines come on line in response to increasing demand, and then they terminate when that demand decreases - you only pay for the power you need when you need it.
When your solution is designed in this manner, you need to think of all of the worker machines as ephermal - likely to disappear at any time, so you need to build your solution differently. Besides using a hosted database (like on DynamoDB or AWS RDS), you also should not store any data on the machines in your auto-scaling group that doesn't also live somewhere else. For example, if part of your app allows users to upload images, you don't store them on the instances, you store them in S3. Same would apply to any other new data that comes in.
You need to be able to figuratively 'pull the plug' at any instant on any of the machines in your ASG without losing data.
Ultimately a properly setup auto-scaling solution will likely serve you better, but without doubt it is simpler to just 'buy a bigger machine' and the extra money you spend on running that bigger machine may be more than offset by the time and effort you don't have to spend re-architecting your solution to properly run in an autoscaling environment. The unique requirements of your solution will ultimately decide which approach is better.
I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft I'll use that as an example. On the AWS EC2 instances, succeeding world chunks would load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance; and they must disconnect from and re-login to the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.
Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try backing into the smaller instance. Since you only pay for 10 minutes, you could see if the performance is better on higher level machines. If you have consistent performance problems no matter what the size of the box, then I'm guessing it's something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable based on how long it's been running? The nature of the workload? Time of day? If the performance is sometimes good, but sometimes bad, then it's probably once again related to the type of your work load and their virtualization strategy.
Something Amazon is famous for is consistency. They work very had to manage the consistency of the performance. it shouldn't spike up or down.
My best guess here without all the details is you are using a very small disk. GCE throttles disk performance based on the size. You have two options ... attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team
Google Compute Engine lets you get a group of instances that are semantically local in the sense that only they can talk to each other and all external access has to go through a firewall etc. If I want to run Map-Reduce or other kinds of cluster jobs that are going to induce high network traffic, then I also want machines that are physically local (say, on the same rack). Looking at the APIs and initial documentation, I don't see any way to request that; does anyone know otherwise?
There is no support in GCE right now for specifying rack locality. However, we built the system to work well in the face of large numbers of instances talking to each other in a fully connected way, as long as they are in the same zone.
This is one of the things that allowed MapR to approach the record for a hadoop terasort. You can see that in action in the video for the Criag Mcluckie's talk from IO:
https://developers.google.com/events/io/sessions/gooio2012/302/
The best way to see is to test out your application and see how it works.
In a research project involving virtualization and power management I am testing various resource allocation scenarios and custom power management algorithms. I am interested in isolating a virtual machine to use only a certain CPU core.
I was thinking about using Windows 2008R2 and Hyper-V, but Hyper-V does not allow setting CPU affinity for a virtual machine, is there any way I can make sure that a virtual machine running a CPU intensive task will use only one core of the CPU (the VM is configured to use a single CPU) and have the rest of the cores available for other task?
VMware ESX Server is an interesting choice since it provides the settings I need (including hot memory add), however it seems like a closed system. Does the OS of ESX Server, based on Linux from what I understand, allow for installing custom application through which to control aspects related to power management of the physical server's components (e.g. perform CPU frequency scaling). Does it provide any APIs? I am aware the product already has power management features, but I am looking for means to achieve custom implementations.
Besides these two solutions, can you recommend other hypervisors which provide facilities such as setting CPU affinity, CPU limits and reservations, hot memory add and which allow for custom applications running on the host server (also provide APIs to program such applications) - maybe Citrix XenSource, KVM (I am not familiar with these solutions)?
I don't think VMware would support modifications to the server, but you can get a command line on the ESX server as essentially you're right, it's linux underneath (RedHat mod I believe).
Xen/KVM are open source so you can hack away. You may be advised to go down the KVM route if you have budgetary constraints as the community will support you. The inclusion of Citrix may prove troublesome in an enterprise setting.
is there any way I can make sure that a virtual machine running a CPU intensive task will use only one core of the CPU
Openstack (KVM as hypervisor) provides a feature of CPU pinning through which you can bind a vCPU to a physical CPU core. Let me know if you need more information on the subject.
Here is a link explaining the feature. This link also confirms that Hyper-V doesn't support CPU pinning.