Object Detection Django Rest API Deployment on Google Cloud Platform or Google ML Engine - django

I have developed Django API which accepts images from livefeed camera using in the form of base64 as request. Then, In API this image is converted into numpy arrays to pass to machine learning model i.e object detection using tensorflow object API. Response is simple text of detected objects.
I need GPU based cloud instance where i can deploy this application for fast processing to achieve real time results. I have searched a lot but no such resource found. I believe google cloud console (instances) can be connected to live API but I am not sure how exactly.
Thanks

I assume that you're using GPU locally or wherever your Django application is hosted.
First thing is to make sure that you are using tensorflow-gpu and all the necessary setup for Cuda is done.
You can start your GPU instance easily on Google Cloud Platform (GCP). There are multiple ways to do this.
Quick option
Search for notebooks and start a new instance with the required GPU and
RAM.
Instead of the notebook instance, you can set up the instance separately if you need some specific OS and more flexibility on choosing the machine.
To access the instance with ssh simply add your ssh public key
to Metadata which can be seen when you open the instance details.
Setup Django as you would do on the server. To test it simply just debug run it on host 0 or 0.0.0.0 and preferred port.
You can access the APIs with the external IP of the machine which can be found out in the instance details page.
Some suggestions
While the first option is quick and dirty, it's not recommended to use that in production.
It is better to use some deployment services such as tensorflow-serving along with Kubeflow.
If you think that you're handling the inference properly itself, then make sure that you load balance the server properly. Use NGINX or any other good server along with gunicorn/uwsgi.
You can use redis for queue management. When someone calls the API, it is not necessary that GPU is available for the inference. It is fine not to use this when you have very less number of hits on the API per second. But when we think of scaling up, think of 50 requests per second which a single GPU can't handle at a time, we can use a queue system.
All the requests should directly go to redis first and the GPU takes the jobs required to be done from the queue. If required, you can always scale the GPU.

Google Cloud actually offers Cloud GPUs. If you are looking to perform higher level computations with your applications that require real-time capabilities I would suggest your look into the following link for more information.
https://cloud.google.com/gpu/
Compute Engine also provides GPUs that can be added to your virtual machine instances. Use GPUs to accelerate specific workloads on your instances such as Machine Learning and data processing.
https://cloud.google.com/compute/docs/gpus/
However, if your application requires a lot of resources you’ll need to increase your quota to ensure you have enough GPUs available in your project. Make sure to pick a zone where GPUs are available. If this requires much more computing power you would need to submit a request for an increase of your quota. https://cloud.google.com/compute/docs/gpus/add-gpus#create-new-gpu-instance
Since you would be using the Tensorflow API for your application on ML Engine I would advise you to take a look at this link below. It provides instructions for creating a Deep Learning VM instance with TensorFlow and other tools pre-installed.
https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance

Related

How does Google Cloud Run spin up instantly

So, I really like the idea of server-less. I came across Google Cloud Functions and Google Cloud Run.
So google cloud functions are individual functions, which is a broad perspective, I assume google must be securely running on a huge nodejs server. And it contains all the functions of all the google consumers and fulfils the request using unique URLs. Now, Google takes care of the cost of this one big server and charges users for every hit their function gets. So its pay to use. And makes sense.
But when it comes to Cloud Run. I fail to understand how does it work. Obviously the container must not always be running because then they will simply charge a monthly basis instead of a per-hit basis, just like a normal VM where docker image is deployed. But no, in reality, they charge on per hit basis, that means they spin up the container when a request arrives. So, I don't understand how does it spin it up so fast? The users have the flexibility of running any sort of environment, that means the docker container could contain literally anything. Maybe a full-fledged Linux OS. How does it load up the environment OS so quickly and fulfils the request? Well, maybe it maintains the state of the machine and shut it down when not in use, but even then, it will require a decent amount of time to restore the state.
So how does google really does it? How is it able to spin up a customer's container in literally no time?
The idea of fast spinning-up sandboxes containers (that run on their own kernel for security reasons) have been around for a pretty long time. For example, Intel Clear Linux Containers and Firecracker provide fast startup through various optimizations.
As you can imagine, implementing something like this would require optimizations at many layers (scheduling, traffic serving, autoscaling, image caching...).
Without giving away Google’s secrets, we can probably talk about image storage and caching: Just like how VMs use initramfs to pre-cache the state of the VM, instead of reading all the files from harddisk and following the boot sequence, we can do similar tricks with containers.
Google uses a similar solution for Cloud Run, called gVisor. It's a user-space virtualization technique (not an actual VMM or hypervisor). To run containers on a Linux-like environment, gVisor doesn't need to boot a Linux kernel from scratch (because gVisor reimplements the linux kernel in go!).
You’ll find many optimizations on other serverless platforms across most cloud providers (such as how to keep a container instance around, should you be predictively scheduling inactive containers before the load arrives). I recommend reading the Peeking Behind the Curtains of Serverless Platforms paper to get an idea about what are the problems in this space and what are cloud providers trying to optimize for speed and cost.
You have to decouple the containers to the VMs. The second link of Dustin is great because if you understand the principles of Kubernetes (and more if you have a look to Knative), it's easy to translate this to Cloud Run.
You have a pool of resources (Nodes in Kubernetes, the VM in fact with CPU and memory) and on these resources, you can run container: 1, 2, 1000 per VM, maybe, you don't know and you don't care. The power of the container, is the ability to be packaged with all the dependency that it needs. Yes, I talked about package because your container isn't an OS, it contains the dependencies for interacting with the host OS.
For preventing any problem between container from different project/customer, the container run into a sandbox (GVisor, first link of Dustin).
So, there is no VM to start and to stop, no VM to create when you deploy a Cloud Run services,... It's only a start of your container on existing resources. It's also for this reason that you need to have a stateless container, without disks attached to it.
Do you want 3 "secrets"?
It's exactly the same things with Cloud Functions! Your code is packaged into a container and deploy exactly as it's done with Cloud Run.
The underlying platform that manages Cloud Functions and Cloud Run is the same. That's why the behavior and the feature are very similar! Cloud Functions is longer to deploy because Google need to build the container for you. With Cloud Run the container is already built.
Your Compute Engine instance is also managed as a container on the Google infrastructure! More generally, all is container at Google!

Manually install every thing in GCP VM module

I am new to cloud and still learning GCP, I exhausted almost all my free credit for GCP within 2 months while learning different modules.
GCP is great and provides a lot of things to ease the development and maintenance process.
But I realized using different modules cost me a lot.
So I was wondering if I could have a big VM box, install MySQL, Docker, and Java and React required components by myself, I can achieve pretty much what I want without using extra modules.
Am I right?
Can I use the same VM to host multiple sites by changing ports of API, or do I need to have different boxes for that?
Your question is out of GCP domain but about IT architecture. You can create a big VM with all installed on it. But you have to manage it by yourselves and the scalability is hard.
You can also have 1 VM per website, but the management cost is higher (patching and updgrade)! However you can scale with a better granularity (per website).
The standard pattern today is to explode your monolith server into dedicated services. The database on a specific server, the docker and Java in another one, and the react in a static component (like Google Cloud Platform).
If you want to use VM, you can use GKE and you containerize your application. It's far more easier to maintain your VM with an automatic tool like K8S.
The ultimate step, is to use serverless and/or full managed solution. Cloud SQL for your database, GCS for your static content, and App Engine or Cloud Run for your backend. Like this, you pay as you use and if you website is not very used, you won't be charged on it (except for the database).

Access google compute engine from google app engine [duplicate]

I have an instance running on the Compute Engine which uses Torch to predict objects in images. I wanted to make a simple web interface using which a user can upload an image, the image is sent to the server(compute engine), the objects are predicted and the list is returned back to the user.
In my compute engine (Ubuntu 14.04) this line of code is used to predict objects in images. (All the other setup has been already done in the compute engine.)
th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10
I want to call this line from the web app and pass the image to the image folder and get back the list of objects. How do I go about it?
In past projects I have discussed and used different approaches to communicate between Google App Engine and Google Compute Engine. Generally speaking the two usual suspects are:
Orchestration from App Engine: In this approach the App Engine application is the active part and sends requests to a service on the compute instance. This is what Igor Artamonov already described in his comment. We used a tomcat instance on the compute instances which ran a full rest api to invoke commands on the instance. Possible helpers:
When using the Google Compute API from App Engine you can get the external IP address of compute instances. So you know where your requests will have to go.
Polling from the compute instance: Since you know the app id of your App Engine application you could code a simple loop on your compute instances that request new jobs from the app engine application. I have used this approach in combination with an orchestration that will send a shutdown command to instances that are no longer required, therefor reducing the polling load on app engine. If new jobs were created I would start a new compute instance which would then poll until it receives a shutdown command again.
Both approaches work well. If you use the Compute API and know the IPs of your compute instances you can restrict your polling endpoints and command invoke requests to these IPs for basic security.
I would try to avoid too much polling though since, well let me give you a quote:
Actively polling is the poor man's solution to kicking off a workflow process. (javaworld.com)
But if you shut down your compute instances when they are finished with their workload i don't see a good reason why you shouldn't use polling. If you don't and you increase the number of compute instances to a couple instances you will have load on your App Engine application without achieving anything but cost.

What are some of the most appropriate ways for serving a large scale django app on Google Compute Engine?

I am working on a project that will presumably have a lot of user uploaded content and also a fairly large user base. I am now looking for deploying this app to the Google Compute Engine.
I have looked up for the possible options and nginx+gunicorn seems to be a good option. In the beginning I am going to be using a single ns-1 instance with 100 GB persistent hard drive and google cloud sql for serving my database.
But I want to make things scalable so that I can add more instances and disk storage without any hustle in the future. But I am very confused how to do that. So the main concern is.
I want such setup so that I can extend my disk space and no. of Google Compute Instances whenever I want.
In order to have a fully scalable architecture, a good approach is to separate computation / serving, from file storage, and both from data storage. Going part by part:
file storage - Google Cloud Storage - by storing common service files in a GCS bucket, you get a central repository that is both highly-redundant, and scalable;
data storage - Google Cloud SQL - gives you a highly reliable, scalable MySQL-like database back-end, which can be resized at will to accommodate increasing database usage;
front-ends - GCE instance group - template-generated web / computation front-ends, setting up a resource pool into which a forwarding rule (load balancer) distributes incoming connections.
In a nutshell, this is one of the most adaptable set-ups I can think of, while you keep control over every aspect of the service and underlying infrastructure.
A simple approach would be to run a Python app on Google App Engine, which will auto-scale your instances (both up and down) and it supports Django, as mentioned by #spirulence in the comments.
Here are some starting points:
Django and Cloud SQL support on App Engine
Running Pure Django Projects on Google App Engine
Third-party Libraries in Python 2.7
The last link shows which versions of Django are currently supported.

Can you get a cluster of Google Compute Engine instances that are *physically* local?

Google Compute Engine lets you get a group of instances that are semantically local in the sense that only they can talk to each other and all external access has to go through a firewall etc. If I want to run Map-Reduce or other kinds of cluster jobs that are going to induce high network traffic, then I also want machines that are physically local (say, on the same rack). Looking at the APIs and initial documentation, I don't see any way to request that; does anyone know otherwise?
There is no support in GCE right now for specifying rack locality. However, we built the system to work well in the face of large numbers of instances talking to each other in a fully connected way, as long as they are in the same zone.
This is one of the things that allowed MapR to approach the record for a hadoop terasort. You can see that in action in the video for the Criag Mcluckie's talk from IO:
https://developers.google.com/events/io/sessions/gooio2012/302/
The best way to see is to test out your application and see how it works.