Access google compute engine from google app engine [duplicate] - google-cloud-platform

I have an instance running on Compute Engine which uses Torch to predict objects in images. I want to make a simple web interface through which a user can upload an image, the image is sent to the server (the Compute Engine instance), the objects are predicted, and the list is returned to the user.
On my Compute Engine instance (Ubuntu 14.04) this command is used to predict objects in images. (All the other setup has already been done on the instance.)
th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10
I want to call this line from the web app and pass the image to the image folder and get back the list of objects. How do I go about it?

In past projects I have discussed and used different approaches to communicate between Google App Engine and Google Compute Engine. Generally speaking the two usual suspects are:
Orchestration from App Engine: In this approach the App Engine application is the active part and sends requests to a service on the compute instance. This is what Igor Artamonov already described in his comment. We used a Tomcat instance on the compute instances which ran a full REST API to invoke commands on the instance (see the sketch after the helper below). Possible helpers:
When using the Google Compute API from App Engine you can get the external IP address of compute instances. So you know where your requests will have to go.
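For illustration, a minimal Python sketch of that App Engine side might look like the following; the project/zone/instance names, the port, and the /predict endpoint are assumptions for the sketch, not part of the original setup:

    # Look up the instance's external IP via the Compute API, then send the job to a
    # (hypothetical) REST service running on the instance.
    import requests
    from googleapiclient import discovery

    compute = discovery.build('compute', 'v1')

    def get_external_ip(project, zone, instance):
        info = compute.instances().get(project=project, zone=zone, instance=instance).execute()
        # The NAT IP of the first access config is the instance's external address.
        return info['networkInterfaces'][0]['accessConfigs'][0]['natIP']

    def send_job(project, zone, instance, image_bytes):
        ip = get_external_ip(project, zone, instance)
        # Assumes the instance runs a small REST service (Tomcat, Flask, ...) on port 5000
        # that accepts an image and returns the predicted objects as JSON.
        resp = requests.post(f'http://{ip}:5000/predict', files={'image': image_bytes})
        resp.raise_for_status()
        return resp.json()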
Polling from the compute instance: Since you know the app id of your App Engine application, you could code a simple loop on your compute instances that requests new jobs from the App Engine application. I have used this approach in combination with an orchestration that sends a shutdown command to instances that are no longer required, thereby reducing the polling load on App Engine. If new jobs were created I would start a new compute instance, which would then poll until it received a shutdown command again.
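A hedged sketch of such a polling loop, reusing the th command from the question; the /get-job and /submit-result endpoints, the app id, and the JSON shapes are assumptions:

    import subprocess
    import time
    import requests

    APP_URL = 'https://your-app-id.appspot.com'  # placeholder app id

    def run_prediction(image_dir):
        # Invokes the Torch script from the question and returns its raw output.
        out = subprocess.run(
            ['th', 'eval.lua', '-model', '/path/to/model',
             '-image_folder', image_dir, '-num_images', '10'],
            capture_output=True, text=True, check=True)
        return out.stdout

    while True:
        job = requests.get(APP_URL + '/get-job').json()
        if job.get('shutdown'):
            subprocess.run(['sudo', 'shutdown', '-h', 'now'])
            break
        if job.get('image_dir'):
            result = run_prediction(job['image_dir'])
            requests.post(APP_URL + '/submit-result',
                          json={'id': job['id'], 'result': result})
        else:
            time.sleep(10)  # back off when there is nothing to do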
Both approaches work well. If you use the Compute API and know the IPs of your compute instances, you can restrict your polling endpoints and command invocation requests to these IPs for basic security.
I would try to avoid too much polling though; well, let me give you a quote:
Actively polling is the poor man's solution to kicking off a workflow process. (javaworld.com)
But if you shut down your compute instances when they are finished with their workload, I don't see a good reason why you shouldn't use polling. If you don't, and you scale up to more than a couple of compute instances, you will put load on your App Engine application without achieving anything but cost.

Related

Recommended way to run a web server on demand, with auto-shutdown (on AWS)

I am trying to find the best way to architect a low cost solution to provide an on-demand web server for a certain amount of time.
The context is as follows: I have a large amount of data sitting on S3. From time to time, users will want to consult that data. I've written a Flask app that can display the data in a nice way for them. Being poorly written, it really only accepts a single user session at a time. Currently, therefore, they have to download the Flask app and run it on their own machine.
I would like to find a way for users to request a cloud-based web server that would run the Flask app (through a docker container for example) on-demand, and give them access to it quickly, without having to do much if anything on their own machine.
Every user wanting to view the data would have their own web server created on demand (to avoid multiple users sharing the same web server, which wouldn't work with my Flask app)
Critically, and in order to avoid cost, the web server would terminate itself automatically after some (configurable) idle time (possibly with the Flask app informing the user that it's about to shut down, so that they can "renew" the lease).
Initially I thought that maybe AWS Fargate would be good: it can run Docker containers, is quite configurable in terms of the CPU/disk it can get (my Flask app is resource-hungry), and at least on paper could be used in a way that incurs zero cost when users are not consulting the data (bar S3 costs, naturally). But when it comes to the details, I'm not sure...
How to ensure that every new user gets their own Fargate instance?
How to shut down the instance automatically after some idle time?
Is Fargate quick enough in terms of boot time?
The closest I can think of is AWS App Runner. It's built on top of Fargate and provides an intelligent scale-out mechanism (you are probably not interested in this) as well as a scale-to-(almost)-zero capability. The way it works is that when the endpoint receives traffic and is doing work, you pay for the entire Fargate task (CPU/memory) you have selected in the configuration. If the endpoint is doing nothing you only pay for the memory (note the memory cost is roughly 20% of the entire cost, so it's not scale-to-zero but "quasi"). Check out the pricing examples at the bottom of this page.
Please note you can further optimize costs by pausing/starting the endpoint (when it's paused you pay nothing) but in that case you need to create the logic that pauses/restarts it.
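For what it's worth, that pause/resume logic can be quite small; here is a hedged Python sketch using boto3's App Runner client, where the service ARN and the idle-time bookkeeping are placeholders you would have to provide yourself:

    import boto3

    apprunner = boto3.client('apprunner')
    SERVICE_ARN = 'arn:aws:apprunner:eu-west-1:123456789012:service/my-flask-app/xyz'  # placeholder

    def pause_if_idle(seconds_since_last_request, idle_limit=1800):
        # The idle timestamp would come from your own bookkeeping, e.g. a value the
        # Flask app updates on every request -- App Runner does not track this for you.
        if seconds_since_last_request > idle_limit:
            apprunner.pause_service(ServiceArn=SERVICE_ARN)

    def resume_on_demand():
        # Called from whatever entry point users hit to request their server.
        apprunner.resume_service(ServiceArn=SERVICE_ARN)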
Another option you may want to explore is using Lambda this way (which would allow you to use the same container image and benefit from Lambda's intrinsic scale to zero). But given your comment, "Lambda doesn't have enough power, and the timeout is not flexible", that may be a show stopper.

How to always assign requests from each user to the same instance in an instance group?

I have a deep learning web application deployed on GCE. I created a template to build a VM instance group, then added load balancing to it.
My plan is that when a user accesses the URL, requests from that same user will always be assigned to the same VM instance. I use gunicorn -b 0.0.0.0:5000 wsgi:app -t 600 as part of the startup script. (I also tried with multiple workers and gevent, but requests from different users can be handled by the same instance, and as a result the results affected each other. So I want requests from different users to be handled by different instances.)
To do so, I tried different CPU utilization targets for autoscaling. It does autoscale and add new instances, but judging by the results, requests are sometimes still handled by the same instance.
I also tried Kubernetes, App Engine, and Cloud Run. The issues were similar. I feel I am working in the wrong direction.
Thanks in advance.
---UPDATE---
As mentioned by @John Hanley, always assigning requests from a user to the same instance is not a feature these products are designed for. If you are looking for an answer to this question, you may try Cloud Tasks + App Engine.
Actually, I want requests from different users to be handled by different instances so that the back-end deep learning algorithm's results do not affect each other.
So, instead of spinning up an instance per user, another way to solve this is to store the necessary data for each user in a common database, keyed by a unique session ID.
A simple demo can be found at https://cloud.google.com/python/docs/getting-started/session-handling-with-firestore
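A minimal sketch of that idea with Flask and Firestore, along the lines of the linked demo; the collection and field names are made up for illustration, and run_model stands in for the actual deep learning code:

    import uuid
    from flask import Flask, jsonify, make_response, request
    from google.cloud import firestore

    app = Flask(__name__)
    db = firestore.Client()

    def run_model(image_file):
        # Placeholder for the actual deep learning inference.
        return {'objects': []}

    @app.route('/predict', methods=['POST'])
    def predict():
        # Any instance behind the load balancer can handle this request, because the
        # per-user state lives in Firestore, keyed by the session ID.
        session_id = request.cookies.get('session_id') or str(uuid.uuid4())
        result = run_model(request.files['image'])
        db.collection('sessions').document(session_id).set({'last_result': result})
        resp = make_response(jsonify(result))
        resp.set_cookie('session_id', session_id)
        return resp

    @app.route('/result')
    def last_result():
        session_id = request.cookies.get('session_id')
        doc = db.collection('sessions').document(session_id).get()
        return jsonify(doc.to_dict() or {})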
Hope this can be helpful for anyone struggling with similar problems.
As mentioned in the comment section by @John Hanley, the best approach would be to use App Engine + Cloud Tasks. I suggest you check the following tutorial; although it uses ngrok instead of gunicorn, the idea of the workflow should be similar to what you want to achieve.
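In rough Python terms, the Cloud Tasks side of that pattern could look like the sketch below; the project, location, queue name and the /run-prediction handler are placeholders, not values from the tutorial:

    import json
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path('my-project', 'us-central1', 'prediction-queue')

    def enqueue_prediction(payload):
        # Each prediction request becomes a task that an App Engine handler works off.
        task = {
            'app_engine_http_request': {
                'http_method': tasks_v2.HttpMethod.POST,
                'relative_uri': '/run-prediction',  # App Engine handler (assumed)
                'headers': {'Content-Type': 'application/json'},
                'body': json.dumps(payload).encode(),
            }
        }
        return client.create_task(parent=parent, task=task)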

Object Detection Django Rest API Deployment on Google Cloud Platform or Google ML Engine

I have developed a Django API which accepts images from a live camera feed, sent as base64 in the request. In the API, the image is converted into a numpy array and passed to a machine learning model, i.e. object detection using the TensorFlow Object Detection API. The response is simple text listing the detected objects.
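(For reference, the request handling described here typically boils down to something like the following sketch; the field name and libraries are assumptions, not the asker's actual code.)

    import base64
    import io

    import numpy as np
    from PIL import Image

    def image_from_request(data):
        # Decode the base64 payload from the request body into an RGB numpy array
        # that can be fed to the object detection model.
        raw = base64.b64decode(data['image_base64'])
        img = Image.open(io.BytesIO(raw)).convert('RGB')
        return np.array(img)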
I need a GPU-based cloud instance where I can deploy this application for fast processing to achieve real-time results. I have searched a lot but found no such resource. I believe a Google Cloud instance can be connected to the live API, but I am not sure exactly how.
Thanks
I assume that you're using a GPU locally, or wherever your Django application is hosted.
The first thing is to make sure that you are using tensorflow-gpu and that all the necessary CUDA setup is done.
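A quick sanity check for that (this uses the TensorFlow 2.x API; on 1.x you would use tf.test.is_gpu_available() instead):

    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    print('GPUs visible to TensorFlow:', gpus)
    assert gpus, 'No GPU found -- check the driver / CUDA / cuDNN installation'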
You can start your GPU instance easily on Google Cloud Platform (GCP). There are multiple ways to do this.
Quick option
Search for Notebooks and start a new instance with the required GPU and RAM.
Instead of a notebook instance, you can set up the instance separately if you need a specific OS and more flexibility in choosing the machine.
To access the instance with SSH, simply add your SSH public key to the Metadata section, which can be seen when you open the instance details.
Set up Django as you would on any server. To test it, simply run it in debug mode on host 0.0.0.0 and your preferred port.
You can access the APIs via the external IP of the machine, which can be found on the instance details page.
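Once Django is listening on 0.0.0.0, a quick smoke test from any machine could look like the sketch below; the IP, port and endpoint are placeholders for your own values:

    import base64
    import requests

    EXTERNAL_IP = '203.0.113.10'  # from the instance details page
    with open('sample.jpg', 'rb') as f:
        payload = {'image_base64': base64.b64encode(f.read()).decode()}

    resp = requests.post(f'http://{EXTERNAL_IP}:8000/detect/', json=payload, timeout=60)
    print(resp.json())  # expected: the detected objects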
Some suggestions
While the first option is quick and dirty, it's not recommended to use that in production.
It is better to use a deployment service such as TensorFlow Serving, along with Kubeflow.
If you think that you're handling the inference properly itself, then make sure that you load balance the server properly. Use NGINX or any other good server along with gunicorn/uwsgi.
You can use Redis for queue management. When someone calls the API, the GPU is not necessarily free to run the inference. It is fine to skip this when you have very few hits on the API per second, but when we think of scaling up, say 50 requests per second which a single GPU can't handle at once, a queue system helps.
All requests should go to Redis first, and the GPU worker takes the jobs to be done from the queue. If required, you can always scale out the GPUs.
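A minimal sketch of such a Redis-backed queue (queue and key names are made up; run_inference stands for your model call):

    import json
    import redis

    r = redis.Redis(host='localhost', port=6379)

    def enqueue_request(request_id, image_b64):
        # Called from the API view -- returns immediately, no GPU needed here.
        r.rpush('inference-jobs', json.dumps({'id': request_id, 'image': image_b64}))

    def worker_loop(run_inference):
        # Runs in a separate process that owns the GPU.
        while True:
            _, raw = r.blpop('inference-jobs')  # blocks until a job arrives
            job = json.loads(raw)
            result = run_inference(job['image'])
            r.set('result:' + job['id'], json.dumps(result))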
Google Cloud actually offers Cloud GPUs. If you are looking to perform heavier computations with applications that require real-time capabilities, I would suggest you look into the following link for more information.
https://cloud.google.com/gpu/
Compute Engine also provides GPUs that can be added to your virtual machine instances. Use GPUs to accelerate specific workloads on your instances such as Machine Learning and data processing.
https://cloud.google.com/compute/docs/gpus/
However, if your application requires a lot of resources, you'll need to increase your quota to ensure you have enough GPUs available in your project; make sure to pick a zone where GPUs are available. If you need even more computing power, you would need to submit a request for a further quota increase. https://cloud.google.com/compute/docs/gpus/add-gpus#create-new-gpu-instance
Since you would be using the TensorFlow API for your application on ML Engine, I would advise you to take a look at the link below. It provides instructions for creating a Deep Learning VM instance with TensorFlow and other tools pre-installed.
https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance

What are some of the most appropriate ways for serving a large scale django app on Google Compute Engine?

I am working on a project that will presumably have a lot of user uploaded content and also a fairly large user base. I am now looking for deploying this app to the Google Compute Engine.
I have looked up the possible options and nginx + gunicorn seems to be a good choice. In the beginning I am going to be using a single ns-1 instance with a 100 GB persistent disk and Google Cloud SQL for serving my database.
But I want to make things scalable so that I can add more instances and disk storage without any hassle in the future. I am very confused about how to do that. So the main concern is:
I want a setup that lets me extend my disk space and number of Google Compute Engine instances whenever I want.
In order to have a fully scalable architecture, a good approach is to separate computation/serving from file storage, and both from data storage. Going part by part:
file storage - Google Cloud Storage - by storing common service files in a GCS bucket, you get a central repository that is both highly-redundant, and scalable;
data storage - Google Cloud SQL - gives you a highly reliable, scalable MySQL-like database back-end, which can be resized at will to accommodate increasing database usage;
front-ends - GCE instance group - template-generated web / computation front-ends, setting up a resource pool into which a forwarding rule (load balancer) distributes incoming connections.
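As a hedged sketch of what that split can look like in Django settings (the bucket, project and instance names are placeholders; the GCS backend comes from the django-storages package):

    # settings.py
    DEFAULT_FILE_STORAGE = 'storages.backends.gcloud.GoogleCloudStorage'
    GS_BUCKET_NAME = 'my-user-uploads'  # shared bucket for all front-ends

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': 'appdb',
            'USER': 'django',
            'PASSWORD': 'change-me',
            # Front-ends reach Cloud SQL via its IP or the Cloud SQL proxy socket.
            'HOST': '/cloudsql/my-project:us-central1:my-sql-instance',
        }
    }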
In a nutshell, this is one of the most adaptable set-ups I can think of, while you keep control over every aspect of the service and underlying infrastructure.
A simple approach would be to run a Python app on Google App Engine, which will auto-scale your instances (both up and down) and supports Django, as mentioned by @spirulence in the comments.
Here are some starting points:
Django and Cloud SQL support on App Engine
Running Pure Django Projects on Google App Engine
Third-party Libraries in Python 2.7
The last link shows which versions of Django are currently supported.

Can you get a cluster of Google Compute Engine instances that are *physically* local?

Google Compute Engine lets you get a group of instances that are semantically local in the sense that only they can talk to each other and all external access has to go through a firewall etc. If I want to run Map-Reduce or other kinds of cluster jobs that are going to induce high network traffic, then I also want machines that are physically local (say, on the same rack). Looking at the APIs and initial documentation, I don't see any way to request that; does anyone know otherwise?
There is no support in GCE right now for specifying rack locality. However, we built the system to work well in the face of large numbers of instances talking to each other in a fully connected way, as long as they are in the same zone.
This is one of the things that allowed MapR to approach the record for a Hadoop TeraSort. You can see that in action in the video of Craig McLuckie's talk from I/O:
https://developers.google.com/events/io/sessions/gooio2012/302/
The best way to find out is to test your application and see how it performs.