I have stored thousands of very high-resolution images in GCP Cloud Storage. I want to serve these images in an iOS/Android app and on a website. I don't want to serve the high-resolution version every time, and I wondered whether I have to create duplicate images in different resolutions, which seems very inefficient. The perfect solution would be to append a parameter like ?size=100 to the image URL. Is something like that natively possible with GCP Cloud Storage?
I can't find anything about this in the Cloud Storage documentation: https://cloud.google.com/storage/docs.
Several other resources link to deprecated solutions: https://medium.com/google-cloud/uploading-resizing-and-serving-images-with-google-cloud-platform-ca9631a2c556
What is the best solution to implement such functionality?
Cloud Storage does not currently have an imaging service, though a Feature Request already exists. I highly suggest that you "+1" and "star" this issue to increase its chance of being prioritized in development.
You are right that this use case is common. The Images API is a legacy App Engine API. It's no longer a recommended solution because legacy App Engine APIs are only available in older runtimes with limited support. GCP advises developers to use the Client Libraries instead, but since your requested feature is not yet available, you'll have to use third-party imaging libraries.
In this case, developers commonly use Cloud Functions with a Cloud Storage trigger to resize images and create duplicates in different resolutions. While you may find this inefficient, unfortunately there's not much choice but to process those images yourself until the feature request becomes publicly available.
One good thing, though, is that Cloud Functions supports multiple runtimes, so you can write code in any supported language and pick libraries you're comfortable using. If you're using the Node runtime, feel free to check this sample that automatically creates a thumbnail when an image is uploaded to Cloud Storage.
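If you prefer Python, here is a minimal sketch of the same idea, assuming a 1st-gen Cloud Function with a Cloud Storage trigger, the google-cloud-storage client, and Pillow; the "-thumbnails" destination bucket and the 100 px size are placeholders for your own setup.

    # Sketch of a Cloud Storage-triggered Cloud Function (Python runtime) that
    # writes a resized copy of every uploaded image. Assumes google-cloud-storage
    # and Pillow are listed in requirements.txt; the "-thumbnails" bucket name
    # is a placeholder you would replace with your own.
    import os
    import tempfile

    from google.cloud import storage
    from PIL import Image

    THUMB_SIZE = (100, 100)  # longest edge ~100 px, mirroring the ?size=100 idea


    def make_thumbnail(event, context):
        """Triggered by google.storage.object.finalize on the source bucket."""
        bucket_name = event["bucket"]
        blob_name = event["name"]

        client = storage.Client()
        src_blob = client.bucket(bucket_name).blob(blob_name)

        # Download the original to a temp file, resize it in place, re-upload.
        tmp_path = os.path.join(tempfile.gettempdir(), os.path.basename(blob_name))
        src_blob.download_to_filename(tmp_path)

        img = Image.open(tmp_path)
        img.thumbnail(THUMB_SIZE)   # resizes in place, preserving aspect ratio
        img.save(tmp_path)          # keeps the original file extension/format

        dst_bucket = client.bucket(bucket_name + "-thumbnails")  # placeholder bucket
        dst_bucket.blob(blob_name).upload_from_filename(tmp_path)
        os.remove(tmp_path)

Your apps would then request the copy from the thumbnails bucket instead of the original, which keeps the high-resolution files untouched.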
Related
I have developed a Django API that accepts images from a live camera feed as base64-encoded data in the request. In the API, each image is converted into a NumPy array and passed to a machine learning model, i.e. object detection using the TensorFlow Object Detection API. The response is simple text listing the detected objects.
I need a GPU-based cloud instance where I can deploy this application for fast processing to achieve real-time results. I have searched a lot but found no such resource. I believe Google Cloud Console (instances) can be connected to a live API, but I am not sure how exactly.
Thanks
I assume that you're using a GPU locally or wherever your Django application is hosted.
The first thing is to make sure that you are using tensorflow-gpu and that all the necessary CUDA setup is done.
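For example, a quick sanity check (nothing GCP-specific, just plain TensorFlow, assuming a 2.x version) to confirm the GPU is actually visible before wiring up Django:

    # Quick sanity check that tensorflow-gpu sees the CUDA device (TensorFlow 2.x).
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print("GPUs visible to TensorFlow:", gpus)
    if not gpus:
        print("No GPU found - check the CUDA/cuDNN installation and driver version.")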
You can start your GPU instance easily on Google Cloud Platform (GCP). There are multiple ways to do this.
Quick option
Search for notebooks and start a new instance with the required GPU and RAM.
Instead of a notebook instance, you can set up the instance separately if you need a specific OS and more flexibility in choosing the machine.
To access the instance with SSH, simply add your SSH public key to the Metadata, which can be seen when you open the instance details.
Set up Django as you would on any server. To test it, simply run the development server on host 0.0.0.0 and your preferred port.
You can access the APIs with the external IP of the machine, which can be found on the instance details page.
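As an illustration only, a client call could then look like the sketch below; the external IP, port, and the /detect/ endpoint path are hypothetical placeholders for whatever your Django URLs define.

    # Hypothetical client call to the Django API running on the GCE instance.
    # EXTERNAL_IP, the port, and the /detect/ path are placeholders for your setup.
    import base64
    import requests

    EXTERNAL_IP = "203.0.113.10"  # the instance's external IP from the details page

    with open("frame.jpg", "rb") as f:
        payload = {"image": base64.b64encode(f.read()).decode("ascii")}

    resp = requests.post(f"http://{EXTERNAL_IP}:8000/detect/", json=payload, timeout=30)
    print(resp.text)  # plain-text list of detected objects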
Some suggestions
While the first option is quick and dirty, it's not recommended to use that in production.
It is better to use a dedicated serving solution such as TensorFlow Serving, along with Kubeflow.
If you prefer to handle the inference yourself, then make sure that you load balance the server properly. Use NGINX or any other good server along with gunicorn/uWSGI.
You can use Redis for queue management. When someone calls the API, it is not guaranteed that a GPU is free for the inference. You can do without a queue when the API gets very few hits per second, but when scaling up, say to 50 requests per second, which a single GPU can't handle at once, a queue system helps.
All requests should go to Redis first, and the GPU worker takes the jobs to be done from the queue. If required, you can always scale the GPUs.
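A minimal sketch of that pattern, assuming the redis-py client and a single GPU worker process; the queue name, payload shape, and result keys are made up for illustration.

    # Minimal producer/consumer sketch for a Redis-backed inference queue.
    # The queue name "inference_jobs" and the JSON payload shape are assumptions.
    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # API side: push the request onto the queue instead of calling the model directly.
    def enqueue_request(request_id, image_b64):
        r.rpush("inference_jobs", json.dumps({"id": request_id, "image": image_b64}))

    # GPU worker side: block until a job arrives, run the model, store the result.
    def worker(run_inference):
        while True:
            _, raw = r.blpop("inference_jobs")    # blocks until a job is available
            job = json.loads(raw)
            result = run_inference(job["image"])  # your TensorFlow detection call
            r.set(f"result:{job['id']}", result)  # API polls or reads this key later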
Google Cloud actually offers Cloud GPUs. If you are looking to perform higher-level computations in applications that require real-time capabilities, I would suggest you look into the following link for more information.
https://cloud.google.com/gpu/
Compute Engine also provides GPUs that can be added to your virtual machine instances. Use GPUs to accelerate specific workloads on your instances such as Machine Learning and data processing.
https://cloud.google.com/compute/docs/gpus/
However, if your application requires a lot of resources, you'll need enough GPU quota in your project; if your current quota doesn't cover that, submit a request for a quota increase. Also make sure to pick a zone where GPUs are available. https://cloud.google.com/compute/docs/gpus/add-gpus#create-new-gpu-instance
Since you would be using the TensorFlow API for your application on ML Engine, I would advise you to take a look at the link below. It provides instructions for creating a Deep Learning VM instance with TensorFlow and other tools pre-installed.
https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance
I like fast code execution (that's why I switched from Python to Go) and I do not like dependencies. Amazon recommends using the SDK for simpler authentication (but in Lambda I can get IAM credentials from environment variables) and for the retry-on-error logic built into the SDK (a few lines of code, I think). Yes, it is faster to write my code using the SDK, but what are the additional caveats of using the pure HTTP API instead of the SDK? Am I too obsessed with milliseconds? Are such optimizations worth it?
Anything you do with AWS is the result of an API call, whether executed by CLI, Web console, or SDK.
The SDKs make it easier to interact with those APIs. While you may be able to come up with some minor improvements for some calls, overall you will spend a lot of time doing it to very little benefit.
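For instance, the retry behaviour the SDK gives you is just a configuration setting (shown with Python/boto3 here for brevity; the Go SDK has the same idea). Re-implementing equivalent back-off, error classification, and request signing over the raw HTTP API is where the time tends to go. Bucket and key names are placeholders.

    # The SDK's built-in retry/backoff is a one-line config; credentials are picked
    # up automatically from the Lambda environment. Bucket and key are placeholders.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        config=Config(retries={"max_attempts": 5, "mode": "standard"}),
    )
    obj = s3.get_object(Bucket="my-bucket", Key="my-key")
    print(obj["ContentLength"])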
I think the stated focus on performance obscures the real trade-offs.
Consider that someone will have to maintain your code -- if you use the API, the surface to test is small, but AWS APIs might change or be deprecated; if you use an SDK, the next programmer will plug in a new SDK version and hope that it works, and if it doesn't, they'll be bogged down by the sheer weight of the SDK.
Likewise, imagine someone needs to do a security review of this app, or to introduce something not yet covered by SDK (let's imagine propagating accounting group from caller role to underlying storage).
I don't think there is a clear answer.
Here are my suggestions:
keep it consistent -- either API or SDK (within given app)
consider the bigger picture (how many apps do you plan to write?)
don't be afraid to switch to the other approach later
I've had to decide on something similar in the past, with Docker (much nicer APIs and SDKs/libs). Here's how it played out:
For testing, we ended up using the beta version of the Docker Python bindings: the production version was not enough, and the bindings (your SDK) were overall pretty good and clear.
For log scraping, I used HTTP calls (your API), "because performance"; in reality it was the comparable mental load of the API vs the SDK, plus the fact that the bindings (SDK) did not support asyncio.
I am working on a project that will presumably have a lot of user-uploaded content and also a fairly large user base. I am now looking to deploy this app to Google Compute Engine.
I have looked at the possible options, and nginx + gunicorn seems to be a good one. In the beginning I am going to use a single ns-1 instance with a 100 GB persistent disk and Google Cloud SQL for serving my database.
But I want to make things scalable so that I can add more instances and disk storage without any hassle in the future. I am very confused about how to do that, so the main concern is:
I want a setup that lets me extend my disk space and the number of Compute Engine instances whenever I want.
In order to have a fully scalable architecture, a good approach is to separate computation/serving from file storage, and both from data storage. Going part by part:
file storage - Google Cloud Storage - by storing common service files in a GCS bucket, you get a central repository that is both highly redundant and scalable (see the Django configuration sketch after this list);
data storage - Google Cloud SQL - gives you a highly reliable, scalable MySQL-like database back-end, which can be resized at will to accommodate increasing database usage;
front-ends - GCE instance group - template-generated web / computation front-ends, setting up a resource pool into which a forwarding rule (load balancer) distributes incoming connections.
In a nutshell, this is one of the most adaptable set-ups I can think of, while you keep control over every aspect of the service and underlying infrastructure.
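For the file-storage piece, pointing Django's user uploads at the shared GCS bucket is a small settings change, assuming the django-storages package with its Google Cloud backend; the bucket name is a placeholder.

    # settings.py sketch: store user uploads in a shared GCS bucket instead of on
    # the instance's local disk, so any front-end in the instance group can serve them.
    # Assumes `pip install django-storages[google]`; the bucket name is a placeholder.
    DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
    GS_BUCKET_NAME = "my-project-media"

With uploads in GCS and data in Cloud SQL, the front-end instances stay stateless, which is what lets the instance group add or remove them freely.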
A simple approach would be to run a Python app on Google App Engine, which will auto-scale your instances (both up and down), and it supports Django, as mentioned by @spirulence in the comments.
Here are some starting points:
Django and Cloud SQL support on App Engine
Running Pure Django Projects on Google App Engine
Third-party Libraries in Python 2.7
The last link shows which versions of Django are currently supported.
What is a better mBaaS that supports offline sync and caching?
I am evaluating several mBaaS solutions for my hybrid mobile app under development. I looked at Kinvey, Kii, Buddy, and the Telerik Backend platform. I have also come across some open source solutions like OpenMobster and DreamFactory. I am looking to store data in SQLite on the mobile app and then sync it back with an online data store. Kinvey has this support, but their per-user pricing model is not suitable in my scenario. I can see that OpenMobster does this, but I need to understand how. Can I host it on an Azure VM or something? Also, please suggest any other commercial or open source solution capable of offline sync and caching with push notifications and data storage.
DreamFactory could be a good fit for your scenario. It is open source and comes with a full 30 days of free support, after which it's only about $25/month for a developer account - and that isn't even a requirement to use the product; it's specifically a support package.
To address your question a little more in depth... I don't believe DreamFactory supports offline syncing at the moment, though they plan to very soon. In regards to SQLite, DreamFactory's product, the DSP (DreamFactory Service Platform), has a built-in SQLite driver to connect to that DB. However, it hasn't been tested enough for them to call it a fully supported RDBMS. One of the beautiful things about DreamFactory is that you're able to host the DSP on Azure and Amazon EC2 instances (cloud solutions), host it locally on your own server, or even use its free hosted edition!
I would definitely take a little time to look into DF. It doesn't seem to me like you have much to lose, especially considering it's a free open-source product!
Feel free to ask me any questions you may have about DreamFactory!
-Mark
I have a requirement to upload a large file (it can be 10 GB) to a shared Windows space from one application (say APP1). We have a separate application (say APP2) on a different network, and I need to download the same file in that second application via the internet.
My approach is to create a web service to upload the document to the shared space, and then expose a web service for the outside world to download the document.
My question is how I can manage uploading/downloading such huge files through a web service.
Please suggest if anyone has an idea. I have the flexibility to use any third-party APIs, but the applications can talk only through web services.
From your question it's not really clear which development platform you mean, .NET, Java, etc.
It's also important to know how interoperable your services should be, what the security requirements are, etc. Anyway, I will try to come up with a couple of solutions that you might research in more detail if you find them useful.
.NET
It's relatively easy to build such a web service with WCF. It supports streaming, which can be interoperable, reliable and secure to some extent. You can read more on this here. This approach implies you have a huge disk to store the files and a good recovery plan in case it goes down or just dies.
.NET, Java, etc. - cloud based
There are a lot of vendors who provide cloud storage and APIs to work with it. It's an ideal solution for a quick start. They take care of data availability, redundancy, etc. All you have to do is use their API to upload and download files, and pay them for it :) In many cases it's worth it. Personally, I have worked with Amazon S3. Their API is simple to use and there's plenty of documentation for it.
EDIT:
Amazon S3 provides a simple web-services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.
I think you should take a look at Amazon S3 overview here.
This also provides APIs for a number of different platforms - Java, .NET, Node.js, etc. You can find the full list here.
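As a rough illustration of how this avoids streaming huge files through your own web service (shown with Python/boto3 for brevity; bucket and key names are placeholders): APP1 uploads the file with a managed multipart upload, and your download web service just hands APP2 a time-limited presigned URL so it pulls the file directly from S3.

    # Sketch: multipart upload from APP1 and a presigned download URL for APP2.
    # Bucket name, key, and expiry are placeholders for illustration.
    import boto3

    s3 = boto3.client("s3")

    # upload_file transparently switches to multipart upload for large files,
    # so a 10 GB file does not have to fit in memory or a single request.
    s3.upload_file("local/big-file.bin", "shared-bucket", "uploads/big-file.bin")

    # The "download" web service can just return this URL; APP2 then fetches the
    # file straight from S3 over the internet for the next hour.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "shared-bucket", "Key": "uploads/big-file.bin"},
        ExpiresIn=3600,
    )
    print(url)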
Hope it helps!