I published a gene mapping pipeline that can be downloaded from GitHub and run on a local machine. The problem is that setting up a server is not always available (or easy) for people. I created an instance and an image (or a machine image; I don't understand the difference) on Google Cloud that can run the pipeline, but I could not make it public so that people can use it to create an instance, upload their own data (FASTQ files) and run it to map their mutants.
Any help would be highly appreciated. Thank you
As John Hanley mentioned, you cannot make an image public.
According to the official documentation:
Compute Engine Images
Public images are provided and maintained by Google, open source communities, and third-party vendors. By default, all Google Cloud projects have access to these images and can use them to create instances.
Therefore you can use the public images maintained by Google to create your instances.
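For example, a minimal sketch using the Compute Engine Python client, referencing a Google-maintained public image family (the project, zone, machine type and instance name are placeholders):
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

# Minimal instance config built around a public image family maintained by Google.
config = {
    'name': 'pipeline-vm',  # placeholder instance name
    'machineType': 'zones/us-central1-a/machineTypes/n1-standard-1',
    'disks': [{
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            'sourceImage': 'projects/debian-cloud/global/images/family/debian-9',
        },
    }],
    'networkInterfaces': [{'network': 'global/networks/default'}],
}

compute.instances().insert(project='my-project', zone='us-central1-a', body=config).execute()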
My use case is:
I have a trained model which I want to use to run inference on small messages.
I am not sure where I should keep my model for Cloud Run:
1. Inside the container
2. On Cloud Storage, downloading it when the container starts
3. Mounting Cloud Storage as a local directory and using it from there
I was able to write and run code successfully for options 1 and 2.
I tried option 3 but had no luck there. I am following this tutorial: https://cloud.google.com/run/docs/tutorials/network-filesystems-fuse
My entry point here is actually a Pub/Sub event, and that is where I cannot get it working.
But before exploring it further, I would like to know which approach is better here, or whether there is another, better solution.
Thanks for the valuable comments, they helped a lot.
If the model is static, it is better to bundle it with the container. Downloading it from a storage bucket or mounting a filesystem means the model is downloaded again whenever a new container instance is spun up.
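To make option 1 concrete, here is a minimal sketch of a Cloud Run service, assuming a pickled model baked into the image; the model.pkl path, the Flask app, and the predict() call are all assumptions, not part of the original question:
import base64
import pickle

from flask import Flask, request

app = Flask(__name__)

# The model file was copied into the image at build time (e.g. COPY model.pkl /app/ in the
# Dockerfile), so it is loaded once per container instance, not once per request.
with open('/app/model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/', methods=['POST'])
def handle_pubsub_push():
    envelope = request.get_json()
    text = base64.b64decode(envelope['message']['data']).decode('utf-8')
    prediction = model.predict([text])  # placeholder inference call
    return str(prediction), 200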
Background
I have a virtual machine running code that uses the Google SDK for different products (such as Google Pub/Sub). According to the Google documentation, my machine should have an environment variable called GOOGLE_APPLICATION_CREDENTIALS, and its value should point to a plain-text file holding the service account key of the application.
I have done it and it's working for me.
The Problem
It sounds like an unsafe practice to store such a key, in plain text, inside a virtual machine. If the machine is hacked, this key will be one of the first targets of the attacker.
I expected to find a solution to "hide" this key file, or at least to encrypt it with a key that my application can read.
I found some code examples (C#) that allow the programmer to pass the credentials manually to the SDK functions. But that is not the standard way to do it, and it varies from one product to another (and seems impossible in some products).
What is the best practice to do it?
Have a good read of the following:
https://cloud.google.com/docs/authentication/production
This describes a concept called "Application Default Credentials". The idea is that a Compute Engine instance (a virtual machine) has a default service account (which you can configure) associated with it. Applications running on that instance can make requests to other GCP services, and those requests will implicitly appear to come from the service account configured on the instance.
The key phrase in the article is:
If the environment variable GOOGLE_APPLICATION_CREDENTIALS isn't set, ADC uses the default service account that Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions provide.
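For example, on a Compute Engine VM with a default service account attached, the Pub/Sub client needs no key file at all; a minimal sketch (project and topic names are placeholders):
from google.cloud import pubsub_v1

# No GOOGLE_APPLICATION_CREDENTIALS and no key file on disk: on a Compute Engine VM
# the client picks up the instance's default service account via ADC.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')  # placeholders
future = publisher.publish(topic_path, b'hello from the VM')
print(future.result())  # message ID once the publish succeeds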
I've looked for this across the web a few times, and I feel like this hasn't been asked exactly, or I may just be getting bogged down with the wrong syntax. Hoping to get an easy answer here ("no, you can't get this" is an acceptable answer).
The variations from the base CentOS image are listed here: Link to GCP
However, they don't actually provide a download for this image. I'm trying to get a local VM running in VMWare with this image.
I feel as though they'd provide this to their clients to make it easier to prepare for use of their product, but I'm not finding it anywhere.
If anyone could toss me a link to a pre-configured CentOS ISO with the minor changes, I'd definitely take that as an alternative. I'm just not confident in my skills with Linux enough to configure the firewall properly :)
GCP doesn't support exporting Google-provided images. However, it does support exporting custom images.
I don't have any experience with image exporting, but I think this will work.
Create custom images
You can create custom images based on your GCE VM instance.
Go to Navigation menu -> Compute Engine -> Images.
On this page you can create a custom image from a disk or a snapshot.
Select one and create a custom image.
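If you prefer to script this step instead of using the console, a rough sketch with the Compute Engine Python client would look like this (project, zone, disk and image names are placeholders):
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

# Create a custom image from an existing persistent disk (all names are placeholders).
image_body = {
    'name': 'my-custom-image',
    'sourceDisk': 'zones/us-central1-a/disks/my-instance-disk',
}
compute.images().insert(project='my-project', body=image_body).execute()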
Export your image
After the custom image has been created successfully, go to the custom image's page and click "Export" at the top.
Select the export format and the Cloud Storage destination, then click Export.
Now you have an image file in Cloud Storage.
Download the image file and import it into your local VM software.
GCP has a deep learning VM available to run on their cloud compute platform. The details about the image are here
So, I am using the google python client to launch my instances and the documentation for this is available here. Now, the way one specifies the disk and the boot image is through this JSON blob:
'disks': [
    {
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            'sourceImage': source_disk_image,
        }
    }
]
Now the source_disk_image is specified by the path to some public image, like projects/debian-cloud/global/images/family/debian-9, or some variant of this type. Now, my question is: how can I specify a Marketplace image to be used for my instance?
If you're not attached to using the marketplace to create the VM, there's a lot of documentation about all the available Google Deep Learning images.
They live in the deeplearning-platform-release project, so, for example, I think (but am not sure) the default image you are referring to from the Marketplace you linked is projects/deeplearning-platform-release/global/images/tf-1-14-cu100-20191004 but you can also pull them by family and just get the latest versions, for example, projects/deeplearning-platform-release/global/images/family/tf-latest-gpu.
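So, reusing the disks blob from the question, you would (I believe) just point sourceImage at one of those paths, for example the family path:
source_disk_image = 'projects/deeplearning-platform-release/global/images/family/tf-latest-gpu'

'disks': [
    {
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            'sourceImage': source_disk_image,
        }
    }
]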
The gcloud images command is also pretty illuminating to see the description of a given family choice or image, e.g.:
$ gcloud compute images describe-from-family tf-latest-gpu --project deeplearning-platform-release
archiveSizeBytes: '322993843200'
creationTimestamp: '2019-10-06T13:57:56.932-07:00'
description: "Google, Deep Learning Image: TensorFlow 1.14.0, m36, TensorFlow 1.14.0\
\ with CUDA 10.0 and Intel\xAE MKL-DNN, Intel\xAE MKL."
diskSizeGb: '30'
...
Which looks a lot like the Marketplace description.
That said, it looks like the Marketplace might be doing other things too (e.g. there are checkboxes about installing particular drivers, separate from choosing the image).
I think that @Ernesto's tip about creating an instance from the Marketplace, and then viewing how it was created via the REST link at the bottom of the instance page, is also good advice. However, in this case you probably want to view the disk that was created (not the instance, since once it is created it only references the disk resource), click on its REST link, and look for the "sourceImage" portion of the REST response.
For example, you can try it with a regular old debian-9 disk (I don't have GPU quota, so I can't actually create the Marketplace deployment); its sourceImage will point back into projects/debian-cloud/global/images/.
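If you'd rather do that lookup with the Python client instead of clicking the REST link, a small sketch would be (project, zone and disk name are placeholders):
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

# Describe the disk that the Marketplace deployment created and read back the image
# it was built from (names are placeholders).
disk = compute.disks().get(project='my-project', zone='us-central1-a',
                           disk='my-marketplace-disk').execute()
print(disk['sourceImage'])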
I was able to find the sourceImage of a Deep Learning image found in the Marketplace. For this example I'm using
NVIDIA GPU Cloud Image for Deep Learning, Data Science, and HPC
"name": "nvidia-gpu-cloud-image-20190809",
"selfLink": "projects/nvidia-ngc-public/global/images/nvidia-gpu-cloud-image-20190809",
"sourceDisk": "projects/nvidia-ngc-dev/zones/us-central1-a/disks/chetan-official-base-image"
Deploy an instance from the Marketplace
Go to the instance and inspect the details from the UI
In the Boot disk section, click on the image name nvidia-gpu-cloud-image-20190809; it will take you to the image details page
Click on REST at the bottom of the description
Find the selfLink or sourceDisk entry
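Once you know the image name and its public project, the same details can also be read back with the Python client; a small sketch based on the values above:
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

# Look up the Marketplace image in its public project and print the fields from above.
image = compute.images().get(project='nvidia-ngc-public',
                             image='nvidia-gpu-cloud-image-20190809').execute()
print(image['selfLink'])
print(image.get('sourceDisk'))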
I have some software that processes files. What I need is:
Start a default image on Google Cloud (I think Docker should be a good solution) using an API or a run command
Download files from Google Storage
Process them: run my software using those downloaded files
Upload the result to Google Storage
Shut the image down, expecting not to be billed anymore
What I do know is how to create my image hehe. But I can't find any info telling me which Google Cloud service I should use, or even whether I can do it the way I'm thinking. I think I'm not using the right keywords to find what I need.
I was looking at Kubernetes, but I couldn't figure out how to manipulate those instances to execute a one-time processing job.
[EDIT]
To explain the process better: I have an app that receives images and sends them to Google Storage. After that, I need to process those images: apply filters, georeference them, split them, etc. So I want to start a Docker image to process them and upload the results to Google Cloud again.
If you are using any of the runtimes supported by Google Cloud Functions, they are the easiest way to do this kind of operation (i.e. fetch something from Google Cloud Storage, perform some actions on those files and upload them again). The Cloud Function will be triggered by an event of your choice, and after the job it will die.
The next option in terms of complexity would be to deploy a Google App Engine application in the standard environment. It allows you to deploy your own application written in any of the languages supported by this environment. While there is traffic to your application you will have instances serving it, but the number of running instances can go down to 0 when they are not serving, which means less cost.
Another option would be Google App Engine in the flexible environment. This product allows you to deploy your application in any custom runtime. This option always has at least one instance running, so it never scales down to zero instances.
Lastly, you can use Google Compute Engine to "create and run virtual machines on Google infrastructure". Unlike GAE, this is not as managed by Google, which means that most of the configuration is up to you. In this case, you would need to programmatically tell your VM to shut down after you have finished your operations.
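For that last option, the shutdown itself can be a single API call once the processing is done; a rough sketch with the Compute Engine Python client (project, zone and instance name are placeholders, and the VM's service account needs permission to stop instances):
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

# Stop this worker VM once the processing is finished so it stops accruing compute
# charges (the persistent disk is still billed until deleted). Names are placeholders.
compute.instances().stop(project='my-project', zone='us-central1-a',
                         instance='worker-vm').execute()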
Based on your edit where you stated that you already have an app that is inserting images into Google Cloud Storage, your easiest option would be to use Cloud Functions that are triggered by additions, changes, or deletions to objects in Cloud Storage buckets.
You can follow the Cloud Functions tutorial for Cloud Storage to get an idea of the generic process and then implement your own code that handles your specific tasks. There are other tutorials like the Imagemagick tutorial for Cloud Functions that might also be relevant to the type of processing you intend to do.
Cloud Functions is probably your lightest-weight approach. You could of course build more full-scale applications, but that is likely overkill, more expensive, and more complex. You can write your processing code in Node.js, Python, or Go.
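As a minimal sketch of that approach in Python, assuming a function triggered when a new object is finalized in the upload bucket (the bucket name and the process_image step are placeholders for your own filtering/georeferencing code):
from google.cloud import storage

OUTPUT_BUCKET = 'my-processed-images'  # placeholder output bucket

def handle_upload(event, context):
    # Background Cloud Function triggered when an object is finalized in the input bucket.
    client = storage.Client()
    source = client.bucket(event['bucket']).blob(event['name'])

    local_path = '/tmp/' + event['name'].replace('/', '_')
    source.download_to_filename(local_path)

    result_path = process_image(local_path)  # placeholder: filters, georeferencing, splitting

    client.bucket(OUTPUT_BUCKET).blob(event['name']).upload_from_filename(result_path)

def process_image(path):
    # Placeholder for the actual processing; return the path of the result file.
    return path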