Where to keep the Dataflow and Cloud composer python code? - google-cloud-platform

It probably is a silly question. In my project we'll be using Dataflow and Cloud composer. For that I had asked permission to create a VM instance in the GCP project to keep the both the Dataflow and Cloud composer python program. But the client asked me the reason of creation of a VM instance and told me that you can execute the Dataflow without the VM instance.
Is that possible? If yes how to achieve it? Can anyone please explain it? It'll be really helpful to me.

You can run Dataflow pipelines or manage Composer environments in you own computer once your credentials are authenticated and you have both the Google SDK and Dataflow Python library installed. However, this depends on how you want to manage your resources. I prefer to use a VM instance to have all the resources I use in the cloud where it is easier to set up VPC networks including different services. Also, saving data from a VM instance into GCS buckets is usually faster than from an on-premise computer/server.

Related

How to Deploy a docker container with volume in Cloud Run

I am trying to publish an application I wrote in .NET Core with docker and a mounted volume. I can't really figure out or see any clear solution to my issue that will be cheap (Its for a university project.)
I tried running a docker-compose via a cloudbuild.yml linked in this post with no luck, also tried to put my dbfile in a firebase project and tried to access it via the program but it didn't work. I also read in the GCP documentation that i can probably use Filestore but the pricing is way out of budget for me. I need to publish an SQLite so my server can work correctly, that's it.
Any help would be really apreciated!
Basically, you can't mount volume in Cloud Run. It's a stateless environment and you can't persist data on it. You have to use external storage to persist your data. See the runtime contract
WIth the 2nd execution runtime environment, you can now mount Cloud Storage bucket with GCSFuse, and Filestore path with NFS

How to use existing Compute Engine for Google Cloud Build?

How would you use an existing Compute Engine VM instance for a Google Cloud Build pipeline?
I know there's been a similar question in the past, however, the suggested answer is not really what I want - creating and then destroying a Compute Engine with every build.
In settings, Cloud Build allows you to enable "service account permissions" for Compute Engine (Compute Instance Admin (v1)), but I've found no information how to use that permission and service for running the build process with one of your predefined VM instances.
Or maybe I misunderstand the answer in the linked thread above and
COMMAND=sudo supervisorctl restart
actually restarts the existing VM supervisorctl? Any help would be appreciated.
You can't run a Cloud Build build on a GCE instance. The most customizable option you would have is to run the build on a private pool. But even in those cases it's always managed, you never have access to the underlying VM.
Another option would be to start a powerful GCE instance with Cloud Build via the GCE API, run your operations there and then stop the GCE instance.

GCP AI notebook instance permission

I have a GCP AI notebook instance. Anyone with admin access to notebook in my project can open notebook and read, modify or delete any folder/file which is created by me or any other user in my team. Is there a way to create a private repository like /home/user, like we could've done if we used JupyterHub installed on a VM?
Implementing your requirement is not feasible with AI Notebooks. AI Notebooks is intended for a rapid prototyping and development environment that can be easily managed, and advanced multi-user scenarios fall outside its intended purpose.
The Python Kernel in AI Notebooks always runs under the Linux user "Jupyter" regardless of what GCP user accesses the notebook. Anyone who has editor permissions to your Google Cloud project can see each other's work through the Jupyter UI.
In order to isolate the user's work, the recommended option is to set up individual notebook instances for each user. Please find the 'Single User' option.
It’s not feasible to combine multiple instances to a master instance in AI Notebooks. So, the recommended ways to give each user a Notebook instance and share any source code via GIT or other repository system. Please find Save a Notebook to GitHub doc for more information.
You probably created a Notebook using Service Account mode. You can provide access to single users only via single-user mode
Example:
proxy-mode=mail,proxy-user-mail=user#domain.com

Is there a way to run GCP workflows locally?

Recently I started working with GCP workflows, and functions. We are using serverless framework for the functions and we can run them in our on computers with the command serverless invoke local --function <function_name> so we don't have to spend cloud executions.
What I'm looking now is if there is a way to do the same thing with GCP workflows, to run them in our own computers instead of invoking them inside the cloud.
I already read the resources from google and from many different articles but I still not find the trick (if it actually exists)
Today, there is no emulator for Cloud Workflows. But if you can afford to deploy your cloud functions on GCP, Cloud workflows has a generous free tier: 5000 steps for free

Triggering a training task on cloud ml when file arrives to cloud storage

I am trying to build an app where the user is able to upload a file to cloud storage. This would then trigger a model training process (and predicting later on). Initially I though I could do this with cloud functions/pubsub and cloudml, but it seems that cloud functions are not able to trigger gsutil commands which is needed for cloudml.
Is my only option to enable cloud-composer and attach GPUs to a kubernetes node and create a cloud function that triggers a dag to boot up a pod on the node with GPUs and mounting the bucket with the data? Seems a bit excessive but I can't think of another way currently.
You're correct. As for now, there's no possibility to execute gsutil command from a Google Cloud Function:
Cloud Functions can be written in Node.js, Python, Go, and Java, and are executed in language-specific runtimes.
I really like your second approach with triggering the DAG.
Another idea that comes to my mind is to interact with GCP Virtual Machines within Cloud Composer through the Python operator by using the Compute Engine Pyhton API. You can find more information in automating infrastructure and taking a deep technical dive into the core features of Cloud Composer here.
Another solution that you can think of is Kubeflow, which aims to make running ML workloads on Kubernetes. Kubeflow adds some resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. Please, have a look on Codelabs tutorial.
I hope you find the above pieces of information useful.