Setting up a cluster on GCP with Cloudera Director

I'm following along with the instructions on Cloudera's website to set up a cluster using Cloudera Director. However, when I get to the step where I'm supposed to "Add an Environment," I run into two issues. First, the region I selected (us-east1-b) when configuring my Google Compute instance is not available for selection in the Cloudera Director software. Second, there is no option for me to upload Client ID JSON keys, as the documentation says I should be able to do. I've attached a screenshot of what I'm looking at. Any clues?
My Cloudera Director software reports itself as version 2.1.1, and the docs I'm looking at are for version 2.1.x. Am I somehow working with an older version of the software? Or are the Cloudera docs not in line with the current version? Can anyone else running Cloudera Director 2.1.1 confirm whether they're seeing something similar or different?

There is a field to load the Client ID JSON keys in the "Advanced Options" section under General Information. Click the > to expand the Advanced Options.
You should be able to type in the region you want even if it isn't provided as a value in the drop-down.

Related

Dataproc custom image: Cannot complete creation

For a project, I have to create a Dataproc cluster with one of the outdated versions (for example, 1.3.94-debian10) that contain the vulnerabilities in the Apache Log4j 2 utility. The goal is to trigger the related alert (DATAPROC_IMAGE_OUTDATED) in order to check how SCC works (it is just for a test environment).
I tried to run the command:
gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10
but got the following message:
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster
which makes sense, since it protects the cluster.
I did some research and discovered that I will have to create a custom image with that version and generate the cluster from it. The thing is, I have tried reading the documentation and finding a tutorial, but I still can't understand how to start or how to run the file generate_custom_image.py, for example, since I am not comfortable with Cloud Shell (I prefer the console).
Can someone help? Thank you
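For what it's worth, a minimal invocation of generate_custom_image.py (from Google's Dataproc custom-images repository) usually looks something like the sketch below. The flag names are based on that repository's README and the image, bucket, and script names here are made up for illustration, so check the repo before running anything:
# Run from a clone of the GoogleCloudDataproc/custom-images repo (hypothetical names)
python generate_custom_image.py \
    --image-name=log4j-test-image \
    --dataproc-version=1.3.94-debian10 \
    --customization-script=customization_script.sh \
    --zone=us-east1-b \
    --gcs-bucket=gs://your-staging-bucket
If that succeeds, the cluster would then presumably be created with gcloud dataproc clusters create ... --image=log4j-test-image instead of --image-version.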

Create and customize compute profiles disabled in Data Fusion version 6.4

I need to select a custom profile (see "Configuring your pipeline to use the custom profile": https://cloud.google.com/data-fusion/docs/how-to/running-against-existing-dataproc) for running a pipeline against an existing Dataproc cluster. According to the type of instance I have (Developer), the Compute config option should be enabled in versions higher than 6.3:
Any idea why I can't see it?
[Screenshot: Comparison of Developer, Basic, and Enterprise editions]
I've configured an Existing Dataproc profile:
[Screenshot: Compute profile]
But when I try to select this new profile in the "Compute config" option, I can't see it, even though the documentation indicates that it should be enabled in versions higher than 6.3:
[Screenshot: option disabled on my instance]
My Data Fusion instance:
[Screenshot: My instance created]
If I understand the problem correctly, you are able to create the profile but not use it in your pipelines? From the screenshot, it looks like you are looking in the wrong place. Here is where it can be set:
https://cloud.google.com/data-fusion/docs/how-to/running-against-existing-dataproc#configuring_your_pipeline_to_use_the_custom_profile_2

Invalid arguments when creating new datalab instance

I am following the quickstart tutorial for Datalab here, within the GCP console. When I try to run
datalab beta create-gpu datalab-instance-name
in step 3, I receive the following error:
write() argument must be str, not bytes
Can anyone help explain why this is the case and how to fix it?
Thanks
Referring to the official documentation, before creating a Datalab instance, the corresponding APIs should be enabled: the Google Compute Engine and Cloud Source Repositories APIs. To do so, visit Products -> APIs and Services -> Library and search for those APIs. Additionally, make sure that billing is enabled for your Google Cloud project.
You can also enable the APIs by typing the following command, which will give you a prompt to enable them:
datalab list
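Alternatively (an assumption on my part, not part of the original answer), the same two APIs can be enabled non-interactively with gcloud from Cloud Shell or any machine with the SDK installed:
# Enable the Compute Engine and Cloud Source Repositories APIs for the current project
gcloud services enable compute.googleapis.com sourcerepo.googleapis.com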
I did some research and found that the same issue has been reported on the GitHub page. If enabling the APIs doesn't work, the best option would be to contribute (add a comment) to the mentioned GitHub topic to make it more visible to the Datalab engineering team.

How to apply rolling updates to VM instances instead of using a Managed Instance Group in GCP?

Problem: I want to apply patch updates to a VM instance which is not part of a Managed Instance Group. The patch update could be:
1. A change in the version of the current OS of a VM instance, that is, a change from Ubuntu-16-v1 to Ubuntu-16-v2.
2. An upgrade of the boot OS, that is, changing from Ubuntu-16 to Ubuntu-18.
3. Installation of a new package on the existing machine.
Exploration:
For Problems 1 & 2 stated above:
I have explored and tried the rolling update feature of Managed Instance Groups in Google Cloud Platform, and this seems to be a good approach for the stated problem, but what would be the best approach, with best practices, if someone is not using a Managed Instance Group? You may find the details here.
For Problem 3 stated above:
I have tried the OS patch management service of GCP, but is there any other method that I could use?
Create an "image" from the boot disks of your existing Compute Engine instances.
For updating with newer configurations and software, group images in "image family" which always points to the latest image.
See https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images#setting_families
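As a rough sketch (the disk, zone, and family names below are made up for illustration), creating an image from an instance's boot disk and attaching it to a family looks like this:
# Create an image from an existing boot disk and add it to an image family
gcloud compute images create ubuntu-16-v2 \
    --source-disk=my-vm-boot-disk \
    --source-disk-zone=us-east1-b \
    --family=ubuntu-16-base
New VMs can then be created with --image-family=ubuntu-16-base, which always resolves to the latest non-deprecated image in that family.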
For your use case, I think you should use an IaC tool like Terraform to recreate similar VMs with the same name, disk, internal address, etc., and either call the script from the repo directly on a scheduled date automatically or provide self-patch instructions.
Here is the likely process:
Send an email notification to all the VM owners that auto-patching is scheduled on XYZ.
The email content should include the list of instances to be patched/updated, the list of actions, and the patch team's contact details.
The email should also include a link to the "Self Patching instructions" document for skipping this auto-update.
Self-patching documents should have a command to call the auto-patch wrapper script, like:
curl -u "encrypted-auth:x-oauth-basic" -k -H 'Accept: application/vnd.github.VERSION.raw' 'https://github.com/api/v3/repos/xyz/images/contents/gcp/patch_OS_update.sh?ref=master' | bash -s -- -q
The above script can also have other options, like querying the patch sets available for a particular VM or scanning the VM for pending updates.

GCP: How to check/find logs for metadata added to a Google Cloud Platform project

I want to find logs for metadata added to a Google Cloud project, i.e., project metadata, not compute/VM instance metadata.
I tried to find this in Stackdriver Logging, but it only shows entries for compute instances, such as compute.instances.setMetadata, compute.instances.insert, compute.instances.delete, etc.
I am looking for metadata or properties added in/for the GCP project (not VM instance metadata). The reason is that someone is adding/modifying a property, we are unable to find any history to track the change, and this causes the application to fail.
For future readers, you can add the following to your query to look for project metadata:
protoPayload.methodName="v1.compute.projects.setCommonInstanceMetadata"
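For example, and assuming the Cloud SDK is configured for the right project, the same filter can be run from the command line with gcloud logging read (the --limit value here is arbitrary):
# List recent audit log entries for changes to project-wide (common instance) metadata
gcloud logging read 'protoPayload.methodName="v1.compute.projects.setCommonInstanceMetadata"' --limit=10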
You could try looking at the Activity page - https://console.cloud.google.com/home/activity
The Logs console also has a Google Project resource that you can filter on.