Has the Google Cloud Dataproc preview image's Spark version changed? - google-cloud-platform

I recently started a Spark cluster on Google Cloud Dataproc using the 'preview' image. According to the documentation, the preview image's Spark version is '2.1.0'; however, running spark-shell --version reveals that the cluster is in fact running Spark 2.2.0. This is a problem for us, because our version of spark-avro is not compatible with Spark 2.2.0. Is anyone else experiencing this issue? I haven't been able to find any trace of an official announcement from Google regarding the version bump.

Sorry about that; it appears the minor release notes for the recent preview image update got lost in the ether. The documentation should hopefully be updated by tomorrow. Indeed, you're right that the current Dataproc preview version is now Spark 2.2.0. If you need to pin to a known working older preview image, you can try:
gcloud dataproc clusters create CLUSTER_NAME --image https://www.googleapis.com/compute/v1/projects/cloud-dataproc/global/images/dataproc-1-2-20170227-145329
That should contain Spark 2.1.0. That said, keep in mind that in general it's always possible that incompatible changes may be made in new preview images, and pinning to that older preview image isn't guaranteed to continue working long term.
In your case, do you happen to know whether you're hitting this issue filed on spark-avro, or is it something specific to your version? Ideally we should get you updated to Spark 2.2, since an official (non-preview) image version with Spark 2.2 is imminent.
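For reference, a quick way to confirm which image and Spark version a cluster actually has is to describe the cluster and then check Spark on a node; in this sketch, CLUSTER_NAME and REGION are placeholders for your own values:
# Show the image version the cluster was created with
gcloud dataproc clusters describe CLUSTER_NAME --region=REGION --format="value(config.softwareConfig.imageVersion)"
# On any cluster node, print the Spark version itself
spark-submit --version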

Related

Where can I find historic version info after a GKE Cluster upgrade?

We recently had an automatic update, and I'm wondering what version we had before this current version. Is it found in GCP somewhere? Or can I use kubectl?
Kindly check your Audit Logs and run the following query:
resource.type="gke_cluster" AND
log_id("cloudaudit.googleapis.com/activity")
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
Hope it helps
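If you prefer the command line over the Logs Explorer, roughly the same query can be run with gcloud logging read; the limit and freshness values below are just examples:
# Read recent GKE cluster-update audit log entries (adjust --limit/--freshness as needed)
gcloud logging read 'resource.type="gke_cluster" AND log_id("cloudaudit.googleapis.com/activity") AND protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"' --limit=10 --freshness=30d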

Dataproc custom image: Cannot complete creation

For a project, I have to create a Dataproc cluster that has one of the outdated versions (for example, 1.3.94-debian10) that contain the vulnerabilities in Apache Log4j 2 utility. The goal is to get the alert related (DATAPROC_IMAGE_OUTDATED), in order to check how SCC works (it is just for a test environment).
I tried to run the command gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10 but got the following message ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster, which makes sense, in order to protect the cluster.
I did some research and discovered that I will have to create a custom image with said version and generate the cluster from that. The thing is, I have tried reading the documentation and looking for tutorials, but I still can't understand how to get started or how to run the generate_custom_image.py file, for example, since I am not comfortable with Cloud Shell (I prefer the console).
Can someone help? Thank you
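For what it's worth, the generator script lives in the GoogleCloudDataproc/custom-images repository and is normally run from a terminal roughly as sketched below; the image name, customization script path, zone, and bucket are placeholders, and whether a version blocked for CVE-2021-44228 is still accepted as a base image is something you would have to verify:
# Clone the custom-images generator and run it against the desired base version
git clone https://github.com/GoogleCloudDataproc/custom-images.git
cd custom-images
python generate_custom_image.py \
    --image-name=my-dataproc-1-3-94 \
    --dataproc-version=1.3.94-debian10 \
    --customization-script=/path/to/customization_script.sh \
    --zone=us-east1-b \
    --gcs-bucket=gs://my-staging-bucket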

GCP VM runs container image even after the image was deleted from container registry

I was running a container on GCP VM, and it ran the latest image without an issue.
However, I recently found that the container it ran was no longer the latest, even though the only image in the registry is the latest version I pushed there.
I tested it by deleting the image from the registry, and running VM without changing the container image name in the VM setting. Not surprisingly, the VM still runs the old container.
As I can't think of any reason for this, could someone give me a hint?
Thanks!
From your description I assume that you're running a single VM with Container Optimized OS (COS).
Container-Optimized OS images have the built-in capability to automatically upgrade to a newer version when released. This capability, when enabled, allows user instances to stay up-to-date with respect to security fixes and bug fixes.
So - when your image was in the Container Registry it was the same version as the one running on your VM. When you deleted it, the auto-update feature didn't have anything to compare to, so it kept running your old (but still the latest) image.
If you push a newer version to the registry, your image should be updated automatically.
However, this feature may be disabled; here's how (INSTANCE_NAME is a placeholder for your VM's name):
gcloud compute instances add-metadata INSTANCE_NAME --metadata cos-update-strategy=update_disabled
So - you can check your VM's metadata to figure out the status of the feature.
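For example, something like this should show whether the cos-update-strategy key is present; INSTANCE_NAME and ZONE are placeholders:
# Inspect the instance metadata for the cos-update-strategy key
gcloud compute instances describe INSTANCE_NAME --zone=ZONE --format="yaml(metadata)"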
There's also one exception when the auto-update will not work:
Users running standalone Container-Optimized OS with any of the affected versions, and having the auto-update feature enabled, will not see their instances being updated to newer versions. In these cases, users should manually choose newer OS versions by recreating their VM instances with the newer image. Automatic updates will continue to work on all supported milestones for new releases.
These images cannot be updated to latest versions:
On Milestone 77: images prior to cos-77-12371-1000-0
On Milestone 81: images prior to cos-81-12871-1000-0
On Milestone 85: images prior to cos-85-13310-1000-0
On Milestone 86: images prior to cos-dev-86-15053-0-0
These images will no longer receive any updates:
All milestones before 77, including any previously deprecated milestones.
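To figure out which COS build your instance is actually on (and whether it falls into one of the ranges above), you could for instance check the OS release on the VM or the boot disk's source image; DISK_NAME and ZONE are placeholders:
# On the VM itself, print the COS milestone and build
cat /etc/os-release
# From your workstation, show which image the boot disk was created from
gcloud compute disks describe DISK_NAME --zone=ZONE --format="value(sourceImage)"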

Apache airflow vs puckle airflow image

I am currently using the Puckel Airflow image, but now the official Apache Airflow image is also available.
Which one is better and more reliable? And given that I need to start from scratch, which option would be better?
Official. Puckel is no longer updated.
I started with the Puckel Airflow image but, as Javier mentioned, it's no longer updated; the last released version was 1.10.9. It's easier to start with this image, and by following updates to and mimicking the required behaviours of the official Docker image, you can build on it.
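If you do decide to start from scratch on the official image, a minimal way to try it out is shown below; the tag is only an example, so pick a current release:
# Pull the official Apache Airflow image and print its version
docker pull apache/airflow:1.10.12
docker run --rm --entrypoint airflow apache/airflow:1.10.12 version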

How often do Cloud Build Node.js versions update?

I couldn't stomach paying the $150 for GCP's support service for this one question. I'm just looking to understand the release schedule for Cloud Build's Node.js versions. It's still stuck on Node.js v10.10, and my projects are starting to require higher versions to build. According to Cloud Build's changelog, I don't believe the Node.js version has been updated in years. Any ideas?
As per the official GitHub repository:
Our client libraries follow the Node.js release schedule. Libraries are compatible with all current active and maintenance versions of Node.js.
So, this means it should work with Node.js 12, and updates should be more frequent. In addition to that, here it says that if you are using a Cloud Build config file, you can use Node.js 12, so the latest Node.js version should be compatible with Cloud Build.
To summarize, according to the repository, it should follow the Node.js schedule. However, if you think this is not happening, I would recommend raising a bug on Google's Issue Tracker - it's free, by the way - so they can assess it.
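As a concrete illustration, if you drive the build with a cloudbuild.yaml you can point each build step at whatever Node.js image you need; the node:12 image and npm steps below are just an example, not an official recommendation:
# cloudbuild.yaml - run npm install and tests inside a Node.js 12 container
steps:
  - name: 'node:12'
    entrypoint: 'npm'
    args: ['install']
  - name: 'node:12'
    entrypoint: 'npm'
    args: ['test']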