I need to select a custom profile for running a pipeline against an existing Dataproc cluster (see "Configuring your pipeline to use the custom profile": https://cloud.google.com/data-fusion/docs/how-to/running-against-existing-dataproc). According to the type of instance I have (DEVELOPER), the Compute config option should be enabled on versions higher than 6.3:
Any idea why I can't see it?
[Image: Comparison of Developer, Basic, and Enterprise editions]
I’ve configured an Existing Dataproc profile:
[Image: Compute profile]
But when I try to select this new profile under the “Compute config” option, I can't see it, even though the documentation indicates that it should be available in versions higher than 6.3:
[Image: the option disabled on my instance]
My Data Fusion instance:
[Image: my instance as created]
If I understand the problem correctly, you are able to create the profile but not use it in your pipelines? From the screenshot, it looks like you are looking in the wrong place. Here is where it can be set:
https://cloud.google.com/data-fusion/docs/how-to/running-against-existing-dataproc#configuring_your_pipeline_to_use_the_custom_profile_2
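In case the UI option still doesn't show up for you: CDAP (which Data Fusion is built on) can also select a compute profile through a runtime argument. If I remember the convention correctly (treat the key and scope below as an assumption to verify against your CDAP version), you set system.profile.name on the pipeline, qualifying the profile name with its scope:

# Hypothetical runtime argument (verify the key against your CDAP version);
# "my-dataproc-profile" is a placeholder for the profile you created
system.profile.name = USER:my-dataproc-profile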
For a project, I have to create a Dataproc cluster running one of the outdated image versions (for example, 1.3.94-debian10) that contain the Apache Log4j 2 vulnerability. The goal is to trigger the related alert (DATAPROC_IMAGE_OUTDATED) in order to check how SCC works (it is just for a test environment).
I tried to run the following command:

gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10

but got this message:

ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster

This makes sense, since it protects new clusters.
I did some research and discovered that I will have to create a custom image with that version and generate the cluster from it. The thing is, I have tried reading the documentation and looking for a tutorial, but I still can't understand how to get started or how to run the generate_custom_image.py file, since I am not comfortable with Cloud Shell (I prefer the console).
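From what I can piece together, the script is meant to be run roughly like this, but I haven't been able to verify it (the image name, script path, zone, and bucket below are just placeholders):

# Sketch based on the flags documented in the GoogleCloudDataproc/custom-images repo
python generate_custom_image.py \
    --image-name log4j-test-image \
    --dataproc-version 1.3.94-debian10 \
    --customization-script ./customize.sh \
    --zone us-east1-b \
    --gcs-bucket gs://my-staging-bucket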
Can someone help? Thank you
I'm using kube-aws to run a Kubernetes cluster on AWS, and everything works as expected.
Now, I realize that cron jobs aren't turned on in the version I'm using (v1.7.10_coreos.0), while the documentation for Kubernetes only states the following:
For previous versions of cluster (< 1.8) you need to explicitly enable batch/v2alpha1 API by passing --runtime-config=batch/v2alpha1=true to the API server (see Turn on or off an API version for your cluster for more).
And the documentation linked from that text only states the following (this is the actual, full documentation):
Specific API versions can be turned on or off by passing --runtime-config=api/ flag while bringing up the API server. For example: to turn off v1 API, pass --runtime-config=api/v1=false. runtime-config also supports 2 special keys: api/all and api/legacy to control all and legacy APIs respectively. For example, for turning off all API versions except v1, pass --runtime-config=api/all=false,api/v1=true. For the purposes of these flags, legacy APIs are those APIs which have been explicitly deprecated (e.g. v1beta3).
I have been unsuccessful in finding information about how to change the configuration of a running cluster, and I, of course, don't want to try to re-run the command on api-server.
Note that kube-aws still uses hyperkube, not kubeadm. Also, the /etc/kubernetes/manifests directory only contains the ssl directory.
The setting I want to apply is this: --runtime-config=batch/v2alpha1=true
What is the proper way, preferably using kubectl, to apply this setting and have the apiservers restarted?
Thanks.
batch/v2alpha1=true is set by default in kube-aws. You can find it here
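A quick way to confirm whether the API group is actually being served by your cluster (a generic sanity check, not kube-aws-specific):

# List the API groups/versions the apiserver is serving;
# batch/v2alpha1 should appear if the flag took effect
kubectl api-versions | grep batch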
I'm following along with the instructions on Cloudera's website to set up a cluster using Cloudera Director. However, when I get to the step where I'm supposed to "Add an Environment," I'm presented with two issues. First, the region I selected (us-east1-b) when configuring my Google Compute instance is not available for selection on the Cloudera Director software. Second, there is no option for me to upload Client ID JSON Keys, as the documentation says we should be able to do. I've attached a screenshot of what I'm looking at. Any clues?
My Cloudera director software is reporting itself as version 2.1.1, and the docs I'm looking at are for version 2.1.x. Am I somehow working with an older version of the software? Or are the Cloudera docs not in line with the current version? Can anyone else running Cloudera 2.1.1 confirm that they're seeing something similar or different?
There is a field to load the Client ID JSON keys in the "Advanced Options" section under General Information. Click the > to expand the Advanced Options.
You should be able to type in the region you want even if it isn't provided as a value in the drop-down.
I have a large Google Cloud SQL (Second Gen) instance, and I would like to upgrade my database version from MySQL 5.6 to 5.7. But database version option is disabled on the edit instance form.
Why is it disabled? Do I have to create a new instance and then export and import the existing database? My database is very large, so that would mean a long downtime.
Per the Cloud SQL Migration docs, the only way to migrate versions is to export your data and re-import it into a new instance. The documentation mentions going from 5.5 to 5.6, but I would expect going from 5.6 to 5.7 to follow the same procedure.
Minor version upgrades to MySQL through Google Cloud now appear to be supported, though only through API calls:
https://cloud.google.com/sql/docs/mysql/upgrade-minor-db-version#gcloud
The pertinent section if using the gcloud CLI:
gcloud sql instances patch $INSTANCE_NAME --database-version=$DATABASE_VERSION
Substitute your instance name for the $INSTANCE_NAME variable, and your target database version in place of $DATABASE_VERSION.
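For example, assuming an instance named my-instance (a placeholder) being moved from 5.6 to 5.7, the call would look like this; note that the API expects the enum-style version identifier:

# "my-instance" is a placeholder; MYSQL_5_7 targets MySQL 5.7
gcloud sql instances patch my-instance --database-version=MYSQL_5_7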
Two REST API methods also appear to be available; see the documentation for details.
I'm trying to use CfnCluster 1.2.1 for GPU computing and I'm using a custom AMI based on the Ubuntu 14.04 CfnCluster AMI.
Everything is created correctly in the CloudFormation console, but when I submit a new test task to Oracle Grid Engine using qsub from the master server, it never gets executed from the queue according to qstat. It always stays in status "qw" and never enters state "r".
It seems to work fine with the Amazon Linux AMI (using user ec2-user instead of ubuntu) and the exact same configuration. Also, the master instance announces the number of remaining tasks to the cluster as a metric, and new compute instances are auto-scaled as a result.
What mechanisms does CfnCluster or Oracle Grid Engine provide to further debug this? I took a look at the log files, but didn't find anything relevant. What could be the cause for this behavior?
Thank you,
Diego
Similar to https://stackoverflow.com/a/37324418/704265
From your qhost output, it looks like your machine "ip-10-0-0-47" is properly configured in SGE. However, on "ip-10-0-0-47" sge_execd is either not running or not configured properly. If it were, qhost would report statistics for "ip-10-0-0-47".
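A quick way to check this from the master (using the host name from your output):

# Ask SGE about that specific execution host; dashes instead of load
# values usually mean sge_execd isn't reporting in
qhost -h ip-10-0-0-47
# Then, on the compute node itself, check whether the daemon is running at all
ps aux | grep [s]ge_execd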
I think I found the solution. It seems to be the same issue as the one described in https://github.com/awslabs/cfncluster/issues/86#issuecomment-196966385
I fixed it by adding the following line to the CfnCluster configuration file:
base_os = ubuntu1404
If a custom_ami is specified but no base_os, CfnCluster defaults to Amazon Linux, which uses a different method to configure SGE. There may be problems in the SGE configuration performed by CfnCluster if base_os and the custom AMI's actual OS don't match.
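For reference, the relevant combination in the configuration file looks something like this (a sketch; the AMI ID is a placeholder):

# ~/.cfncluster/config -- only the relevant keys shown
[cluster default]
base_os = ubuntu1404
custom_ami = ami-xxxxxxxx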