Dataproc custom image: Cannot complete creation

Dataproc custom image: Cannot complete creation - google-cloud-platform

For a project, I have to create a Dataproc cluster that has one of the outdated versions (for example, 1.3.94-debian10) that contain the vulnerabilities in Apache Log4j 2 utility. The goal is to get the alert related (DATAPROC_IMAGE_OUTDATED), in order to check how SCC works (it is just for a test environment).
I tried to run the command gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10 but got the following message ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster, which makes sense, in order to protect the cluster.
I did some research and discovered that I will have to create a custom image with said version and generate the cluster from that. The thing is, I have tried to read the documentation or find some tutorial, but I still can't understand how to start or to run the file generate_custom_image.py, for example, since I am not confortable with cloud shell (I prefer the console).
Can someone help? Thank you

Related

Application information missing in Spinnaker after re-adding GKE accounts - using spinnaker-for-gke

I am using a Spinnaker implementation set up on GCP using the spinnaker-for-gcp tools. My initial setup worked fine. However, we recently had to re-configure our GKE clusters (independently of Spinnaker). Consequently I deleted and re-added our gke-accounts. After doing that the Spinnaker UI appears to show the existing GKE-based applications but if I click on any of them there are no clusters or load balancers listed anymore! Here are the spinnaker-for-gcp commands that I executed:
$ hal config provider kubernetes account delete company-prod-acct
$ hal config provider kubernetes account delete company-dev-acct
$ ./add_gke_account.sh # for gke_company_us-central1_company-prod
$ ./add_gke_account.sh # for gke_company_us-west1-a_company-dev
$ ./push_and_apply.sh
When the above didn't work I did an experiment where I deleted the two account and added an account with a different name (but the same GKE cluster) and ran push_and_apply. As before, the output messages seem to indicate that everything worked, but the Spinnaker UI continued to show all the old account names, despite the fact that I deleted them and added new ones (which did not show up). And, as before, not details could be seen for any of the applications. Also note that hal config provider kubernetes account list did show the new account name and did not show the old ones.
Any ideas for what I can do, other than complete recreating our Spinnaker installation? Is there anything in particular that I should look for in the Spinnaker logs in GCP to provide more information?
Thanks in advance.
-Mark

The problem turned out to be that the data that was in my .kube/config file in Cloud Shell was obsolete. Removing that file, recreating it (via the appropriate kubectl commands) and then running the commands mentioned in my original description fixed the problem.
Note, though, that it took a lot of shell script and GCP log reading by our team to figure out the problem. Ultimately, what would have been nice would have been if the add_gke_account.sh or push_and_apply.sh scripts could have detected the issue, presumably by verifying that the expected changes did, in fact, correctly occur in the running spinnaker.

`docker compose create ecs` without user input

I am looking for a way to run docker compose create ecs without having to manually select where it gets AWS credentials from (as it's being run from a build agent).
In the following AWS blog it shows it being used with a flag --from-env (which is exactly what I want), however that flag doesn't seem to actually exist, either in the official docs, or by trial and error. Is there something I am missing?

Apparently it's a known issue
https://github.com/docker/docker.github.io/issues/11845
You have to enable experimental support for the docker cli in Linux to create an ecs context :S

GCP Deployment Manager - What Dev Ops Tool To Use In Conjunction?

I'm presently looking into GCP's Deployment Manager to deploy new projects, VMs and Cloud Storage buckets.
We need a web front end that authenticated users can connect to in order to deploy the required infrastructure, though I'm not sure what Dev Ops tools are recommended to work with this system. We have an instance of Jenkins and Octopus Deploy, though I see on Google's Configuration Management page (https://cloud.google.com/solutions/configuration-management) they suggest other tools like Ansible, Chef, Puppet and Saltstack.
I'm supposing that through one of these I can update something simple like a name variable in the config.yaml file and deploy a project.
Could I also ensure a chosen name for a project, VM or Cloud Storage bucket fits with a specific naming convention with one of these systems?
Which system do others use and why?

I use Deployment Manager, as all 3rd party tools are reliant upon the presence of GCP APIs, as well as trusting that those APIs are in line with the actual functionality of the underlying GCP tech.
GCP is decidedly behind the curve on API development, which means that even if you wanted to use TF or whatever, at some point you're going to be stuck inside the SDK, anyway. So that's why I went with Deployment Manager, as much as I wanted to have my whole infra/app deployment use other tools that I was more comfortable with.
To specifically answer your question about validating naming schema, what you would probably want to do is write a wrapper script that uses the gcloud deployment-manager subcommand. Do your validation in the wrapper script, then run the gcloud deployment-manager stuff.
Word of warning about Deployment Manager: it makes troubleshooting very difficult. Very often it will obscure the error that can help you actually establish the root cause of a problem. I can't tell you how many times somebody in my office has shouted "UGGH! Shut UP with your Error 400!" I hope that Google takes note from my pointed survey feedback and refactors DM to pass the original error through.
Anyway, hope this helps. GCP has come a long way, but they've still got work to do.

Command to create google cloud backend service fails - what am I doing wrong?

I am currently working through the Google Cloud "load balancing" code lab:
https://codelabs.developers.google.com/codelabs/cpo200-load-balancing
On page 4 of the lab, it requires me to run the following command in the Cloud Shell, to create a backend-service (for load balancing of a group of web server, i.e. HTTP, instances):
gcloud compute backend-services create \
guestbook-backend-service \
--http-health-checks guestbook-health-check
However, running this command results in the following error:
ERROR: (gcloud.compute.backend-services.create) Some requests did not succeed:
- Invalid value for field 'resource.loadBalancingScheme': 'EXTERNAL'.
Backend Service based Network Load Balancing is not yet supported.
Assuming that all the preceding steps in the code lab are correct (which I have no reason to suspect is not the case), this appears to be a bug in the code lab.
I have submitted a bug report for this, however, since I am not expecting any response to the bug report any time soon but I do want to continue on with this lab, what command should I be running instead?
I presume there has been some sort of API change but the code lab has not caught up and the documentation does not appear to indicate any relevant changes.
I realize I could probably work out how to do this with the Cloud Console, but I would really like to learn the command line actions.
Does anyone have any ideas?
Thanks in advance!

And, as is the nature of these things, shortly after I post this I discover the answer for myself...
The command should be:
gcloud compute backend-services create \
guestbook-backend-service \
--http-health-checks guestbook-health-check \
--global
It appears that what the error message is actually complaining about is that regional backend-services are not supported; they must be global.
Leaving aside the fact that the lab directions are inadequate, it would be nice if this was detailed in the documentation, but I guess we can't have everything...

Tasks not executed by the Compute Nodes in Ubuntu CfnCluster image

I'm trying to use CfnCluster 1.2.1 for GPU computing and I'm using a custom AMI based on the Ubuntu 14.04 CfnCluster AMI.
Everything is created correctly in the CloudFormation console, although when I submit a new test task to Oracle Grid Engine using qsub from the Master Server, it never gets executed from the queue according to qstat. It stays always in status "qw" and never enters state "r".
It seems to work fine with the Amazon Linux AMI (using user ec2-user instead of ubuntu) and the exact same configuration. Also, the master instance announces the number of remaining tasks to the cluster as a metric, and new compute instances are auto-scaled as a result.
What mechanisms does CfnCluster or Oracle Grid Engine provide to further debug this? I took a look at the log files, but didn't find anything relevant. What could be the cause for this behavior?
Thank you,
Diego

Similar to https://stackoverflow.com/a/37324418/704265
From your qhost output, it looks like your machine "ip-10-0-0-47" is properly configured in SGE. However, on "ip-10-0-0-47" sge_execd is either not running or not configured properly. If it were, qhost would report statistics for "ip-10-0-0-47".

I think I found the solution. It seems to be the same issue as the one described in https://github.com/awslabs/cfncluster/issues/86#issuecomment-196966385
I fixed it by adding the following line to the CfnCluster configuration file:
base_os = ubuntu1404
If a custom_ami is specified but no base_os is specified, it defaults to use the Amazon Linux, which uses a different method to configure SGE. There may be problems in the SGE configuration performed by CfnCluster if base_os and custom_ami os are different.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js