Application information missing in Spinnaker after re-adding GKE accounts - using spinnaker-for-gke - google-cloud-platform

I am using a Spinnaker implementation set up on GCP using the spinnaker-for-gcp tools. My initial setup worked fine. However, we recently had to re-configure our GKE clusters (independently of Spinnaker). Consequently I deleted and re-added our gke-accounts. After doing that the Spinnaker UI appears to show the existing GKE-based applications but if I click on any of them there are no clusters or load balancers listed anymore! Here are the spinnaker-for-gcp commands that I executed:
$ hal config provider kubernetes account delete company-prod-acct
$ hal config provider kubernetes account delete company-dev-acct
$ ./add_gke_account.sh # for gke_company_us-central1_company-prod
$ ./add_gke_account.sh # for gke_company_us-west1-a_company-dev
$ ./push_and_apply.sh
When the above didn't work I did an experiment where I deleted the two account and added an account with a different name (but the same GKE cluster) and ran push_and_apply. As before, the output messages seem to indicate that everything worked, but the Spinnaker UI continued to show all the old account names, despite the fact that I deleted them and added new ones (which did not show up). And, as before, not details could be seen for any of the applications. Also note that hal config provider kubernetes account list did show the new account name and did not show the old ones.
Any ideas for what I can do, other than complete recreating our Spinnaker installation? Is there anything in particular that I should look for in the Spinnaker logs in GCP to provide more information?
Thanks in advance.
-Mark

The problem turned out to be that the data that was in my .kube/config file in Cloud Shell was obsolete. Removing that file, recreating it (via the appropriate kubectl commands) and then running the commands mentioned in my original description fixed the problem.
Note, though, that it took a lot of shell script and GCP log reading by our team to figure out the problem. Ultimately, what would have been nice would have been if the add_gke_account.sh or push_and_apply.sh scripts could have detected the issue, presumably by verifying that the expected changes did, in fact, correctly occur in the running spinnaker.

Related

Dataproc custom image: Cannot complete creation

For a project, I have to create a Dataproc cluster that has one of the outdated versions (for example, 1.3.94-debian10) that contain the vulnerabilities in Apache Log4j 2 utility. The goal is to get the alert related (DATAPROC_IMAGE_OUTDATED), in order to check how SCC works (it is just for a test environment).
I tried to run the command gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10 but got the following message ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster, which makes sense, in order to protect the cluster.
I did some research and discovered that I will have to create a custom image with said version and generate the cluster from that. The thing is, I have tried to read the documentation or find some tutorial, but I still can't understand how to start or to run the file generate_custom_image.py, for example, since I am not confortable with cloud shell (I prefer the console).
Can someone help? Thank you

Can't remove Google Cloud project

I was playing around with terraform to create an infrastructure for a couple of services on GCP. GCP organises all the infra in so called projects. I specified a project_id incrorrectly in terraform files(actually I set project_id to already existing in my GCP, but ptoject name was different). Terraform in plan phase was successful, but after apply it failed. Then I executed terraform destroy, set correct project_id(and name), executed terraform apply again, this time successfully. But when I opened the GCP console I saw that actually 2 projects were created in project list(one with correct name and id and another with some random name: smth like My Project 1234 as name and beaming-light-546562 as id). And now gcloud projects list command shows 3 projects(this random one, correct one and previously existing one).
The problem is that I can't remove that "random" project, neither from gcloud utility nor from gcp console. I get an error
<myuser_mail_address> does not have permission to access projects instance or poject doesn't exist
Also that random project is not linked to my billing account.
How can I remove that "random" project
EDIT
It seems strange that the project with id beaming-light-546562 can't be removed by me(the owner of an account) with reasons that I do not have permissions to do that. Also the name of an id: it is similar to technic docker is using for generating names of running containers. I do not recall that terraform has such a feature. Could it be gcp itself who generates such random names?
I tried to recreate the error i.e, I created a sample project(via console) and deleted the same sample project in cloud shell using this command
gcloud projects delete <project ID> and again tried to delete the same sample project in cloud shell and got this error message:
You can cross verify if the reason listed in the image i.e PROJECT_DELETE_INACTIVE is present in the output of your gcloud projects delete <project ID> command.This means that the project is inactive and the project becomes inactive when it's deleted.
From this document :
The project takes approximately 30-days for complete deletion, At the end of the 30-day period, the project and all its resources are deleted and cannot be recovered.
Edit:
It seems to be a known issue with GCP. Leaving “Google Groups” related to GCP is a fix to this issue. You can track this Public Issue for more information.
You might have been added into a project through a group, so it appears in the project list. However, you have not been granted permission to modify the IAM of that project, so you can't remove the group from the permission list.
As a workaround, you can leave "Google Groups" related to GCP and reload the GCP console webpage so that all your unknown/inaccessible projects will disappear from the projects list. You can find what groups you're a member of, using this Google Groups link.
NOTE : You can leave the groups in order to lose the access, but there could be a situation where your email is added to a single role/permission and you would not be able to remove yourself from the IAM list.

MWAA - environments constantly loading

I'm currently trying to set up an Airflow environment via MWAA. I've gone through the create environment steps twice with both ending at the page listing Airflow environments with a banner saying I was successful. However, for the past 2 days, this environments page has just shown Loading Environments, as shown below. I also see a (0) for the environment number.
So far, I've added 2 interfaces for ECR and VPC for the API and the environment but no luck. Has anyone else run into this issue or have any clue what might be happening? Thanks!
Were you able to find the solution to this issue? I had similar issues when I tried to set up the first-time MWAA on AWS Account.
https://github.com/awslabs/aws-support-tools/tree/master/MWAA
Here's a link to how to verify if all the resources are set up correctly for MWAA. If you run the script mentioned on the repo you should be able to see where the issue lies.

AWS Amplify environment 'dev' not found

I'm working with AWS Amplify, specifically following this tutorial AWS-Hands-On-Tutorial.
I'm getting a build failure when I try to deploy the application.
So far I have tried creating multiple backend environments and connecting them with the frontend, hoping that this would alleviate the issue. The error message leads me to believe that the deploy is not set up to also detect the backend environment, despite that I have it set to do so.
Also, I have tried changing the environment that is set to deploy with the frontend by creating another develop branch to see if that is the issue.
I've had no success with trying any of these, the build continues to fail. I have also tried running the 'amplify env add' command as the error message states. I have not however tried "restoring its definition in your team-provider-info.json" as I'm not sure what that entails and can't find any information on it. Regardless, I would think creating a new environment would solve the potential issues there, and it didn't. Any help is appreciated.
Due to the documentation being out of date, I completed the steps below to resolve this issue:
Under Build Settings > Add package version override for Amplify CLI and leave it as 'latest'
When the tutorial advises to "update your front end branch to point to the backend environment you just created. Under the branch name, choose Edit...", where the tutorial advises to use 'dev' it actually had us setup 'staging', choose that instead.
Lastly, we need to setup a 'Service Role' under General. Select General > Edit > Create New Service Role > Select the default options and save the role, it should have a name of amplifyconsole-backend-role. Once the role is saved, you can go back to General > Edit > Select your role from the dropdown, if it doesn't show by default start typing it in.
After completing these steps, I was able to successfully redeploy my build and get it pushed to prod with authentication working. Hope it helps anyone who is running into this issue on Module 3 of the AWS Amplify Starter Tutorial!

Why is my cloud run deploy hanging at the "Deploying..." step?

Up until today, my deploy process has worked fine. Today when I go to deploy a new revision, I get stuck at the Deploying... text with a spinning indicator, and it says One or more of the referenced revisions does not yet exist or is deleted. I've tried a number of different images and flags -- all the same.
See Viewing the list of revisions for a service, in order to undo whatever you may have done.
Probably you have the wrong project selected, if it does not know any of the revisions.
I know I provided scant information, but just to follow up with an answer: it looks like the issue was that I was deploying a revision, and then immediately trying to tag it using gcloud alpha run services update-traffic <service_name> --set-tags which looks to have caused some sort of race, where it complained that the revision was not yet deployed, and would hang indefinitely. Moving the set-tag into the gcloud alpha run deploy seemed to fix it.