Kubectl against GKE Cluster through Terraform's local-exec? - google-cloud-platform

I am trying to make an automatic migration of workloads between two node pools in a GKE cluster. I am running Terraform in GitLab pipeline. When new node pool is created the local-exec runs and I want to cordon and drain the old node so that the pods are rescheduled on the new one. I am using this registry.gitlab.com/gitlab-org/terraform-images/releases/1.1:v0.43.0 image for my Gitlab jobs. Also, python3 is installed with apk add as well as gcloud cli - downloading the tar and using the gcloud binary executable from google-cloud-sdk/bin directory.
I am able to use commands like ./google-cloud-sdk/bin/gcloud auth activate-service-account --key-file=<key here>.
The problem is that I am not able to use kubectl against my cluster.
Although I have installed the gke-gcloud-auth-plugin with ./google-cloud-sdk/bin/gcloud components install gke-gcloud-auth-plugin --quiet once in the CI job and second time in the local-exec script in HCL code I get the following errors:
module.create_gke_app_cluster.null_resource.node_pool_provisioner (local-exec): E0112 16:52:04.854219 259 memcache.go:238] couldn't get current server API group list: Get "https://<IP>/api?timeout=32s": getting credentials: exec: executable <hidden>/google-cloud-sdk/bin/gke-gcloud-auth-plugin failed with exit code 1
290module.create_gke_app_cluster.null_resource.node_pool_provisioner (local-exec): Unable to connect to the server: getting credentials: exec: executable <hidden>/google-cloud-sdk/bin/gke-gcloud-auth-plugin failed with exit code 1
When I check the version of the gke-gcloud-auth-plugin with gke-gcloud-auth-plugin --version
I am getting the following error:
174/bin/sh: eval: line 253: gke-gcloud-auth-plugin: not found
Which clearly means that the plugin is not installed.
The image that I am using is based on alpine for which there is no way to install the plugin via package manager, unfortunately.
Edit: gcloud components list shows gke-gcloud-auth-plugin as installed too.

The solution was to use google/cloud-sdk image in which I have installed terraform and used this image for the job in question.

Related

Cloud Run /Cloud Code deployment error in intellij

trying to follow the Getting Started instructions for Deploying a Cloud Run service with Cloud Code in Intellij (deploying HelloWorld Flask app container with Cloud Run: Deploy) but getting the following error, any idea why this might be happening
it worked initially i.e. deployed the app on Cloud Run service using the same steps, and then started throwing this error after a week or so when trying to redeploy, there was no change in project settings.
intellij and docker versions are the latest.
authenticated to google cloud project with gcloud auth login --update-adc
The local run works fine (Cloud Run: Run Locally),
but running the Cloud Run: Deploy throws this "code 89" error
Preparing Google Cloud SDK (this may take several minutes for first time setup)...
Creating skaffold file: /var/.../skaffold8013155926954225609.tmp
Configuring image push settings in /var/.../skaffold8013155926954225609.tmp
../Library/Application Support/cloud-code/bin/versions/../
skaffold build --filename /var/.../skaffold8013155926954225609.tmp --tag latest --skip-tests=true
invalid skaffold config: getting minikube env:
running [/Users/USER/Library/Application Support/google-cloud-tools-java/managed-cloud-sdk/LATEST/google-cloud-sdk/bin/
minikube docker-env --shell none -p minikube --user=skaffold]
- stdout: "false exit code 89"
- stderr: ""
- cause: exit status 89
Failed to build and push Cloud Run container image.
Please ensure your builder settings are correct, network is available, you are logged in to a valid GCP project, and try again.
Edit: I see minikube error code 89: ExGuestUnavailable and it's an error code specific to the guest host, still unclear what might be causing this
Looks like an issue with skaffold attempting to communicate with minikube (which could be used for building images as well). Please try cleaning minikube
minikube stop
minikube delete --all --purge
and try again.
ok, i still don't know why it fails to deploy to cloud run from intellij but i got it to deploy from command line
cd my-flask-app
#step 1: build container image from Dockerfile and submit to container registry
gcloud builds submit --tag gcr.io/GCP_PROJECT_ID/my-flask-app
#step 2: deploy the image on cloud run (reference)
gcloud run deploy --image gcr.io/GCP_PROJECT_ID/my-flask-app
references:
https://cloud.google.com/build/docs/building/build-containers
https://cloud.google.com/container-registry/docs/quickstart
Edit: the answer above did the trick : minikube delete --all --purge

AWS EB docker-compose deployment from private registry access forbidden

I'm trying to get docker-compose deployment to AWS Elastic Beanstalk working, in which the docker images are pulled from a private registry hosted by GitLab.
The strange thing is that initial deployment works perfectly; It pulls the image from the private registry and starts the containers using docker-compose, and the webpage (served by Django) is accessible through the host.
Deploying a new version using the same docker-compose and the same docker image will result in an error while pulling the docker image:
2021/03/16 09:28:34.957094 [ERROR] An error occurred during execution of command [app-deploy] - [Run Docker Container]. Stop running the command. Error: failed to run docker containers: Command /bin/sh -c docker-compose up -d failed with error exit status 1. Stderr:Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating network "current_default" with the default driver
Pulling redis (redis:alpine)...
Pulling mysql (mysql:5.7)...
Pulling project.dockertest(registry.gitlab.com/company/spikes/dockertest:latest)...
Get https://registry.gitlab.com/v2/company/spikes/dockertest/manifests/latest: denied: access forbidden
2021/03/16 09:28:34.957104 [INFO] Executing cleanup logic
Setup
AWS Elastic Beanstalk 64bit Amazon Linux 2/3.2
Gitlab registry credentials are stored within a S3 bucket, with the filename .dockercfg and has the following content:
{
"auths": {
"registry.gitlab.com": {
"auth": "base64 encoded username:personal_access_token"
}
},
"HttpHeaders": {
"User-Agent": "Docker-Client/18.03.1-ce (linux)"
}
}
The repository contains a v3 Dockerrun.aws.json file to refer to the credential file in S3:
{
"AWSEBDockerrunVersion": "3",
"Authentication": {
"bucket": "gitlab-dockercfg",
"key": ".dockercfg"
}
}
Reproduce
Setup docker-compose.yml that uses a service with a private docker image (and can be pulled with the credentials setup in the dockercfg within S3)
Create a new applicatoin that uses the docker-platform.
eb init testapplication --platform=docker --region=eu-west-1
Note: region must be the same as the S3 bucket containing the dockercfg.
Initial deployment (this will succeed)
eb create testapplication-test --branch_default --cname testapplication-test --elb-type=application --instance-types=t2.micro --min-instance=1 --max-instances=4
The initial deployment shows that the image is available and can be started:
2021/03/16 08:58:07.533988 [INFO] save docker tag command: docker tag 5812dfe24a4f redis:alpine
2021/03/16 08:58:07.533993 [INFO] save docker tag command: docker tag f8fcde8b9ae2 mysql:5.7
2021/03/16 08:58:07.533998 [INFO] save docker tag command: docker tag 1dd9b65d6a9f registry.gitlab.com/company/spikes/dockertest:latest
2021/03/16 08:58:07.534010 [INFO] Running command /bin/sh -c docker rm `docker ps -aq`
Without changing anything to the local repository and the remote docker image on the private registry, lets do a redeployment which will trigger the error:
eb deploy testapplication-test
This will fail with the following output:
...
2021-03-16 10:02:28 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2021-03-16 10:02:29 ERROR Unsuccessful command execution on instance id(s) 'i-0dc445d118ac14b80'. Aborting the operation.
2021-03-16 10:02:29 ERROR Failed to deploy application.
ERROR: ServiceError - Failed to deploy application.
And logs of the instance show (/var/log/eb-engine.log):
Pulling redis (redis:alpine)...
Pulling mysql (mysql:5.7)...
Pulling project.dockertest (registry.gitlab.com/company/spikes/dockertest:latest)...
Get https://registry.gitlab.com/v2/company/spikes/dockertest/manifests/latest: denied: access forbidden
2021/03/16 10:02:25.902479 [INFO] Executing cleanup logic
Steps I've tried to debug or solve the issue
Rename dockercfg to .dockercfg on S3 (somewhere mentioned on the internet as possible solution)
Use the 'old' docker config format instead of the one generated by docker 1.7+. But later on I figured out that Amazon Linux 2-instances are compatible with the new format together with Dockerrun v3
Having an incorrectly formatted dockercfg on S3 will cause an error deployment regarding the misformatted file (so it actually does something with the dockercfg from S3)
Documentation
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/single-container-docker-configuration.html
I'm out of debug options, and I've no idea where to look any further to debug this problem. Perhaps someone can see what is going wrong here?
First of all, the issue describe above is a bug confirmed by Amazon. To get the deployment working on our side, we've contacted Amazon support.
They've a fix in place which should be released this month, so keep an eye on the changelog of the Elastic beanstalk platform: https://docs.aws.amazon.com/elasticbeanstalk/latest/relnotes/relnotes.html
Although the upcoming release should have the fix, there is a workaround available to get the docker-compose deployment working.
Elastic Beanstalk allows hook to be executed within the deployment, which can be used to fetch the .docker.cfg from a S3 bucket to authenticate with against the private registry.
To do so, create the following file and directories from the root of the project:
File location: .platform/hooks/predeploy/docker_login
#!/bin/bash
aws s3 cp s3://{{bucket_name_to_use}}/.dockercfg ~/.docker/config.json
Important: Add execution rights to this file (for example: chmod +x .platform/hooks/predeploy/docker_login)
To support instance configuration changes, please symlink the hooks directory to confighooks:
ln -s .platform/hooks/ .platform/confighooks/
Updating configuration requires the .dockercfg credentials to be fetched too.
This should enable continuous deployments to the same EB-instance without the authentication errors, because the hook will be execute before the docker image pulling.
Some background:
The docker daemon reads credentials from ~/.docker/config by default on traditional linux systems. On the initial deploy this file will exist on the Elastic Beanstalk instance. On the next deployment this file is removed. Unfortunately, on the next deployment the .dockercfg is not refetched, therefor the docker daemon does not have the correct credentials to authenticate with.
I was dealing the same errors while trying to pull images from a privately hosted GitLab instance. I was able to resolve them by including the email address that was associated with the generated token found in the auth field of the .dockercfg file.
The following file format worked for me:
"registry.gitlab.com" {
"auth": "base64 encoded username:personal_access_token",
"email": "email for personal access token"
}
In my case I used a Project Access Token, which has an e-mail address associated with it once it is created.
The file format in the Elastic Beanstalk documentation for the authentication file here, indicates that this is the required file format, though the versions that it says this format is required for are almost certainly outdated, since we are running Docker ^19.

Permission Error Running Container in AWS CodeBuild

I'm attempting to run the following command in CodeBuild:
- docker run --rm -v $(pwd)/SQL:/flyway/SQL -v $(pwd)/conf:/flyway/conf flyway/flyway -enterprise -url=jdbc:postgresql://xxx.xxx.us-east-1.rds.amazonaws.com:5432/hamshackradio -dryRunOutput="/flyway/SQL/output.sql" migrate
I get the following error:
ERROR: Unable to use /flyway/SQL/output.sql as a dry run output:
/flyway/SQL/output.sql (Permission denied) Caused by:
java.io.FileNotFoundException: /flyway/SQL/output.sql (Permission
denied)
The goal is to capture the output.sql file. Running the exact same command locally on Windows (adjusting the paths of course) works without error. The issue isn't Flyway or the overall command structure. It's something to do with the internals of running the Docker container on Ubuntu on AWS CodeBuild and permissions there (maybe permissions, maybe something else, I'm open).
Does anyone have a good idea on how to address this?
The container doesn't have write access to the host. You could try the following, which saves the artifact to the container and uses docker cp to copy the artifact to the host.
container=$(docker create -v <flywaymigrationspathonhost>:/flyway/sql flyway/flyway migrate -dryRunOutput=/flyway/reports/changes.sql -schemas=dbo -user=youruser -password=yourpass -url=<yourjdbcurl> -licenseKey=<licensekey>)
docker start -a ${container}
docker cp ${container}:/flyway/reports/changes.sql <hostpath>
You need to enable Privileged Mode for your CodeBuild project.

Image permission in jenkins docker agent

I am using normal jenkins installation (NOT THE DOCKER IMAGE) on a normal AWS ec2 instance, with docker engine installed along side jenkins.
I have a simple jenkins pipeline like this:
pipeline {
agent none
stages {
stage('Example Build') {
agent { docker {
image 'cypress/base:latest'
args '--privileged --env CYPRESS_CACHE_FOLDER=~/.cache'
} }
steps {
sh 'ls'
sh 'node --version'
sh 'yarn install'
sh 'make e2e-test'
}
}
}
}
this will make the pipeline fail in the yarn install step while installing cypress although all it's dependenices is satisfied from the cypress image.
ERROR LOG FROM JENKINS
error /var/lib/jenkins/workspace/Devops-Capstone-Project_master/node_modules/cypress: Command failed.
Exit code: 1
Command: node index.js --exec install
Arguments:
Directory: /var/lib/jenkins/workspace/Devops-Capstone-Project_master/node_modules/cypress
Output:
Cypress cannot write to the cache directory due to file permissions
See discussion and possible solutions at
https://github.com/cypress-io/cypress/issues/1281
----------
Failed to access /.cache:
EACCES: permission denied, mkdir '/.cache'
After some investigation i found that although i have provided the environment variable "CYPRESS_CACHE_FOLDER=~/.cache" to override the default location in the root directory, and also provided the "--privileged". it fails because for some reason jenkins and docker is forcing their args and user mapping from the jenkins host.
I have also tried providing "-u 1000:1000" to override the user mapping but it didn't work.
What could possibly be wrong? and any recommendations or work arounds about this issue?
Thanks ,,
I have found a work around by creating a docker file to build the image and pass the jenkins user id and group to it as build arguments, as described here on this thread .
But this is not guaranteed to work on multiple nodes (master->slaves) jenkins installations as the jenkins user id and group may differ.

AWS Elastic Beanstalk - ERROR: No Application Version named 'v0_9_2-76-gf5a4' found

I'm trying to deploy my code to AWS Beanstalk and get this error. I researched that it could be that the number of versions is more than 500, so I deleted a lot of versions. But, I still get this error.
eb deploy
ERROR: No Application Version named 'v0_9_2-76-gf5a4' found.
I also tried
git aws.push
Error: Failed to create the AWS Elastic Beanstalk application version
Edit:
Trying with eb deploy --debug I now get:
Instance: i-2ad238d5 Module: AWSEBAutoScalingGroup ConfigSet: null Command failed on instance. Return code: 1 Output: Error occurred during build: Command hooks failed . Script /opt/elasticbeanstalk/hooks/appdeploy/pre/10_bundle_install.sh failed with returncode 18
ebcli.objects.exceptions.ServiceError: Update environment operation is
complete, but with errors. For more information, see troubleshooting
documentation.
Did you update the file .elasticbeanstalk/config.yml ? It may have a wrong setup.
Make a backup of .elasticbeanstalk/ folder and remove it
Execute eb create
Select the same region you deployed it before. You can check the region on .elasticbeanstalk/config.yml backup
A list with the environments will appear, select the right one
Deploy now
Remove the .elasticbeanstalk/config.yml backup
Check for the .elasticbeanstalk/config.yml file
environment: CORRECT_ENV_NAME
global:
application_name: CORRECT_APP_NAME
In my case, I was doing eb deploy X where X was an environment for a different project.
When I had the error
InvalidParameterValueError: No Application Version named 'app-9f5c-180927_071528' found.
I fixed this by specifying the label I wanted to push up.
eb deploy XXX-env -l XXX.0.0.1
The -l flag is documented AWS EB Deploy Docs
most likely, the deploy is trying an incorrect Elasticbeanstalk Application. it could be because you renamed the application in the AWS console.
so double check you're pointing to the correct elasticbeanstalk Environment and Application. it could be picking out default values from your .elasticbeanstalk/config.yml file.