Terraform fails in GitLab CI due to cache

I am trying to deploy my AWS infrastructure using Terraform from a GitLab CI/CD pipeline.
I am using the GitLab-managed image and its default Terraform template.
I have configured the S3 backend, and it points to the S3 bucket used to store the tf state file.
I had stored CI/CD variables in GitLab for 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and 'S3_BUCKET'.
Everything was working fine until I changed 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and 'S3_BUCKET' to point to a different AWS account.
Now I am getting the following error:
$ terraform init
Initializing the backend...
Backend configuration changed!
Terraform has detected that the configuration specified for the backend
has changed. Terraform will now check for existing state in the backends.
Error: Error loading state:
AccessDenied: Access Denied
status code: 403, request id: XXXXXXXXXXXX,
host id: XXXXXXXXXXXXXXXXXXXXX
Terraform failed to load the default state from the "s3" backend.
State migration cannot occur unless the state can be loaded. Backend
modification and state migration has been aborted. The state in both the
source and the destination remain unmodified. Please resolve the
above error and try again.
Cleaning up file based variables 00:00
ERROR: Job failed: exit code 1
Since this issue happened after I changed the access_key and secret_key (it was still working fine from my local VS Code), I commented out the 'cache:' block in the .gitlab-ci.yml file and it worked!
The following is my .gitlab-ci.yml file:
.gitlab-ci.yml
stages:
  - validate
  - plan
  - apply
  - destroy

image:
  name: registry.gitlab.com/gitlab-org/gitlab-build-images:terraform
  entrypoint:
    - '/usr/bin/env'
    - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'

# Default output file for Terraform plan
variables:
  PLAN: plan.tfplan
  JSON_PLAN_FILE: tfplan.json
  STATE: dbrest.tfstate

cache:
  paths:
    - .terraform

before_script:
  - alias convert_report="jq -r '([.resource_changes[]?.change.actions?]|flatten)|{\"create \":(map(select(.==\"create\"))|length),\"update\":(map(select(.==\"update\"))|length),\"delete\":(map(select(.==\"delete\"))|length)}'"
  - terraform --version
  - terraform init

validate:
  stage: validate
  script:
    - terraform validate
  only:
    - tags

plan:
  stage: plan
  script:
    - terraform plan -out=plan_file
    - terraform show --json plan_file > plan.json
  artifacts:
    paths:
      - plan.json
    expire_in: 2 weeks
    when: on_success
    reports:
      terraform: plan.json
  only:
    - tags
  allow_failure: true

apply:
  stage: apply
  extends: plan
  environment:
    name: production
  script:
    - terraform apply --auto-approve
  dependencies:
    - plan
  only:
    - tags
  when: manual

terraform destroy:
  extends: apply
  stage: destroy
  script:
    - terraform destroy --auto-approve
  needs: ["plan", "apply"]
  when: manual
  only:
    - tags
The issue clearly happens if I don't comment out the block below. However, it used to work before I changed the AWS access_key and secret_key.
# cache:
#   paths:
#     - .terraform
When the cache block was not commented out, the pipeline failed with the error shown above.
Is the cache being stored anywhere? And how do I clear it?
I think it's related to GitLab.

It seems that the runner cache can be cleared from the GitLab UI itself.
Go to GitLab -> CI/CD -> Pipelines and hit the 'Clear Runner Cache' button to clear the cache.
It actually works!
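If you would rather keep the cache than clear it every time the backend credentials change, one option (a sketch, not part of the original pipeline) is to scope the cache key and re-initialize the backend explicitly; terraform init -reconfigure tells Terraform to ignore the backend settings saved in the cached .terraform directory:
cache:
  key: "$CI_COMMIT_REF_SLUG"   # assumption: one cache per branch instead of a single shared cache
  paths:
    - .terraform

before_script:
  - terraform --version
  - terraform init -reconfigure   # ignore any backend configuration cached in .terraform
Clearing the runner cache from the UI, as above, remains the quickest fix once the cached backend metadata has already gone stale.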

Related

How to set up Terraform CI/CD with GCP and GitHub Actions in a multidirectory repository

Introduction
I have a repository with all the infrastructure defined using IaC, separated into folders. For instance, all Terraform configuration is in /terraform/. I want to apply all Terraform files inside that directory from the CI/CD.
Configuration
The GitHub Actions workflow used is shown below:
name: 'Terraform'

on: [push]

permissions:
  contents: read

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    environment: production

    # Use the Bash shell regardless of whether the GitHub Actions runner is ubuntu-latest, macos-latest, or windows-latest
    defaults:
      run:
        shell: bash
        #working-directory: terraform

    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v3

      # Install the latest version of Terraform CLI and configure the Terraform CLI configuration file with a Terraform Cloud user API token
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v1

      - id: 'auth'
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS }}'

      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v1'

      # Initialize a new or existing Terraform working directory by creating initial files, loading any remote state, downloading modules, etc.
      - name: Terraform Init
        run: terraform init

      # Checks that all Terraform configuration files adhere to a canonical format
      - name: Terraform Format
        run: terraform fmt -check

      # On push to "master", build or change infrastructure according to Terraform configuration files
      # Note: It is recommended to set up a required "strict" status check in your repository for "Terraform Cloud". See the documentation on "strict" required status checks for more information: https://help.github.com/en/github/administering-a-repository/types-of-required-status-checks
      - name: Terraform Apply
        run: terraform apply -auto-approve -input=false
Problem
If I log in and then change directory to apply Terraform, it no longer finds the credentials:
storage.NewClient() failed: dialing: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
On the other hand, if I don't change the directory, then it doesn't find the configuration files, as expected:
Error: No configuration files
I tried moving the Terraform configuration files to the root of the repository and it works. How could I implement this in a multidirectory repository?
Such a feature was requested before. As explained in the issue, the auth file is named as follows: gha-creds-*.json.
Therefore, I added a step just before using Terraform to update the environment variable and move the file itself:
- name: 'Setup google auth in multidirectory repo'
  run: |
    echo "GOOGLE_APPLICATION_CREDENTIALS=$GITHUB_WORKSPACE/terraform/`ls -1 $GOOGLE_APPLICATION_CREDENTIALS | xargs basename`" >> $GITHUB_ENV
    mv $GITHUB_WORKSPACE/gha-creds-*.json $GITHUB_WORKSPACE/terraform/
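For completeness, the Terraform steps that follow can then run from the subdirectory; a sketch assuming a per-step working-directory (hinted at by the commented-out default in the workflow above):
- name: Terraform Init
  run: terraform init
  working-directory: terraform   # assumption: run Terraform from the folder holding the .tf files

- name: Terraform Apply
  run: terraform apply -auto-approve -input=false
  working-directory: terraform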

Local packages not loading to GCP Python functions with GitHub Actions

I am trying to deploy a GCP function. My code uses a package that's in a private repository. I create a local copy of that package in the folder, and then use gcloud functions deploy from the folder to deploy the function.
This works well. I can see a function that is deployed, with the local package.
The problem is with using GitHub Actions to deploy the function.
The function is part of a repository that has multiple functions, so when I deploy, I run GitHub Actions from outside the function's folder, and while the function gets deployed, the dependencies do not get picked up.
For example, this is my folder structure:
my_repo
- .github/
  - workflows/
    - function_deploy.yaml
- function_1_folder
  - main.py
  - requirements.txt
  - .gcloudignore
  - localpackages --> These are the packages I need uploaded to GCP
My function_deploy.yaml looks like:
name: Build and Deploy to GCP functions
on:
  push:
    paths:
      - function_1_folder/**.py
env:
  PROJECT_ID: <project_id>
jobs:
  job_id:
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: 'actions/checkout@v3'
      - id: 'auth'
        uses: 'google-github-actions/auth@v0'
        with:
          credentials_json: <credentials>
      - id: 'deploy'
        uses: 'google-github-actions/deploy-cloud-functions@v0'
        with:
          name: <function_name>
          runtime: 'python38'
          region: <region>
          event_trigger_resource: <trigger_resource>
          entry_point: 'main'
          event_trigger_type: <pubsub>
          memory_mb: <size>
          source_dir: function_1_folder/
The Google function does get deployed, but it fails with:
google-github-actions/deploy-cloud-functions failed with: operation failed: Function failed on loading user code. This is likely due to a bug in the user code. Error message: please examine your function logs to see the error cause...
When I look at the Google function, I see that the localpackages folder hasn't been uploaded to GCP.
When I deploy from my local machine, however, it does upload the localpackages.
Any suggestions on what I may be doing incorrectly? And how to upload the localpackages?
I looked at this question:
Github action deploy-cloud-functions not building in dependencies?
But I didn't quite understand what was done there.

Terraform init is failing to locate a module when providing a relative path

I've started building an infrastructure using Terraform. Within that TF configuration I call a module using a relative path. This is successful in a classic release, but I have been tasked with converting the pipeline to YAML. When the terraform init step runs, the agent finds the TF config files but can't find the modules folder, even though the artifact was downloaded in a previous task.
YAML file:
trigger:
- master

resources:
  pipelines:
  - pipeline: Dashboard-infra
    project: Infrastructure
    source: IT Dashboard
  - pipeline: Infra-modules
    project: Infrastructure
    source: AWS Modules
    trigger: true

stages:
- stage: Test
  displayName: Test
  variables:
  - group: "Non-Prod Keys"
  jobs:
  - deployment:
    displayName: string
    variables:
      region: us-east-1
      app_name: it-dashboard
      environment: test
      tf.path: 'IT Dashboard'
    pool:
      vmImage: 'ubuntu-latest'
    environment: test
    strategy:
      runOnce:
        deploy:
          steps:
          - task: DownloadBuildArtifacts@1
            inputs:
              buildType: 'specific'
              project: '23e9505e-a627-4681-9598-2bd8b6c1204c'
              pipeline: '547'
              buildVersionToDownload: 'latest'
              downloadType: 'single'
              artifactName: 'drop'
              downloadPath: '$(Agent.BuildDirectory)/s'
          - task: DownloadBuildArtifacts@1
            inputs:
              buildType: 'specific'
              project: '23e9505e-a627-4681-9598-2bd8b6c1204c'
              pipeline: '88'
              buildVersionToDownload: 'latest'
              downloadType: 'single'
              artifactName: 'Modules'
              downloadPath: '$(agent.builddirectory)/s'
          - task: ExtractFiles@1
            inputs:
              archiveFilePatterns: 'drop/infrastructure.zip'
              destinationFolder: '$(System.DefaultWorkingDirectory)'
              cleanDestinationFolder: false
              overwriteExistingFiles: false
          - task: ExtractFiles@1
            inputs:
              archiveFilePatterns: 'Modules/drop.zip'
              destinationFolder: '$(System.DefaultWorkingDirectory)'
              cleanDestinationFolder: false
              overwriteExistingFiles: false
          - task: TerraformInstaller@0
            inputs:
              terraformVersion: '0.12.3'
          - task: TerraformTaskV2@2
            inputs:
              provider: 'aws'
              command: 'init'
              workingDirectory: '$(System.DefaultWorkingDirectory)/$(tf.path)'
              commandOptions: '-var "region=$(region)" -var "app_name=$(app.name)" -var "environment=$(environment)"'
              backendServiceAWS: 'tf_nonprod'
              backendAWSBucketName: 'wdrx-deployments'
              backendAWSKey: '$(environment)/$(app.name)/infrastructure/$(region).tfstate'
Raw error log:
2021-10-29T12:30:16.5973748Z ##[section]Starting: TerraformTaskV2
2021-10-29T12:30:16.5981535Z ==============================================================================
2021-10-29T12:30:16.5981842Z Task : Terraform
2021-10-29T12:30:16.5982217Z Description : Execute terraform commands to manage resources on AzureRM, Amazon Web Services(AWS) and Google Cloud Platform(GCP)
2021-10-29T12:30:16.5982555Z Version : 2.188.1
2021-10-29T12:30:16.5982791Z Author : Microsoft Corporation
2021-10-29T12:30:16.5983122Z Help : [Learn more about this task](https://aka.ms/AA5j5pf)
2021-10-29T12:30:16.5983461Z ==============================================================================
2021-10-29T12:30:16.7253372Z [command]/opt/hostedtoolcache/terraform/0.12.3/x64/terraform init -var region=*** -var app_name=$(app.name) -var environment=test -backend-config=bucket=wdrx-deployments -backend-config=key=test/$(app.name)/infrastructure/***.tfstate -backend-config=region=*** -backend-config=access_key=*** -backend-config=secret_key=***
2021-10-29T12:30:16.7532941Z Initializing modules...
2021-10-29T12:30:16.7558115Z - S3-env in ../Modules/S3
2021-10-29T12:30:16.7578267Z - S3-env.Global-Vars in ../Modules/Global-Vars
2021-10-29T12:30:16.7585434Z - global-vars in
2021-10-29T12:30:16.7597321Z Error: Unreadable module directory
2021-10-29T12:30:16.7599087Z Unable to evaluate directory symlink: lstat ../Modules/global-vars: no such
2021-10-29T12:30:16.7599550Z file or directory
2021-10-29T12:30:16.7600779Z Error: Failed to read module directory
2021-10-29T12:30:16.7601405Z Module directory does not exist or cannot be read.
2021-10-29T12:30:16.7602573Z Error: Unreadable module directory
2021-10-29T12:30:16.7603271Z Unable to evaluate directory symlink: lstat ../Modules/global-vars: no such
2021-10-29T12:30:16.7603636Z file or directory
2021-10-29T12:30:16.7604749Z Error: Failed to read module directory
2021-10-29T12:30:16.7605370Z Module directory does not exist or cannot be read.
2021-10-29T12:30:16.7743995Z ##[error]Error: The process '/opt/hostedtoolcache/terraform/0.12.3/x64/terraform' failed with exit code 1
2021-10-29T12:30:16.7756780Z ##[section]Finishing: TerraformTaskV2
I have even attempted to move the modules folder inside the tf.path so it is within the same folder as the TF config files, and changed the location from "../" to "./". No matter which repo I extract the modules folder to (after downloading it as an artifact from another build pipeline), it cannot be found when it is referenced from the TF config files. I am fairly new to DevOps and would appreciate any help, or just being pointed in the right direction.
Define a system.debug: true variable at the global level to enable debug logs - maybe something there will give you a hint:
variables:
  system.debug: true
Apart from the downloaded artifacts, do you expect to have files checked out from the repo the pipeline is defined in? The deployment job doesn't check out git files by default, so you may want to add checkout: self to the steps there.
Unable to evaluate directory symlink: lstat ../Modules/global-vars - this is suspicious; I wouldn't expect any symlinks in there. But maybe the error message is just misleading.
A useful trick is to log the whole directory structure.
You can do this with a bash script step (you might need to apt install tree first):
- script: tree
Or with PowerShell (this will work on an MS-hosted Linux agent):
- pwsh: Get-ChildItem -Path '$(agent.builddirectory)' -recurse
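Putting the checkout and directory-listing suggestions together, a minimal sketch of the extra steps inside the deploy: section (the displayName and the apt-get line are assumptions, not taken from the original pipeline):
          steps:
          - checkout: self   # deployment jobs skip the git checkout by default
          - script: |
              sudo apt-get install -y tree
              tree '$(Agent.BuildDirectory)'
            displayName: 'Inspect directory layout'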

GitLab Cloud Run deploys successfully but the job fails

I'm having an issue with my CI/CD pipeline:
it successfully deploys to GCP Cloud Run, but on the GitLab dashboard the status is failed.
I tried replacing the image with some other Docker images, but it fails as well.
# File: .gitlab-ci.yml
image: google/cloud-sdk:alpine

deploy_int:
  stage: deploy
  environment: integration
  only:
    - integration # This pipeline stage will run on this branch alone
  script:
    - echo $GCP_SERVICE_KEY > gcloud-service-key.json # Google Cloud service accounts
    - gcloud auth activate-service-account --key-file gcloud-service-key.json
    - gcloud config set project $GCP_PROJECT_ID
    - gcloud builds submit . --config=cloudbuild_int.yaml
# File: cloudbuild_int.yaml
steps:
  # build the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: [ 'build', '--build-arg', 'APP_ENV=int', '-t', 'gcr.io/$PROJECT_ID/tpdropd-int-front', '.' ]
  # push the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: [ 'push', 'gcr.io/$PROJECT_ID/tpdropd-int-front' ]
  # deploy to Cloud Run
  - name: 'gcr.io/cloud-builders/gcloud'
    args: [ 'run', 'deploy', 'tpd-front', '--image', 'gcr.io/$PROJECT_ID/tpdropd-int-front', '--region', 'us-central1', '--platform', 'managed', '--allow-unauthenticated' ]
GitLab build output:
ERROR: (gcloud.builds.submit)
The build is running, and logs are being written to the default logs bucket.
This tool can only stream logs if you are Viewer/Owner of the project and, if applicable, allowed by your VPC-SC security policy.
The default logs bucket is always outside any VPC-SC security perimeter.
If you want your logs saved inside your VPC-SC perimeter, use your own bucket.
See https://cloud.google.com/build/docs/securing-builds/store-manage-build-logs.
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
I fixed it by using:
options:
  logging: CLOUD_LOGGING_ONLY
in cloudbuild.yaml.
There you can use this workaround:
fix it by giving the Viewer role to the service account running the build, but this feels like giving too much permission to such a role.
This worked for me: use --suppress-logs
gcloud builds submit --suppress-logs --tag=<my-tag>
To fix the issue, you just need to create a bucket in your project (by default - without public access) and add the role 'Storage Admin' to your user or service account via https://console.cloud.google.com/iam-admin/iam
After that, you can point gcloud builds submit at the new bucket via the parameter --gcs-log-dir gs://YOUR_NEW_BUCKET_NAME_HERE, like this:
gcloud builds submit --gcs-log-dir gs://YOUR_NEW_BUCKET_NAME_HERE ...(other parameters here)
We need a new bucket because the default bucket for logs is global (cross-project). That's why it has specific security requirements for access, especially from outside Google Cloud, e.g. from GitLab, Azure DevOps and so on via service accounts.
(Moreover, in this case you don't need to turn off logging via --suppress-logs.)
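Applied to the pipeline from the question, the submit line might then look like this (a sketch; the bucket name is hypothetical and the bucket must already exist in your project):
script:
  - echo $GCP_SERVICE_KEY > gcloud-service-key.json
  - gcloud auth activate-service-account --key-file gcloud-service-key.json
  - gcloud config set project $GCP_PROJECT_ID
  - gcloud builds submit . --config=cloudbuild_int.yaml --gcs-log-dir gs://my-build-logs-bucket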
Kevin's answer worked like magic for me; since I am not able to comment, I am writing this new answer.
Initially I was facing the same issue where, in spite of the gcloud builds submit command passing, my GitLab CI was failing.
Below is the cloudbuild.yaml file where I added the logging option as Kevin suggested.
steps:
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: 'bash'
    args: ['run_query.sh', '${_SCRIPT_NAME}']
options:
  logging: CLOUD_LOGGING_ONLY
Check this document for details: https://cloud.google.com/build/docs/build-config-file-schema#options
The options solution mentioned by Kevin worked for me as well. Just add the parameter, as mentioned before, to the cloudbuild.yml file.
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/myproject/myimage', '.']
options:
  logging: CLOUD_LOGGING_ONLY

Terraform and GitLab CI pipeline for multiple projects

We have an existing GitLab CI EE pipeline for Terraform that works, one environment at a time, each of them in a different AWS account.
However, we want to be able to scale the pipeline to different teams that use TF for their IaC requirements.
Is there a way to do it? A different IaC repo/branch per team with the same pipeline to handle TF deployments, with S3 as the backend?
Any other way to do it? We have GitLab CI EE and the Terraform open-source edition.
I did something similar.
I had multiple repositories, one for each Terraform module, and a main repo that calls each module (you can check how to do it here). In this main repository I had something like this:
- dev/
  - main.tf
  - variables.tf
  - dev.tfvars
  - backend.tf
- test/
- prd/
- .gitlab-ci.yml
Each environment had its own folder/files with the S3 backend and the variables necessary to run. The main file is where all the other repos are called. The pipeline is triggered when it detects changes in the paths, and it is initialized in the environment corresponding to the commit branch:
include:
  - template: Terraform/Base.latest.gitlab-ci.yml

before_script:
  - echo "${CI_COMMIT_BRANCH}"
  - mkdir -p ~/.ssh
  - touch ~/.ssh/known_hosts
  - echo "$SSH_PUBLIC_KEY" > ~/.ssh/id_rsa.pub # this is to allow the main repository to get the information from the other ones, all of them are private
  - echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_rsa
  - echo "$KNOWN_HOSTS" > ~/.ssh/known_hosts
  - terraform --version

stages:
  - init

init:
  stage: init
  environment: $CI_COMMIT_BRANCH
  script:
    - terraform init # just an example
  rules:
    - if: $CI_COMMIT_BRANCH == "dev" || $CI_COMMIT_BRANCH == "tst" || $CI_COMMIT_BRANCH == "prd"
      changes:
        - dev/*
        - tst/*
        - prd/*
I must say this is not the best way to do it; there are some "security" points to mention, but they can be solved with a little ingenuity. For example, I have understood that the backend.tf shouldn't be explicit in each folder, and neither should the .tfvars file. Someone told me that with Terraform Enterprise these issues could be solved. Another "dirty thing" about this is that there's some duplicated code, because each environment folder contains the same main file, outputs and variables.
So far, the way I did it works. I hope this can give you an idea :)
Good luck.
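On that backend point, a possible sketch (assuming Terraform 0.14+ for -chdir, and CI/CD variables named TF_BACKEND_BUCKET and TF_BACKEND_REGION, none of which are part of the setup above) is to keep only an empty backend "s3" {} block in each environment's backend.tf and pass the rest as a partial backend configuration at init time:
init:
  stage: init
  environment: $CI_COMMIT_BRANCH
  script:
    # assumption: environment folder names match the branch names
    - >
      terraform -chdir="$CI_COMMIT_BRANCH" init
      -backend-config="bucket=$TF_BACKEND_BUCKET"
      -backend-config="key=$CI_COMMIT_BRANCH/terraform.tfstate"
      -backend-config="region=$TF_BACKEND_REGION"
This keeps the bucket and region out of the repository while each environment still gets its own state key.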