Terraform deployment of Docker Containers to aws ecr - amazon-web-services

I am having issues deploying my docker images to aws ecr as part of a terraform deployment and I am trying to think through the best long term strategy.
At the moment I have a Terraform remote backend in S3 and DynamoDB on, let's call it, my 'master' account. I then have dev/test etc. environments in separate accounts. The Terraform deployment is currently run off my local machine (a Mac) and uses the AWS 'master' account and its credentials, which in turn assume a role in the target deployment account to create the resources, as per:
provider "aws" { // tell terraform which SDK it needs to load
  alias  = "target"
  region = var.region

  assume_role {
    role_arn = "arn:aws:iam::${var.deployment_account}:role/${var.provider_env_deployment_role_name}"
  }
}
I am creating a number of ecs services with Fargate deployments. The container images are built in separate repos by GitHub Actions and saved as GitHub packages. These package names and versions are being deployed after the creation of the ecr and service (maybe that's not ideal thinking about it) and this is where the problems arise.
The process is to pull the image from GitHub Packages, retag it and upload it to the ECR using multiple executions of a null_resource local-exec. This works fine standalone but has problems as part of the Terraform process. I think the reason is that the other resources use the above provider to get permissions, but as null_resource does not accept a provider it cannot get the permissions this way. So I have been passing the AWS credential values into the shell. I'm not convinced this is really secure, but that's currently moot as it isn't working either. I get this error:
Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `The specified item already exists in the keychain.``
Part of me thinks this is the wrong approach, and that as I migrate to deploying via a GitHub Action I can separate the infrastructure deployment (Terraform) from what is really the application deployment, and just use GitHub secrets to set the credential values and then run the script.
Alternatively, maybe the keychain issue just goes away and my process will work fine? And would it even be secure?
That's fine for this scenario but it isn't really a generic approach for all my use cases.
I am shortly going to start deploying multiple AWS Lambda functions with Docker containers. I haven't done it before, but it looks like the process is going to be: create the ECR, deploy the container, deploy the Lambda function. This really implies that the container deployment should be integral to the Terraform deployment, which loops back to my issue with the local-exec.
I found Actions to deploy to ECR which would imply splitting the deployments into multiple files but that seems inelegant and potentially brittle.
Maybe there is a simple solution, but given where I am trying to go with this, what is my best approach?

I know this isn't a complete answer, but you should be pulling your AWS creds from environment variables. I don't really understand whether you need credentials for different accounts, but if you do, swap them during the course of your action. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html . Terraform should pick these up and automatically use them for AWS access.
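For example, a minimal sketch (the key values and region here are placeholders):

# standard variables read by Terraform's AWS provider, the CLI and the SDKs
export AWS_ACCESS_KEY_ID="AKIA...placeholder..."
export AWS_SECRET_ACCESS_KEY="...placeholder..."
export AWS_DEFAULT_REGION="eu-west-2"

terraform plan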

Instead of those hard-coded access key/secret access keys I'd suggest making use of GitHub/AWS's ability to assume a role through temporary credentials with OIDC: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
You'd likely only define one initial role that you'd authenticate into and from there assume into the other accounts you're deploying into.
These assume-role credentials are only good for an hour and do not have the operational overhead of having to rotate them.

As suggested by Kevin Buchs' answer...
My primary issue was related to deploying from a Mac and the use of the keychain. As this was not on the critical path I went around it and set up a GitHub Action.
The Action loaded environment variables from GitHub secrets for my 'master' AWS account credentials.
AWS_ACCESS_KEY_ID: ${{ secrets.NK_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.NK_AWS_SECRET_ACCESS_KEY }}
I also loaded the target account's credentials into environment variables in the same way, BUT with the prefix TF_VAR_.
TF_VAR_DEVELOP_AWS_ACCESS_KEY_ID: ${{ secrets.DEVELOP_AWS_ACCESS_KEY_ID }}
TF_VAR_DEVELOP_AWS_SECRET_ACCESS_KEY: ${{ secrets.DEVELOP_AWS_SECRET_ACCESS_KEY }}
I then declare terraform variables which will be automatically populated from the environment variables.
variable "DEVELOP_AWS_ACCESS_KEY_ID" {
  description = "access key for the dev account"
  type        = string
}

variable "DEVELOP_AWS_SECRET_ACCESS_KEY" {
  description = "secret access key for the dev account"
  type        = string
}
And then I run a shell script with a local-exec:
resource "null_resource" "image-upload-to-importcsv-ecr" {
  provisioner "local-exec" {
    command = "./ecr-push.sh ${var.DEVELOP_AWS_ACCESS_KEY_ID} ${var.DEVELOP_AWS_SECRET_ACCESS_KEY}"
  }
}
Within the script I can then use these arguments to set the credentials, e.g.
AWS_ACCESS=$1
AWS_SECRET=$2
.....
export AWS_ACCESS_KEY_ID=${AWS_ACCESS}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET}
and the script now has credentials to do whatever.
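For completeness, a sketch of what such an ecr-push.sh could look like; the region, account id, repository and tag below are placeholders rather than the original script (and a prior docker login to ghcr.io may be needed if the GitHub package is private):

#!/usr/bin/env bash
set -euo pipefail

# credentials passed in from the Terraform local-exec above
export AWS_ACCESS_KEY_ID=$1
export AWS_SECRET_ACCESS_KEY=$2

AWS_REGION="eu-west-2"                         # placeholder
ACCOUNT_ID="123456789012"                      # placeholder target account
SOURCE_IMAGE="ghcr.io/my-org/importcsv:1.2.3"  # placeholder GitHub package
ECR_REGISTRY="${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# log in to the ECR registry, then pull, retag and push the image
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin "$ECR_REGISTRY"

docker pull "$SOURCE_IMAGE"
docker tag "$SOURCE_IMAGE" "${ECR_REGISTRY}/importcsv:1.2.3"
docker push "${ECR_REGISTRY}/importcsv:1.2.3"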

Related

How can I specify Account ID in CDK when I'm authorized with temp credentials to the AWS account?

I have seen related questions for this issue.
But my problem in particular is that I am assuming a role to retrieve temporary credentials. It works just fine; however, when running CDK Bootstrap, I receive: Need to perform AWS calls for account <account_id>, but no credentials have been configured.
I have tried using CDK Deploy as well and I receive: Unable to resolve AWS account to use. It must be either configured when you define your CDK Stack, or through the environment
How can I specify the AWS account environment in CDK when using temp credentials or assuming a role?
You need to define the account you are attempting to bootstrap to.
For bootstrap you can provide it as part of the command - cdk bootstrap aws://ACCOUNT-NUMBER-1/REGION-1 aws://ACCOUNT-NUMBER-2/REGION-2 ... Do note that a given account/region only ever has to be bootstrapped ONCE in its lifetime, no matter how many CDK stacks are going to be deployed there.
AND your credentials need to be part of the default profile. If you are assuming credentials through some sort of enterprise script, please check that it stores them as part of the default profile. If not, run at least aws configure in your bash terminal and get your temp assumed credentials in there.
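If you only have the temporary assumed credentials in environment variables, one way to write them into the default profile is something like this (a sketch, assuming the usual AWS_* variables are already set in your shell):

aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
aws configure set aws_session_token "$AWS_SESSION_TOKEN"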
For cdk deploy you need to make sure in your app.py that you have the env defined
env_EU = cdk.Environment(account="8373873873", region="eu-west-1")
env_USA = cdk.Environment(account="2383838383", region="us-west-2")
MyFirstStack(app, "first-stack-us", env=env_USA)
MyFirstStack(app, "first-stack-eu", env=env_EU)
i would however recommend against hardcoding your account numbers ;)
The profile name in the credentials and config files should be the same, like:

~/.aws/credentials:
[cdk]
aws_access_key_id = xxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxx

~/.aws/config:
[cdk]
region = us-east-1
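With a matching named profile in place, the CDK commands can then point at it explicitly, e.g. (the account number and region are placeholders):

cdk bootstrap aws://123456789012/us-east-1 --profile cdk
cdk deploy --profile cdk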

How can I connect GitHub actions with AWS deployments without using a secret key?

I'd like to be able to use GitHub Actions to be able to deploy resources with AWS, but without using a hard-coded user.
I know that it's possible to create an IAM user with a fixed credential, and that can be exported to GitHub Secrets, but this means if the key ever leaks I have a large problem on my hands, and rotating such keys are challenging if forgotten.
Is there any way that I can enable a password-less authentication flow for deploying code to AWS?
Yes, it is possible now that GitHub have released their Open ID Connector for use with GitHub Actions. You can configure the Open ID Connector as an Identity Provider in AWS, and then use that for an access point to any role(s) that you wish to enable. You can then configure the action to use the credentials acquired for the duration of the job, and when the job is complete, the credentials are automatically revoked.
To set this up in AWS, you need to create an Open Identity Connect Provider using the instructions at AWS or using a Terraform file similar to the following:
resource "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"

  client_id_list = [
    // original value "sigstore",
    "sts.amazonaws.com", // Used by aws-actions/configure-aws-credentials
  ]

  thumbprint_list = [
    // original value "a031c46782e6e6c662c2c87c76da9aa62ccabd8e",
    "6938fd4d98bab03faadb97b34396831e3780aea1",
  ]
}
The client id list is the 'audience' which is used to access this content -- you can vary that, provided that you vary it in all the right places. The thumbprint is a hash/certificate of the Open ID Connector, and 6938...aea1 is the current one used by GitHub Actions -- you can calculate/verify the value by following AWS' instructions. The thumbprint_list can hold up to 5 values, so newer versions can be appended when they are being made available ahead of time while continuing to use older ones.
If you're interested in where this magic value came from, you can find out at How can I calculate the thumbprint of an OpenID Connect server?
Once you have enabled the identity provider, you can use it to create one or more custom roles (replacing the organisation/repository names below with your own):
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "github_alblue" {
  name = "GitHubAlBlue"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/token.actions.githubusercontent.com"
      }
      Condition = {
        StringLike = {
          "token.actions.githubusercontent.com:aud" : ["sts.amazonaws.com"],
          "token.actions.githubusercontent.com:sub" : "repo:alblue/*"
        }
      }
    }]
  })
}
You can create as many different roles as you need, and even split them up by audience (e.g. 'production', 'dev'). Provided that the OpenID connector's audience is trusted by the account then you're good to go. (You can use this to ensure that the OpenID Connector in a Dev account doesn't trust the roles in a production account and vice-versa.) You can have, for example, a read-only role for performing terraform validate and then another role for terraform apply.
The subject is passed from GitHub, but looks like:
repo:<organization>/<repository>:ref:refs/heads/<branch>
There may be different formats that come out later. You could have an action/role specifically for PRs if you use :ref:refs/pulls/* for example, and have another role for :ref:refs/heads/production/*.
The final step is getting your GitHub Actions configured to use the token that comes back from the AWS/OpenID Connect:
Standard Way
jobs:
  terraform-validate:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Configure AWS credentials from Test account
        uses: aws-actions/configure-aws-credentials@master
        with:
          role-to-assume: arn:aws:iam::<accountid>:role/GitHubAlBlue
          aws-region: us-east-1
      - name: Display Identity
        run: aws sts get-caller-identity
Manual Way
What's actually happening under the covers is something like this:
jobs:
  terraform-validate:
    runs-on: ubuntu-latest
    env:
      AWS_WEB_IDENTITY_TOKEN_FILE: .git/aws.web.identity.token.file
      AWS_DEFAULT_REGION: eu-west-2
      AWS_ROLE_ARN: arn:aws:iam::<accountid>:role/GitHubAlBlue
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Configure AWS
        run: |
          sleep 3 # Need to have a delay to acquire this
          curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" \
            | jq -r '.value' > $AWS_WEB_IDENTITY_TOKEN_FILE
          aws sts get-caller-identity
You need to ensure that your AWS_ROLE_ARN is the same as defined in your AWS account, and that the audience is the same as accepted by the OpenID Connect and the Role name as well.
Essentially, there's a race condition between the job starting and the token's validity, which doesn't come in until after GitHub has confirmed the job has started; if the size of the AWS_WEB_IDENTITY_TOKEN_FILE is less than 10 chars, it's probably an error and sleeping/spinning will get you the value afterwards.
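A minimal sketch of guarding against that race, reusing the variables from the manual workflow above:

# retry until the token file holds something that looks like a real token rather than an error body
while [ ! -s "$AWS_WEB_IDENTITY_TOKEN_FILE" ] || [ "$(wc -c < "$AWS_WEB_IDENTITY_TOKEN_FILE")" -lt 10 ]; do
  sleep 3
  curl -s -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
    "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" \
    | jq -r '.value' > "$AWS_WEB_IDENTITY_TOKEN_FILE"
done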
The name of the AWS_WEB_IDENTITY_TOKEN_FILE doesn't really matter, so long as it's consistent. If you're using docker containers, then storing it in e.g. /tmp will mean that it's not available in any running containers. If you put it under .git in the workspace, then not only will git ignore it (if you're doing any hash calculations) but it will also be present in any other docker run actions that you do later on.
You might want to configure your role so that the validity of the period used is limited; once you have the web token it's valid until the end of the job, but the token requested has a lifetime of 15 minutes, so it's possible for a longer-running job to expose that.
It's likely that GitHub will have a blog post on how to configure/use this in the near future. The above information was inspired by https://awsteele.com/blog/2021/09/15/aws-federation-comes-to-github-actions.html, who has some examples in CloudFormation templates if that's your preferred thing.
Update: GitHub (accidentally) changed their thumbprint and the example has been updated; see the GitHub issue linked below for more information. The new thumbprint is 6938fd4d98bab03faadb97b34396831e3780aea1, but it's possible to have multiple thumbprints in the IAM OpenID connector.
GitHub recently updated their certificate chain and the thumbprint has changed from the one mentioned above, a031c46782e6e6c662c2c87c76da9aa62ccabd8e, to 6938fd4d98bab03faadb97b34396831e3780aea1
Github Issue: https://github.com/aws-actions/configure-aws-credentials/issues/357
For some reason I keep getting this with AlBlue's provided answer:
"message":"Can't issue ID_TOKEN for audience 'githubactions'."
Deploying the stack provided in this post does work, however.

Fargate tasks not calling aws services without aws configure

I am unable to call AWS services (Secrets Manager and SNS) from Fargate tasks.
I want these services to be invoked from inside the Docker image which is hosted on ECR. When I run the pipeline everything loads and runs correctly, except that when the script inside the Docker container is invoked, it throws an error. The script makes a call to either Secrets Manager or SNS. The error thrown is -
Unable to locate credentials. You can configure credentials by running "aws configure".
If I do aws configure then the error goes away and everything works smoothly. But I do not want to store the AWS credentials anywhere.
When I open the task definition I can see two roles - pipeline-task and ecsTaskExecutionRole.
Although I have given full administrator rights to both of these roles, the pipeline still throws the error. Is there any place I am missing where I can assign roles/policies etc.? I want to completely avoid using aws configure.
If the script with the issue is not a PID 1 process (the process used to start and stop the container), it will not automatically read the task role (pipeline-task-role). From your description, this sounds like the case.
Add this to your Dockerfile:
RUN echo 'export $(strings /proc/1/environ | grep AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)' >> /root/.profile
The AWS SDK from the script should know where to pick up the credentials from after that.
I don't know if my problem was the same as yours, but I also experienced this kind of problem where I had set the task role but the container didn't get the right permissions. After spending a few days, I discovered that if you set any of the AWS environment variables
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
or AWS_DEFAULT_REGION
in the task definition, the "credential provider chain" will stop at the "static credentials" step, so the SDK your script is using will look for the other credentials in the .aws/credentials file and, as it can't find them, it throws Unable to locate credentials.
If you want to know further about the Credential provider chain you could read about it in https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html
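As a quick check from inside the running task (a sketch assuming the AWS CLI is available in the image; the secret name is a placeholder), you can confirm the task-role credentials are reachable and call the service without any static configuration:

# the ECS credential endpoint the SDK/CLI credential chain uses for the task role
curl -s "http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"

# call Secrets Manager relying purely on the task role (no aws configure, no key env vars)
aws secretsmanager get-secret-value \
  --secret-id my-app/example-secret \
  --query SecretString \
  --output text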

adding existing GCP service account to Terraform root module for cloudbuild to build Terraform configuration

Asking the community if it's possible to do the following (I had no luck finding further information).
I am creating a CI/CD pipeline with GitHub/Cloud Build/Terraform. I have Cloud Build build the Terraform configuration upon a GitHub pull request and merge to a new branch. However, I keep the (default) Cloud Build service account at least privilege.
My question: I would like Terraform to pull permissions from an existing service account with least privilege (to prevent any exploits, etc.) once the Cloud Build trigger starts building the Terraform configuration; that is, Terraform would use an existing external SA to obtain the permissions it needs for the build.
I tried creating a service account and binding roles to it, but an error occurs stating the service account already exists.
My next step is to use a module, but I think this is also going to create a new SA with replicated roles.
If this is confusing I do apologize, I will help in refining the question to be more concise.
You have 2 solutions:
Use the Cloud Build service account when you execute your Terraform. Your provider looks like this:
provider "google" {
  // Useless with Cloud Build
  // credentials = file("${var.CREDENTIAL_FILE}")
  project = var.PROJECT_ID
  region  = "europe-west1"
}
But this solution implies granting several roles to Cloud Build solely for the Terraform process. A custom role is a good choice for granting only what is required.
The second solution is to use a service account key file. Here again there are 2 options:
Cloud Build creates the service account, grants all the roles on it, generates a key and passes it to Terraform. After the Terraform execution, the service account is deleted by Cloud Build. A good solution, but you have to grant the Cloud Build service account the capability to grant itself any role and to generate a JSON key file. That's a lot of responsibility!
Use an existing service account and the key generated on it. But you have to secure the key and rotate it regularly. I recommend storing it securely in Secret Manager, but for the rotation you have to manage it yourself, today. With this process, Cloud Build downloads the key (from Secret Manager) and passes it to Terraform. Here again, the Cloud Build service account has the right to access secrets, which is a critical privilege. The step in Cloud Build is something like this:
steps:
  - name: gcr.io/cloud-builders/gcloud:latest
    entrypoint: "bash"
    args:
      - "-c"
      - |
        gcloud beta secrets versions access --secret=test-secret latest > my-secret-file.txt
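A later step can then point Terraform at that key, for example something like this (a sketch; GOOGLE_APPLICATION_CREDENTIALS is the standard variable the google provider reads, and the filename matches the step above):

# follow-up build step: run Terraform with the downloaded service account key
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/my-secret-file.txt"
terraform init
terraform apply -auto-approve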

What credentials does Boto3 use when running in AWS CodeBuild?

So I've written a set of deployment scripts that run in CodeBuild and use Boto3 to deploy some dockerised apps to ECS. The problem I'm having is when I want to deploy to our separate production account.
If I'm running the CodeBuild project from the dev account but want to create resources in the production account, it's my understanding that I should set up a role in the target account, allow the codebuild role to assume it, then call:
sts_client.assume_role(
    RoleArn=arn_of_a_role_I_set_up,
    RoleSessionName=some_name
)
This returns an access key, secret key, and session token. This works and returns what I'd expect.
Then what I want to do is just assign those values to these environment variables:
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
This is because, according to the documentation here: http://boto3.readthedocs.io/en/latest/guide/configuration.html, Boto3 should defer to those environment variables if you don't explicitly set credentials in the client or session methods.
However, when I do this the resources still get created in the same dev account.
Also, if I call printenv in the first part of my buildspec.yml before my scripts attempt to set the environment variables, those AWS key/secret/token variables aren't present at all.
So when it's running in CodeBuild, where is Boto3 getting its credentials from?
Is the solution just going to be to pass in a key/secret/token to every boto3.client() call to be perfectly sure?
The credentials in the CodeBuild environment are from the service role associated with your CodeBuild project. Boto and botocore will use the "ContainerProvider" automatically to grab those credentials in the CodeBuild environment.
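If you do want the rest of a build to run against the production account, one approach is to assume the role in the buildspec shell and export the temporary credentials before invoking your scripts, so the default credential chain resolves them ahead of the CodeBuild container credentials. A sketch (the role ARN, session name and script name are placeholders):

# assume the cross-account role and export its temporary credentials
creds=$(aws sts assume-role \
  --role-arn "arn:aws:iam::<prod-account-id>:role/DeployRole" \
  --role-session-name codebuild-prod-deploy \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)

export AWS_ACCESS_KEY_ID=$(echo "$creds" | cut -f1)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | cut -f2)
export AWS_SESSION_TOKEN=$(echo "$creds" | cut -f3)

# boto3 and the AWS CLI now pick up these environment variables
python deploy_to_ecs.py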