Where to store AWS credentials in ECS service - amazon-web-services

I have an ECS service which requires AWS credentials. I use ECR to store Docker images and a Jenkins instance, reachable only over VPN, to build them.
I see two possibilities for providing AWS credentials to the service:
Store them as a Jenkins secret and insert them into the Docker image during the build
Make them part of the environment when creating the ECS task definition
Which is more secure? Are there other possibilities?

First of all, you should not pass AWS access keys around while working inside AWS; assign an IAM role to the task definition or service instead of baking credentials into the Docker build or the task definition.
With IAM roles for Amazon ECS tasks, you can specify an IAM role that can be used by the containers in a task. Applications must sign their AWS API requests with AWS credentials, and this feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances.
That said, sometimes the underlying application is not designed in a way that can use a role. In that case I recommend supplying the value as an environment variable in the task definition. But where should the value of that environment variable come from?
Task definitions support two methods for supplying environment values:
Plain text as a direct value
The ‘valueFrom’ attribute in the ECS task definition
The following is a snippet of a task definition showing the format when referencing a Systems Manager Parameter Store parameter.
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
    }]
  }]
}
This is the method recommended by the AWS documentation, and it is more secure than a plain-text environment variable in the task definition or in the Dockerfile.
You can read more in the Amazon ECS documentation on specifying sensitive data and in the Systems Manager Parameter Store documentation.
Note that to use this, you must grant the task's execution role permission to read from Systems Manager Parameter Store.
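For illustration, a minimal policy along these lines attached to the task execution role (the parameter ARN mirrors the placeholder above) lets ECS resolve the secret; if the parameter is encrypted with a customer-managed KMS key, kms:Decrypt on that key is needed as well:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ssm:GetParameters"],
    "Resource": ["arn:aws:ssm:region:aws_account_id:parameter/parameter_name"]
  }]
}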

Related

Terraform deployment of Docker Containers to aws ecr

I am having issues deploying my Docker images to AWS ECR as part of a Terraform deployment, and I am trying to think through the best long-term strategy.
At the moment I have a Terraform remote backend in S3 and DynamoDB in what I'll call my master account. I then have dev/test etc. environments in separate accounts. The Terraform deployment is currently run from my local machine (a Mac) and uses the AWS 'master' account and its credentials, which in turn assumes a role in the target deployment account to create the resources, as per:
provider "aws" { // tell terraform which SDK it needs to load
alias = "target"
region = var.region
assume_role {
role_arn = "arn:aws:iam::${var.deployment_account}:role/${var.provider_env_deployment_role_name}"
}
}
I am creating a number of ECS services with Fargate deployments. The container images are built in separate repos by GitHub Actions and saved as GitHub Packages. These package names and versions are deployed after the creation of the ECR repository and the service (maybe that's not ideal, thinking about it), and this is where the problems arise.
The process is to pull the image from GitHub Packages, retag it and upload it to ECR using multiple executions of a null_resource local-exec. This works fine standalone but has problems as part of the Terraform run. I think the reason is that the other resources use the above provider to get permissions, but since null_resource does not accept a provider it cannot get permissions this way. So I have been passing the AWS credential values into the shell. I'm not convinced this is really secure, but that's currently moot as it isn't working either. I get this error:
Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `The specified item already exists in the keychain.``
Part of me thinks this is the wrong approach, and that as I migrate to deploying via a GitHub Action I can separate the infrastructure deployment (Terraform) from what is really the application deployment, and just use GitHub secrets to set the credential values and then run the script.
Alternatively, maybe the keychain issue just goes away and my process will work fine? Securely?
That's fine for this scenario, but it isn't really a generic approach for all my use cases.
I am shortly going to start deploying multiple AWS Lambda functions with Docker containers. I haven't done it before, but it looks like the process is going to be: create the ECR repository, deploy the container, deploy the Lambda function. This really implies that the container deployment should be integral to the Terraform deployment, which loops back to my issue with the local-exec.
I found GitHub Actions to deploy to ECR, which would imply splitting the deployments into multiple files, but that seems inelegant and potentially brittle.
Maybe there is a simple solution, but given where I am trying to go with this, what is my best approach?
I know this isn't a complete answer, but you should be pulling your AWS creds from environment variables. I don't really understand whether you need credentials for different accounts, but if you do, then swap them during the course of your action. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html. Terraform should pick these up and automatically use them for AWS access.
Instead of those hard-coded access key/secret access key pairs, I'd suggest making use of GitHub and AWS's ability to assume a role through temporary credentials with OIDC: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
You'd likely define only one initial role that you authenticate into, and from there assume into the other accounts you're deploying to.
These assume-role credentials are only good for an hour and don't carry the operational overhead of having to rotate them.
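A rough sketch of what that looks like in a workflow, using the aws-actions/configure-aws-credentials action (the role ARN and region here are placeholders you'd replace with your own):
permissions:
  id-token: write   # required so the job can request the OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy   # placeholder role
      aws-region: eu-west-1
  # later steps (terraform plan/apply, docker push, etc.) pick up the temporary credentials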
As suggested by Kevin Buchs' answer...
My primary issue was related to deploying from a Mac and the use of the keychain. As this was not on the critical path, I went around it and set up a GitHub Action.
The Action loads environment variables from GitHub secrets for my 'master' AWS account credentials:
AWS_ACCESS_KEY_ID: ${{ secrets.NK_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.NK_AWS_SECRET_ACCESS_KEY }}
I also loaded the target account's credentials into environment variables in the same way, but with the prefix TF_VAR_:
TF_VAR_DEVELOP_AWS_ACCESS_KEY_ID: ${{ secrets.DEVELOP_AWS_ACCESS_KEY_ID }}
TF_VAR_DEVELOP_AWS_SECRET_ACCESS_KEY: ${{ secrets.DEVELOP_AWS_SECRET_ACCESS_KEY }}
I then declare Terraform variables, which are automatically populated from the environment variables:
variable "DEVELOP_AWS_ACCESS_KEY_ID" {
description = "access key for the dev account"
type = string
}
variable "DEVELOP_AWS_SECRET_ACCESS_KEY" {
description = "secret access key for the dev account"
type = string
}
And then I run a shell script with a local-exec provisioner:
resource "null_resource" "image-upload-to-importcsv-ecr" {
provisioner "local-exec" {
command = "./ecr-push.sh ${var.DEVELOP_AWS_ACCESS_KEY_ID} ${var.DEVELOP_AWS_SECRET_ACCESS_KEY} "
}
}
Within the script I can then use these arguments to set the credentials, e.g.:
AWS_ACCESS=$1
AWS_SECRET=$2
.....
export AWS_ACCESS_KEY_ID=${AWS_ACCESS}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET}
and the script now has credentials to do whatever.
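The body of ecr-push.sh isn't shown above, but a hypothetical sketch of such a script (account, region, source image and target image are all placeholders) could look like:
#!/usr/bin/env bash
set -euo pipefail

AWS_ACCESS=$1
AWS_SECRET=$2

export AWS_ACCESS_KEY_ID=${AWS_ACCESS}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET}

# placeholders -- substitute your own account, region, source and target images
ECR_REGISTRY=123456789012.dkr.ecr.eu-west-1.amazonaws.com
SOURCE_IMAGE=ghcr.io/my-org/importcsv:latest
TARGET_IMAGE=${ECR_REGISTRY}/importcsv:latest

# authenticate Docker against ECR, then retag and push the image pulled from GitHub Packages
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin ${ECR_REGISTRY}
docker pull ${SOURCE_IMAGE}
docker tag ${SOURCE_IMAGE} ${TARGET_IMAGE}
docker push ${TARGET_IMAGE}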

ECS task. How to use AWS CLI within container

I'm trying to use AWS CLI commands inside the container.
I have attached a policy to the ECS cluster instance, but the container comes up with an error: it tries to call an AWS CLI command as its entrypoint when it boots, and fails.
My IAM role with an instance profile allows KMS get and decrypt, which is what I need for the AWS CLI operations.
Is there a way to pass credentials, like an instance profile, into an ECS task container?
To pass a role to your container(s) in a task you can use IAM Roles for Tasks:
With IAM roles for Amazon ECS tasks, you can specify an IAM role that can be used by the containers in a task. Applications must sign their AWS API requests with AWS credentials, and this feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances.

How to run aws cli commands as a specific user?

Currently we are running four commands.
The two AWS CLI commands below run in a Jenkins Docker container:
sh 'aws cloudformation package ...'
s3Upload()
And the two AWS CLI commands below run in another Docker container:
aws s3 cp source dest
aws cloudformation deploy
To run these four commands in a Docker container, the AWS CLI derives its access permissions from the Docker host (EC2), which assumes a role whose policy grants the required permissions (to access S3 and create/update a CloudFormation stack).
But the problems with this solution are:
We have to assign this role (say xrole) to every EC2 instance running in each test environment. There are 3-4 test environments.
Internally, AWS creates an ad hoc identity such as arn:aws:sts::{account_id}:assumed-role/xrole/i-112223344, and the four commands above run on behalf of this identity.
A better solution would be to create a user, assign the same role (xrole) to it, and run the four commands above as this user.
But:
1) What is the process to create such a user? Because it has to assume xrole...
2) How do we run the four commands above as this user?
Best practice is to use roles, not users, when working with EC2 instances. Users are necessary only when you need to grant permissions to applications running on machines outside the AWS environment (on premises). And even then, it is still best practice to grant the user only the permission to assume a role that in turn grants the necessary permissions.
If you are running all your commands from within containers and you want to grant permissions to the containers instead of the whole EC2 instance, then what you can do is use ECS instead of plain EC2 instances.
When using the EC2 launch type with ECS, you have the same control over the EC2 instance, but the difference is that you can attach a role to a particular task (container) instead of the whole EC2 instance. By doing this, you can have several different tasks (containers) running on the same EC2 instance while each of them has only the permissions it needs. So if one of your containers needs to upload data to S3, you can create the necessary role, specify the role in the task definition, and only that particular task will have those permissions. Neither the other tasks nor the EC2 instance itself will be able to upload objects to S3.
Moreover, if you specify the awsvpc networking mode for your tasks, each task gets its own ENI, which means that you can specify a security group for each task separately even if they are running on the same EC2 instance.
Here is an example of a task definition using a Docker image stored in ECR and a role called AmazonECSTaskS3BucketRole:
{
  "containerDefinitions": [
    {
      "name": "sample-app",
      "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/aws-nodejs-sample:v1",
      "memory": 200,
      "cpu": 10,
      "essential": true
    }
  ],
  "family": "example_task_3",
  "taskRoleArn": "arn:aws:iam::123456789012:role/AmazonECSTaskS3BucketRole"
}
Here is the documentation for task definitions.
Applications running on the same host share the permissions assigned to the host through the instance profile. If you would like to segregate different applications running on the same instance due to security requirements, it is best to launch them on separate instances.
Using access keys per application is not a recommended approach as access keys are long-term credentials and they can easily be retrieved when the host is shared.
It is possible to assign IAM roles to ECS tasks as suggested by the previous answer. However, containers that are running on your container instances are not prevented from accessing the credentials that are supplied through the instance profile. It is therefore recommended to assign minimal permissions to the container instance roles.
If you run your tasks in awsvpc network mode, you can configure the ECS agent to prevent a task from accessing the instance metadata. Just set the agent configuration variable ECS_AWSVPC_BLOCK_IMDS=true and restart the agent.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
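As an example, on an Amazon Linux 2 ECS-optimized instance this could be done roughly like so (the config path and restart command assume that AMI):
# append the setting to the agent configuration file used by the ECS-optimized AMI
echo "ECS_AWSVPC_BLOCK_IMDS=true" | sudo tee -a /etc/ecs/ecs.config

# restart the agent so it picks up the new setting
sudo systemctl restart ecs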

Does my application need to ask for a role on ec2 instance to configure the session or leave it empty?

I'm trying to use aws-sdk-go in my application, which runs on an EC2 instance. In the Configuring Credentials section of the docs, https://docs.aws.amazon.com/sdk-for-go/api/, it says it will look in:
* Environment Credentials - Set of environment variables that are useful when sub processes are created for specific roles.
* Shared Credentials file (~/.aws/credentials) - This file stores your credentials based on a profile name and is useful for local development.
* EC2 Instance Role Credentials - Use EC2 Instance Role to assign credentials to application running on an EC2 instance. This removes the need to manage credential files in production.
Wouldn't the best order be the reverse? But my main question is: do I need to ask the instance whether it has a role, and then use that role to set up the credentials? This is where I'm not sure what I need to do, or how.
I did try a simple test of creating an almost empty config, essentially only setting the region, and running it on the instance with the role, and it seems to have "worked", but I am not sure whether I need to set the role explicitly or not.
awsSDK.Config{
    Region:     awsSDK.String(a.region),
    MaxRetries: awsSDK.Int(maxRetries),
    HTTPClient: http.DefaultClient,
}
I just want to confirm whether this is the proper way of doing it or not. My thinking is that I need to do something like the following:
role = use sdk call to get role on machine
set awsSDK.Config { Credentials: credentials form of role,
...
}
issue service command with returned client.
Any more docs/pointers would be great!
I have never used the Go SDK, but the AWS SDKs I have used automatically pick up the EC2 instance role if credentials are not found from any other source.
Here's an AWS blog post explaining the approach AWS SDKs follow when fetching credentials: https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/. In particular, see this:
If you use code like this, the SDKs look for the credentials in this order:
1. In environment variables. (Not the .NET SDK, as noted earlier.)
2. In the central credentials file (~/.aws/credentials or %USERPROFILE%\.aws\credentials).
3. In an existing default, SDK-specific configuration file, if one exists. This would be the case if you had been using the SDK before these changes were made.
4. For the .NET SDK, in the SDK Store, if it exists.
5. If the code is running on an EC2 instance, via an IAM role for Amazon EC2. In that case, the code gets temporary security credentials from the instance metadata service; the credentials have the permissions derived from the role that is associated with the instance.
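In practical terms for the Go SDK: on an EC2 instance with an instance profile you don't need to look up the role yourself; create the session with just a region and the default chain falls through to the instance role. A minimal sketch, assuming aws-sdk-go v1 and a placeholder region:
package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/sts"
)

func main() {
    // No explicit credentials: the SDK walks its default chain
    // (env vars, shared credentials file, then the EC2 instance role).
    sess := session.Must(session.NewSession(&aws.Config{
        Region: aws.String("us-east-1"), // placeholder region
    }))

    // Sanity check: ask STS who we are; on EC2 this returns the
    // assumed-role identity that comes from the instance profile.
    out, err := sts.New(sess).GetCallerIdentity(&sts.GetCallerIdentityInput{})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(aws.StringValue(out.Arn))
}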
In my apps, when I need to connect to AWS resources, I tend to use an access key and secret key that have specific predefined IAM roles. Assuming I have those two, the code I use to create a session is:
awsCredentials := credentials.NewStaticCredentials(awsAccessKeyID, awsSecretAccessKey, "")
awsSession = session.Must(session.NewSession(&aws.Config{
    Credentials: awsCredentials,
    Region:      aws.String(awsRegion),
}))
When I use this, the two keys are usually specified as environment variables (for example, when I deploy to a Docker container).
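For instance, passing them into a container at run time might look like the following (the image name is a placeholder and the values would come from whatever secret store you use; the names follow the standard AWS environment variables):
docker run \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -e AWS_REGION=us-east-1 \
  my-app:latest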
A complete example: https://github.com/retgits/flogo-components/blob/master/activity/amazons3/activity.go

How to use AWS ECS Task Role in Node AWS SDK code

Code that uses the AWS Node SDK doesn't seem to be able to gain the role permissions of the ECS task.
If I run the code on an EC2 ECS instance, the code seems to inherit the role of the instance, not of the task.
If I run the code on Fargate, the code doesn't get any permissions.
By contrast, any bash scripts that run within the instance seem to have the proper permissions.
Indeed, the documentation doesn't mention this as an option for the Node SDK, just:
Loaded from IAM roles for Amazon EC2 (if running on EC2),
Loaded from the shared credentials file (~/.aws/credentials),
Loaded from environment variables,
Loaded from a JSON file on disk,
Hardcoded in your application
Is there any way to have your node code gain the permissions of the ECS task?
This seems to be the logical way to pass permissions to your code. It works beautifully with code running on an instance.
The only workaround I can think of is to create one IAM user per ECS service and pass the API key/secret as environment variables in the task definition. However, that doesn't seem very secure, since it would be visible in plain text to anyone with access to the task definition.
Your question is missing a lot of detail about how you set up your ECS cluster, and I am not sure whether the question is about ECS on EC2 or Fargate specifically.
Make sure that you are using the latest version of the SDK. JavaScript supports ECS and Fargate task credentials.
There is often confusion about credentials on ECS: there is the IAM role that is assigned to the cluster EC2 instances, and the IAM role that is assigned to ECS tasks.
The most common problem is that the "Trust Relationship" has not been set up on the ECS task role. Select your IAM role, open the "Trust Relationships" tab and make sure that it looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
In addition to the standard Amazon ECS permissions required to run tasks and services, IAM users also require iam:PassRole permissions to use IAM roles for tasks.
Next, verify that you are using the IAM role in the task definition. Specify the correct IAM role ARN in the Task Role field. Note that this is different from the Task Execution Role (which allows containers to pull images and publish logs).
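For reference, the two roles are separate fields in the task definition JSON; a trimmed sketch with placeholder ARNs:
{
  "family": "my-app",
  "taskRoleArn": "arn:aws:iam::123456789012:role/my-app-task-role",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [ ... ]
}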
Next, make sure that your ECS instances are using the latest version of the ECS agent. The agent version is listed on the "ECS Instances" tab under the right-hand "Agent version" column. At the time of writing the current version is 1.20.3.
Are you using an ECS-optimized AMI? If not, add --net=host to the docker run command that starts the agent. Review the documentation for more information.
I figured it out. This was a weird one.
A colleague thought it would be "safer" if we called Object.freeze on process.env. This was somehow interfering with the SDK's ability to access the credentials.
I removed that "improvement" and all is fine again. I think the lesson is: do not mess with process.env.