I have a IAM user that is created by Terraform. Keys are stored in Hashicrop Vault and apps read them from there.
I have developed ansible code/bash scripts to rotate the keys periodically successfully.
But the issue is terraform doesn't like when the keys are rotated. Whenever we try to run terraform it tries to re-create the key
Is there any way to manage the key rotation via terraform? or can we ignore this in terraform. Any help with examples would be really helpful.
Key rotation in terraform is possible by using terraform apply -replace=<resource address>, which replaces the resource immediately or terraform taint <resource address>, which replaces the resource on the next apply for version below v0.15.2. See https://www.terraform.io/docs/cli/commands/taint.html for more information.
When using these commands it makes sense to set the lifecycle of the resources to replace to create_before_destroy to avoid downtime. So in the case of AWS access keys, that would be
resource "aws_iam_access_key" "my_user" {
user = "my_user_name"
lifecycle {
create_before_destroy = true
}
}
Given this configuration one can simply run terraform apply -replace=aws_iam_access_key.my_user to rotate the keys. One only has to make sure that downstream applications that use the keys take notice of the changes and are restarted if necessary to make use of the new keys.
We have managed to solve the problem by removing key generation initially via terraform when the user is created.
We are using some ansible and bash scripts to now generate and rotate keys and then vault api to update secrets in Vault.
Related
I have run a terraform script to create some resources, including a VPC with private subnets, an RDS instance, and Kinesis/Firehose. This is working fine.
When I went to re-run terraform and add some new resources (ElasticSearch in this case), Terraform started outputting a plan that included adding AWS tags to many of my previously existing resources, the text of which look like "map-migrated" = "d-server-01uw80xeqs2083". Here is a snippet from the plan:
# module.rds.aws_db_instance.etl_metastore_rds_dbinstance will be updated in-place
~ resource "aws_db_instance" "rds_dbinstance" {
id = "MyRDSId"
name = "etldb"
~ tags = {
- "map-migrated" = "d-server-01uw80xeqs2083" -> null
# (2 unchanged elements hidden)
}
~ tags_all = {
- "map-migrated" = "d-server-01uw80xeqs2083" -> null
# (2 unchanged elements hidden)
}
# (48 unchanged attributes hidden)
}
I don't know why these tags are being added. Neither Google nor the Terraform docs have been any help on this issue. Is this something I can safely ignore? I'm worried that somehow I have crossed versions of Terraform and it's doing a migration that I don't want. As far as I know I am using the same version of Terraform before and after (1.0.1).
Terraform proposes to remove those tags.
Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
Reason for this is, that Terraform compares what you have running on AWS to what you defined in your terraform files. (docs)
It will then take all the actions needed, so that your AWS infrastructure will match your Terraform configuration. (i.e. create, change and destroy AWS resources)
Here Terraform detected, that you have those "map-migrated" tags on various AWS resources, so it proposes to delete them, since they are not defined in your terraform files.
Now why and how were those tags added - and can you safely remove/ignore them?
Why?
These tags are used for the AWS Migration Acceleration Program (MAP), that's why adding those tags is called "MAP tagging".
This is a cloud migration program, in which AWS offers to help companies get into the cloud by giving methods, tools, trainings and money (you can get AWS Credits).
Now a requirement for that program is tagging your migrated resources with a "map-migrated" tag - or no credits for you. (credits are based on the cost and usage of tagged resources)
How?
Maybe your team is using the AWS Application Migration Service, which offers a setting to automatically apply MAP tags to launched instances.
Or someone added the tags manually in the AWS account.
Can you safely remove/ignore them?
Technically you can, it won't break anything.
But your project manager might get really angry, since you will lose out on funding. And the person who added those tags in AWS won't be happy either, if you override them with Terraform.
Solution
So the solution will be to ask your manager for the "List of all MAP Included Services" and incorporate the MAP tags in the Terraform code for all appropriate resources:
tags = {
...
"map-migrated" = "d-server-01uw80xeqs2083"
}
(Or - dirty solution - copy the tags from the terraform apply output into your code, until there are no such tag differences between AWS and your terraform code anymore.)
Note: this tag is always the same for one management (payer) account and all its member accounts. The unique value is called Server Id.
I am having issues deploying my docker images to aws ecr as part of a terraform deployment and I am trying to think through the best long term strategy.
At the moment I have a terraform remote backend in s3 and dynamodb on let's call it my master account. I then have dev/test etc environments in separate accounts. The terraform deployment is currently run off my local machine (mac) and uses the aws 'master' account and its credentials which in turn assumes a role in the target deployment account to create the resources as per:
provider "aws" { // tell terraform which SDK it needs to load
alias = "target"
region = var.region
assume_role {
role_arn = "arn:aws:iam::${var.deployment_account}:role/${var.provider_env_deployment_role_name}"
}
}
I am creating a number of ecs services with Fargate deployments. The container images are built in separate repos by GitHub Actions and saved as GitHub packages. These package names and versions are being deployed after the creation of the ecr and service (maybe that's not ideal thinking about it) and this is where the problems arise.
The process is to pull the image from GitHub Packages, retag it and upload to the ecr using multiple executions of a null_resource local-exec. Works fine stand alone but has problems as part of the terraform process. I think the reason is that the other resources use the above provider to get permissions but as null_resource does not accept a provider it cannot get the permissions this way. So I have been passing the aws creds values into the shell. Not convinced this is really secure but that's currently moot as it ain't working either. I get this error:
Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `The specified item already exists in the keychain.``
Part of me thinks this is the wrong approach and that as I migrate to deploying via a Github action I can separate the infrastructure deployment via terraform from what is really the application deployment and just use GitHub secrets to reset the credentials values then run the script.
Alternatively, maybe the keychain thing just goes away and my process will work fine? Secure ??
That's fine for this scenario but it isn't really a generic approach for all my use cases.
I am shortly going to start deploying multiple aws lambda functions with docker containers. Haven't done it before but it looks like the process is going to be: create ecr, deploy container, deploy lambda function. This really implies that the container deployment should integral to the terraform deployment which loops back to my issue with the local-exec??
I found Actions to deploy to ECR which would imply splitting the deployments into multiple files but that seems inelegant and potentially brittle.
Maybe there is a simple solution, but given where I am trying to go with this, what is my best approach?
I know this isn't a complete answer, but you should be pulling your AWS creds from environment variables. I don't really understand if you need credentials for different accounts, but if you do then swap them during the progress of your action. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html . Terraform should pick these up and automatically use them for AWS access.
Instead of those hard coded access key/secret access keys I'd suggest making use of Github/AWS's ability to assume role through temporary credentials with OIDC https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
You'd likely only define one initial role that you'd authenticate into and from there assume into the other accounts you're deploying into.
These the assume role credentials are only good for an hour and do not have the operation overhead of having to rotate them.
As suggested by Kevin Buchs answer...
My primary issue was related to deploying from a mac and the use of the keychain. As this was not on the critical path I went round it and set up a GitHub Action.
The Action loaded environmental variables from GitHub secrets for my 'master' aws account credentials.
AWS_ACCESS_KEY_ID: ${{ secrets.NK_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.NK_AWS_SECRET_ACCESS_KEY }}
I also loaded the target accounts credentials into environmental variables in the same way BUT with the prefix TF_VAR.
TF_VAR_DEVELOP_AWS_ACCESS_KEY_ID: ${{ secrets.DEVELOP_AWS_ACCESS_KEY_ID }}
TF_VAR_DEVELOP_AWS_SECRET_ACCESS_KEY: ${{ secrets.DEVELOP_AWS_SECRET_ACCESS_KEY }}
I then declare terraform variables which will be automatically populated from the environment variables.
variable "DEVELOP_AWS_ACCESS_KEY_ID" {
description = "access key for the dev account"
type = string
}
variable "DEVELOP_AWS_SECRET_ACCESS_KEY" {
description = "secret access key for the dev account"
type = string
}
And when I run a shell script with a local exec:
resource "null_resource" "image-upload-to-importcsv-ecr" {
provisioner "local-exec" {
command = "./ecr-push.sh ${var.DEVELOP_AWS_ACCESS_KEY_ID} ${var.DEVELOP_AWS_SECRET_ACCESS_KEY} "
}
}
Within the script I can then use these arguments to set the credentials eg
AWS_ACCESS=$1
AWS_SECRET=$1
.....
export AWS_ACCESS_KEY_ID=${AWS_ACCESS}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET}
and the script now has credentials to do whatever.
We have an AWS SecretsManager Secret that was created once. That secret will be updated by an external job every hour.
I have the problem that sometimes the terraform plan/apply fails with the following message:
AWS Provider 2.48
Error: Error refreshing state: 1 error occurred:
* module.xxx.xxx: 1 error occurred:
* module.xxx.aws_secretsmanager_secret_version.xxx:
aws_secretsmanager_secret_version.xxx: error reading Secrets Manager Secret Version: InvalidRequestException: You can't perform this operation on secret version 68AEABC3-34BE-4723-8BF5-469A44F9B1D9 because it was deleted.
We've tried two solutions:
1) Force delete the whole secret via aws cli, but this has the side effect that one of our dependend resources will also be recreated (ecs template definition depends on that secret). This works, but we do not want the side effect of recreating the ecs thing.
2) Manually edit the backend .tfstate file and set the current AWS secret version. Then run the plan again.
Both solution seem to be hacky in a way. What is the best way to solve this issue ?
You can use terraform import to reconcile the state difference before you run a plan or apply.
In your case, this would look like:
terraform import module.xxx.aws_secretsmanager_secret_version.xxx arn:aws:secretsmanager:some_region:some_account_id:secret:example-123456|xxxxx-xxxxxxx-xxxxxxx-xxxxx
I think perhaps the problem you are having is that by default AWS tries to "help you" by not letting you delete secrets automatically until 7 days have elapsed. AWS tries the "help you" by telling you they give you a grace period of 7 days to update your "code" that may rely on this. Which makes automation more difficult.
I have worked around this by setting the recovery window period to "0 days", effectively eliminating that grace period that AWS provides.
Then you can have terraform, rename, or delete your secret at will, either manually (via AWS CLI) or via terraform.
You can update an existing secret by putting in this value FIRST. Then change the name of the secret (if you wish to), or delete it (this terraform section) as desired and run the terraform again after the recovery window days = 0 has been applied.
Here is an example:
resource "aws_secretsmanager_secret" "mySecret" {
name = "your secret name"
recovery_window_in_days = "0"
// this is optional and can be set to true | false
lifecycle {
create_before_destroy = true
}
}
*Note, there is also an option to "create before destroy" you can set on the lifecyle.
https://www.terraform.io/docs/configuration/resources.html
Also, you can use the terraform resource to update the secret values like this:
This example will set the secret values once and then tell terraform to ignore any changes made to the values (username, password in this example) after the initial creation.
If you remove the lifecyle section, then terraform will keep track of whether or not the secret values themselves have changed. If they have changed they would revert back to the value in the terraform state.
If you store your tfstate files in an s3 protected bucket that is safer than not doing so, because they are plaintext in the statefile, so anyone with access to your terraform state file could see your secret values.
I would suggest: 1) figuring out what is deleting your secrets unexpectedly? 2) having your "external job" be a terraform bash script to update the values using a resource as in the example below.
Hope this gives you some ideas.
resource "aws_secretsmanager_secret_version" "your-secret-data" {
secret_id = aws_secretsmanager_secret.your-secret.id
secret_string = <<-EOF
{
"username": "usernameValue",
"password": "passwordValue"
}
EOF
// ignore any updates to the initial values above done after creation.
lifecycle {
ignore_changes = [
secret_string
]
}
}
I have the AWS CLI installed on my Windows computer, and running this command "works" exactly like I want it to.
aws ec2 describe-images
I get the following output, which is exactly what I want to see, because although I have access to AWS through my corporation (e.g. to check code into CodeCommit), I can see in the AWS web console for EC2 that I don't have permission to list running instances:
An error occurred (UnauthorizedOperation) when calling the DescribeImages operation: You are not authorized to perform this operation.
I've put terraform.exe onto my computer as well, and I've created a file "example.tf" that contains the following:
provider "aws" {
region = "us-east-1"
}
I'd like to issue some sort of Terraform command that would yell at me, explaining that my AWS account is not allowed to list Amazon instances.
Most Hello World examples involve using terraform plan against a resource to do an "almost-write" against AWS.
Personally, however, I always feel more comfortable knowing that things are behaving as expected with something a bit more "truly read-only." That way, I really know the round-trip to AWS worked but I didn't modify any of my corporation's state.
There's a bunch of stuff on the internet about "data sources" and their "aws_ami" or "aws_instances" flavors, but I can't find anything that tells me how to actually use it with a Terraform command for a simple print()-type interaction (the way it's obvious that, say, "resources" go with the "terraform plan" and "terraform apply" commands).
Is there something I can do with Terraform commands to "hello world" an attempt at listing all my organization's EC2 servers and, accordingly, watching AWS tell me to buzz off because I'm not authorized?
You can use the data source for AWS instances. You create a data source similar to the below:
data "aws_instances" "test" {
instance_tags = {
Role = "HardWorker"
}
filter {
name = "instance.group-id"
values = ["sg-12345678"]
}
instance_state_names = ["running", "stopped"]
}
This will attempt to perform a read action listing your EC2 instances designated by the filter you put in the config. This will also utilize the IAM associated with the Terraform user you are performing the terraform plan with. This will result in the error you described regarding lack of authorization, which is your stated goal. You should modify the filter to target your organization's EC2 instances.
I created the aws_db_instance to provision the RDS MySQL database using Terraform configuration. Now my next question is to execute the SQL Script (CREATE TABLE and INSERT statements) on the RDS. I did the following but there is no effect. terraform plan cannot even see my changes on executing the sql. What did I miss here? Thanks.
resource "aws_db_instance" "mydb" {
# ...
provisioner "remote-exec" {
inline = [
"chmod +x script.sql",
"script.sql args",
]
}
}
Check out this post: How to apply SQL Scripts on RDS with Terraform
If you're just trying to setup user's and permissions (you shouldn't use the root pw you set when you generate the RDS) there is a terraform provider for that:
https://www.terraform.io/docs/providers/mysql/index.html
But you're looking for DB schema and seeding. That provider cannot do that.
If you're open to doing it another way, you may want to check out using ssm automation documents and/or lambda. I'd use lambda. Pick a language that you're comfortable with. Set the role of the lambda to have permissions to read the password it needs to do the work. You can save the password in ssm parameter store. Then script your DB work.
Then do a local exec in terraform that simply calls the lambda and pass it the ID of the RDS and the path to the secret in ssm parameter store. That will ensure that the DB operations are done from compute inside the VPC without having to setup an EC2 bastion just for that purpose.
Here's how javascript can get this done, for example:
https://www.w3schools.com/nodejs/nodejs_mysql_create_table.asp