Terraform storing sensitive data in state file

I am using variables with sensitive = true, yet the state file still stores the ID and password in plain text. Is there any way to avoid this?
variable "rs_master_pass" {
type = string
sensitive = true
}
In the state file:
"master_password": "password"
Even if I remove it from the state manually, it comes back after each apply.

There is no "easy" way to avoid that. You must simply not hard-code the values in your TF files. Setting sensitive = true does not protect against having the secrets in plain text as you noticed.
The general ways of properly handling secrets in TF are:
use a specialized, external vault, such as HashiCorp Vault, AWS SSM Parameter Store or AWS Secrets Manager. Their values have to be set separately (outside of TF) so that, again, the secrets do not end up in the TF state file.
use local-exec to set up the secrets outside of TF. Whatever you do in local-exec does not get stored in the TF state file. This is often done to change dummy secrets that may be required in your TF code (e.g. an RDS password) to the actual values outside of TF's knowledge; see the sketch after this list.
if the above solutions are not accessible, then you have to protect your state file (it's good practice anyway). This is often done by storing it remotely in S3 under strict access policies.
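As an illustration, here is a minimal sketch of the local-exec approach. It assumes an aws_db_instance named "example" already exists in the configuration and that the real password is supplied out of band via an environment variable (RS_MASTER_PASS here is just a placeholder name):

resource "null_resource" "set_real_rds_password" {
  # Re-run whenever the instance is recreated.
  triggers = {
    db_instance = aws_db_instance.example.identifier
  }

  provisioner "local-exec" {
    # Replaces the dummy password known to Terraform with the real one;
    # the real value never ends up in the state file.
    command = "aws rds modify-db-instance --db-instance-identifier ${aws_db_instance.example.identifier} --master-user-password \"$RS_MASTER_PASS\" --apply-immediately"
  }
}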

How do I conditionally create an S3 bucket

Is there a way I can use a Terraform data source for a bucket (perhaps created and stored in a different state file) and then, in the event the data source returns nothing, create the resource by setting a count?
I've been doing some experiments and continually get the following:
Error: Failed getting S3 bucket (example_random_bucket_name): NotFound: Not Found
status code: 404, request id: <ID here>, host id: <host ID here>
Sample code to test (this has been modified from the original code which generated this error):
variable "bucket_name" {
default = "example_random_bucket_name"
}
data "aws_s3_bucket" "new" {
bucket = var.bucket_name
}
resource "aws_s3_bucket" "s3_bucket" {
count = try(1, data.aws_s3_bucket.new.id == "" ? 1 : 0 )
bucket = var.bucket_name
}
I feel like rather than generating an error I should get an empty result, but that's not the case.
Terraform is a desired-state system, so you can only describe what result you want, not the steps/conditions to get there.
If Terraform did allow you to decide whether to declare a bucket based on whether there is already a bucket of that name, you would create a configuration that could never converge: on the first run, it would not exist and so your configuration would declare it. But on the second run, the bucket would then exist and therefore your configuration would not declare it anymore, and so Terraform would plan to destroy it. On the third run, it would propose to create it again, and so on.
Instead, you must decide as part of your system design which Terraform configuration (or other system) is responsible for managing each object:
If you decide that a particular Terraform configuration is responsible for managing this S3 bucket then you can declare it with an unconditional aws_s3_bucket resource.
If you decide that some other system ought to manage the bucket then you'll write your configuration to somehow learn about the bucket name from elsewhere, such as by an input variable or using the aws_s3_bucket data source.
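As a rough sketch of those two options (reusing var.bucket_name from the question; they are alternatives, not meant to coexist in one configuration):

# Option 1: this configuration owns the bucket, so declare it unconditionally.
resource "aws_s3_bucket" "s3_bucket" {
  bucket = var.bucket_name
}

# Option 2: another system owns the bucket; only read it here.
data "aws_s3_bucket" "existing" {
  bucket = var.bucket_name
}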
Sadly you can't do this. Data sources must exist, otherwise they error out. There is no built-in way in TF to check whether a resource exists or not; there is no in-between state where a resource may or may not exist.
If you require such functionality, you have to program it yourself using an external data source. Or, maybe simpler, provide an input variable such as bucket_exists, so that you explicitly set it during apply; a sketch follows below.
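A minimal sketch of the input-variable approach (bucket_exists is a made-up name, and var.bucket_name is taken from the question):

variable "bucket_exists" {
  type    = bool
  default = false
}

resource "aws_s3_bucket" "s3_bucket" {
  # Only create the bucket when the operator says it does not already exist,
  # e.g. terraform apply -var="bucket_exists=true" skips creation.
  count  = var.bucket_exists ? 0 : 1
  bucket = var.bucket_name
}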
Data sources are designed to fail this way.
However, if you consume the state of an external configuration (via a terraform_remote_state data source), it is possible to declare an output in that external configuration, based on whether the S3 bucket is managed by that state, and use it as a condition on the s3_bucket resource.
For example, the output in the external state could be an empty string (not managed) or the value of whatever property is useful to you; a boolean is another choice. Delete the data source from this configuration and add a condition to the resource based on that output.
It's your call whether such workarounds complicate or simplify your configuration.
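A rough sketch of that idea, assuming the external configuration uses an S3 backend and exposes a boolean output named bucket_managed (the names and backend details below are assumptions):

data "terraform_remote_state" "shared" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"       # assumed state bucket
    key    = "shared/terraform.tfstate" # assumed state key
    region = "us-east-1"                # assumed region
  }
}

resource "aws_s3_bucket" "s3_bucket" {
  # Create the bucket here only if the external configuration does not manage it.
  count  = data.terraform_remote_state.shared.outputs.bucket_managed ? 0 : 1
  bucket = var.bucket_name
}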

How can I ensure that my retrieval of secrets is secure?

Currently I am using Terraform and AWS Secrets Manager to store and retrieve secrets, and I would like some insight into whether my implementation is secure, and if not, how I can make it more secure. Let me illustrate with what I have tried.
In secrets.tf I create a secret like this (it needs to be applied with targeting, so the secret exists before it is read):
resource "aws_secretsmanager_secret" "secrets_of_life" {
name = "top-secret"
}
I then go to the console and manually set the secret value in AWS Secrets Manager.
I then retrieve the secrets in secrets.tf like:
data "aws_secretsmanager_secret_version" "secrets_of_life_version" {
secret_id = aws_secretsmanager_secret.secrets_of_life.id
}
locals {
creds = jsondecode(data.aws_secretsmanager_secret_version.secrets_of_life.secret_string)
}
And then I proceed to use the secrets (exporting them as K8s secrets, for example) like:
resource "kubernetes_secret" "secret_credentials" {
metadata {
name = "kubernetes_secret"
namespace = kubernetes_namespace.some_namespace.id
}
data = {
top_secret = local.creds["SECRET_OF_LIFE"]
}
type = "kubernetes.io/generic"
}
It's worth mentioning that I store tf state remotely. Is my implementation secure? If not, how can I make it more secure?
Yes, I can confirm it is secure, since you accomplished the following:
Plain-text secrets are kept out of your code.
Your secrets are stored in a dedicated secret store that enforces encryption and strict access control.
Everything is defined in the code itself. There are no extra manual steps or wrapper scripts required.
Secrets Manager supports rotating secrets, which is useful in case a secret gets compromised.
The only things I would add are to use a Terraform backend that supports encryption, like S3, and to avoid committing the state file to your source control.
Looks good; as #asri suggests, it is a good, secure implementation.
The risk of exposure will be in the remote state. It is possible that the secret will be stored there in plain text. Assuming you are using S3, make sure that the bucket is encrypted. If you share tf state access with other developers, they may have access to those values in the remote state file.
From https://blog.gruntwork.io/a-comprehensive-guide-to-managing-secrets-in-your-terraform-code-1d586955ace1
These secrets will still end up in terraform.tfstate in plain text! This has been an open issue for more than 6 years now, with no clear plans for a first-class solution. There are some workarounds out there that can scrub secrets from your state files, but these are brittle and likely to break with each new Terraform release, so I don’t recommend them.
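As a side note on protecting the state itself, the S3 backend can encrypt the state object at rest. A minimal sketch (bucket, key and region are placeholders):

terraform {
  backend "s3" {
    bucket  = "my-terraform-state"     # assumed state bucket
    key     = "prod/terraform.tfstate" # assumed state key
    region  = "us-east-1"              # assumed region
    encrypt = true                     # server-side encryption of the state object
  }
}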
Hi, I'm working on similar things; here are some thoughts:
When running Terraform for the second time, the secret will be in plain text in the state files, which are stored in S3. Is S3 safe enough to store those sensitive strings?
My work uses a similar approach: run Terraform to create an empty secret / dummy strings as a placeholder -> manually update them to the real credentials -> run Terraform again to tell the resource to use the updated credentials. The thing is, when we deploy to production we want the process to be as automated as possible; this approach is not ideal, but I haven't figured out a better way.
If anyone has better ideas please feel free to leave a comment below.
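For what it's worth, a common companion to the placeholder workflow described above is to tell Terraform to ignore manual changes to the secret value, so a later apply does not revert it. A minimal sketch, with made-up resource names:

resource "aws_secretsmanager_secret" "db_creds" {
  name = "db-creds"
}

resource "aws_secretsmanager_secret_version" "placeholder" {
  secret_id     = aws_secretsmanager_secret.db_creds.id
  secret_string = jsonencode({ password = "dummy" }) # replaced manually in the console later

  lifecycle {
    # Keep Terraform from overwriting the manually-set real value on later applies.
    ignore_changes = [secret_string]
  }
}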

Is it safe to apply Terraform plan when it says the database instance must be replaced?

I'm importing the existing resources (AWS RDS) but the terraform plan command showed a summary:
# aws_db_instance.my_main_db must be replaced
+/- resource "aws_db_instance" "my_main_db" {
      ~ address           = x
        allocated_storage = x
      + apply_immediately = x
      ~ arn               = x
      ~ username          = x
      + password          = x
      # (other arguments, with a lot of +/- and ~)
    }
my_main_db is online with persistent data. My question is as in the title: is it safe to run terraform apply against the existing database? I don't want to lose all my customer data.
"Replace" in Terraform's terminology means to destroy the existing object and create a new one to replace it. The +/- symbol (as opposed to -/+) indicates that this particular resource will be replaced in the "create before destroy" mode, where there will briefly be two database instances existing during the operation. (This may or may not be possible in practice, depending on whether the instance name is changing as part of this operation.)
For aws_db_instance in particular, destroying an instance is equivalent to deleting the instance in the RDS console: unless you have a backup of the contents of the database, it will be lost. Even if you do have a backup, you'll need to restore it via the RDS console or API rather than with Terraform because Terraform doesn't know about the backup/restore mechanism and so its idea of "create" is to produce an entirely new, empty database.
To sum up: applying a plan like this directly is certainly not generally "safe", because Terraform is planning to destroy your database and all of the contents along with it.
If you need to make changes to your database that cannot be performed without creating an entirely new RDS instance, you'll usually need to make those changes outside of Terraform using RDS-specific tools so that you can implement some process for transferring data between the old and new instances, whether that be backup and then restore (which will require a temporary outage) or temporarily running both instances and setting up replication from old to new until you are ready to shut off the old one. The details of such a migration are outside of Terraform's scope, because they are specific to whatever database engine you are using.
It's most likely not safe, but really only someone familiar with the application can make that decision. Look at the properties and what is going to change or be recreated. Unless you are comfortable with all of those properties changing, it's not safe.
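As a related safety net (not something the question asked for, just a common practice), you can make Terraform refuse any plan that would destroy the instance by adding a prevent_destroy lifecycle rule to the resource block. A sketch, where the argument values below are stand-ins for your real configuration:

variable "db_master_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "my_main_db" {
  identifier        = "my-main-db"   # assumed values; keep your real arguments
  engine            = "mysql"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "admin"
  password          = var.db_master_password

  lifecycle {
    # terraform plan/apply will error out instead of proposing to destroy this instance.
    prevent_destroy = true
  }
}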

AWS S3 Bucket policy to prevent Object updates

I have set of objects in an S3 Bucket, all with a common prefix. I want to prevent updating of the currently existing objects, however allow users to add new objects in the same prefix.
As I understand it, the s3:PutObject action is used both to update existing objects and to create new ones.
Is there a bucket policy that can limit updating, while allowing creating?
ex: forbid modifying already existing s3:/bucket/Input/obj1, but allow creating s3:/bucket/Input/obj2
edit, context: We're using S3 as a store for regression test data, used to test our transformations. As we're continuously adding new test data, we want to ensure that the already ingested input data doesn't change. This would resolve one of the current causes of failed tests. All our input data is stored with the same prefix, and likewise for the expected data.
No, this is not possible.
The same API call, and the same permissions, are used to upload an object regardless of whether an object already exists with the same name.
You could use Amazon S3 Versioning to retain both the old object and the new object, but that depends on how you will be using the objects.
It is not possible in the way you describe, but there is a mechanism of sorts, called S3 Object Lock, which allows you to lock a specific version of an object. It will not prevent the creation of new versions of the object, but the version you lock is going to be immutable.
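If you go down that road with Terraform, a rough sketch of a bucket with versioning and Object Lock enabled might look like the following (the bucket name is a placeholder; Object Lock can only be enabled at bucket creation, and the locks on individual versions are applied when the objects are written):

resource "aws_s3_bucket" "regression_data" {
  bucket              = "my-regression-test-data" # assumed name
  object_lock_enabled = true                      # must be set when the bucket is created
}

resource "aws_s3_bucket_versioning" "regression_data" {
  bucket = aws_s3_bucket.regression_data.id
  versioning_configuration {
    status = "Enabled" # Object Lock requires versioning
  }
}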

Can I parameterize AWS lambda functions differently for staging and release resources?

I have a Lambda function invoked by S3 put events, which in turn needs to process the objects and write to a database on RDS. I want to test things out in my staging stack, which means I have a separate bucket, different database endpoint on RDS, and separate IAM roles.
I know how to configure the lambda function's event source and IAM stuff manually (in the Console), and I've read about lambda aliases and versions, but I don't see any support for providing operational parameters (like the name of the destination database) on a per-alias basis. So when I make a change to the function, right now it looks like I need a separate copy of the function for staging and production, and I would have to keep them in sync manually. All of the logic in the code would be the same, and while I get the source bucket and key as a parameter to the function when it's invoked, I don't currently have a way to pass in the destination stuff.
For the destination DB information, I could have a switch statement in the function body that checks the originating S3 bucket and makes a decision, but I hate making every function have to keep that mapping internally. That wouldn't work for the DB credentials or IAM policies, though.
I suppose I could automate all or most of this with the SDK. Has anyone set something like this up for a continuous integration-style deployment with Lambda, or is there a simpler way to do it that I've missed?
I found a workaround using Lambda function aliases. Given the context object, I can get the invoked_function_arn property, which has the alias (if any) at the end.
# invoked_function_arn looks like
# arn:aws:lambda:us-east-1:123456789012:function:my-func:staging
# so the last colon-separated element is the alias
# (or the function name, if no alias was used).
arn_string = context.invoked_function_arn
alias = arn_string.split(':')[-1]
Then I just use the alias as an index into a dict in my config.py module, and I'm good to go.
config[alias].host
config[alias].database
One thing I'm not crazy about is that I have to invoke my function from an alias every time, and now I can't use aliases for any other purpose without affecting this scheme. It would be nice to have explicit support for user parameters in the context object.