Terraform Replacing Bucket Object Instead of Versioning

I'm setting up some Terraform to manage a Lambda function and an S3 bucket, with versioning on the contents of the bucket. Creating the first version of the infrastructure is fine. When releasing a second version, Terraform replaces the zip file instead of creating a new version.
I've tried adding versioning to the S3 bucket in the Terraform configuration and moving the api-version to a variable string.
data "archive_file" "lambda_zip" {
type = "zip"
source_file = "main.js"
output_path = "main.zip"
}
resource "aws_s3_bucket" "lambda_bucket" {
bucket = "s3-bucket-for-tft-project"
versioning {
enabled = true
}
}
resource "aws_s3_bucket_object" "lambda_zip_file" {
bucket = "${aws_s3_bucket.lambda_bucket.bucket}"
key = "v${var.api-version}-${data.archive_file.lambda_zip.output_path}"
source = "${data.archive_file.lambda_zip.output_path}"
}
resource "aws_lambda_function" "lambda_function" {
s3_bucket = "${aws_s3_bucket.lambda_bucket.bucket}"
s3_key = "${aws_s3_bucket_object.lambda_zip_file.key}"
function_name = "lambda_test_with_s3_version"
role = "${aws_iam_role.lambda_exec.arn}"
handler = "main.handler"
runtime = "nodejs8.10"
}
I would expect the output to be another zip file but with the lambda now pointing at the new version, with the ability to change back to the old version if var.api-version was changed.

Terraform isn't designed for creating this sort of "artifact" object where each new version should be separate from the ones before it.
The data.archive_file data source was added to Terraform in the early days of AWS Lambda when the only way to pass values from Terraform into a Lambda function was to retrieve the intended zip artifact, amend it to include additional files containing those settings, and then write that to Lambda.
Now that AWS Lambda supports environment variables, that pattern is no longer recommended. Instead, deployment artifacts should be created by some separate build process outside of Terraform and recorded somewhere that Terraform can discover them. For example, you could use SSM Parameter Store to record your current desired version and then have Terraform read that to decide which artifact to retrieve:
data "aws_ssm_parameter" "lambda_artifact" {
name = "lambda_artifact"
}
locals {
# Let's assume that this SSM parameter contains a JSON
# string describing which artifact to use, like this
# {
# "bucket": "s3-bucket-for-tft-project",
# "key": "v2.0.0/example.zip"
# }
lambda_artifact = jsondecode(data.aws_ssm_parameter.lambda_artifact)
}
resource "aws_lambda_function" "lambda_function" {
s3_bucket = local.lambda_artifact.bucket
s3_key = local.lambda_artifact.key
function_name = "lambda_test_with_s3_version"
role = aws_iam_role.lambda_exec.arn
handler = "main.handler"
runtime = "nodejs8.10"
}
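As an aside on the environment variables mentioned above: instead of baking settings into the zip artifact, the same function can receive them directly at deploy time. A minimal sketch, assuming a made-up setting name EXAMPLE_SETTING:

resource "aws_lambda_function" "lambda_function" {
  s3_bucket     = local.lambda_artifact.bucket
  s3_key        = local.lambda_artifact.key
  function_name = "lambda_test_with_s3_version"
  role          = aws_iam_role.lambda_exec.arn
  handler       = "main.handler"
  runtime       = "nodejs8.10"

  # Hypothetical example: settings passed via Lambda environment variables
  # rather than being written into the zip file.
  environment {
    variables = {
      EXAMPLE_SETTING = "example-value"
    }
  }
}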
This build/deploy separation allows for three different actions, whereas doing it all in Terraform only allows for one:
To release a new version, you can run your build process (in a CI system, perhaps) and have it push the resulting artifact to S3 and record it as the latest version in the SSM parameter, and then trigger a Terraform run to deploy it.
To change other aspects of the infrastructure without deploying a new function version, just run Terraform without changing the SSM parameter and Terraform will leave the Lambda function untouched.
If you find that a new release is defective, you can write the location of an older artifact into the SSM parameter and run Terraform to deploy that previous version.
A more complete description of this approach is in the Terraform guide Serverless Applications with AWS Lambda and API Gateway, which uses a Lambda web application as an example but can be applied to many other AWS Lambda use-cases too. Using SSM is just an example; any data that Terraform can retrieve using a data source can be used as an intermediary to decouple the build and deploy steps from one another.
This general idea can apply to all sorts of code build artifacts as well as Lambda zip files. For example: custom AMIs created with HashiCorp Packer, Docker images created using docker build. Separating the build process, the version selection mechanism, and the deployment process gives a degree of workflow flexibility that can support both the happy path and any exceptional paths taken during incidents.
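For instance, in the AMI case a Packer build might publish images under a naming convention, and Terraform could select the newest one with a data source. This is only a sketch under assumed names: the app-image-* pattern and the instance settings are invented for illustration.

# Select the most recent AMI published by a (hypothetical) Packer build
# that names its images "app-image-<timestamp>".
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["app-image-*"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id
  instance_type = "t3.micro"
}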

Related

terraform statefile configuration

I am currently reading the book "Terraform: Up & Running", 2nd edition.
In the section that introduces storing the state file remotely, I ran into trouble.
This is the code that I ran:
terraform {
  backend "s3" {
    bucket         = "my-state"
    key            = "workspaces-example/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "my-lock"
    encrypt        = true
  }
}

provider "aws" {
  region  = "ap-southeast-1"
  profile = "my-test"
}

resource "aws_instance" "example" {
  ami           = "ami-02045ebddb047018b"
  instance_type = "t2.micro"
}
Prior to running this code, I created the S3 bucket and the DynamoDB table by hand in the AWS console.
Both of the following run without problems:
terraform init
terraform plan
But I can't find any state file in my AWS console.
According to the book, the state file should be in the "my-state" S3 bucket.
In addition, I can't find any state file locally either.
The state file goes to the backend after you run terraform apply.
After that, when you run terraform init again it will look for the remote state file.
In your case, the plan shows the list of changes, but it does not write out any state. You have to run apply now.
With manual approval:
terraform apply
Or auto-approved:
terraform apply -auto-approve
Remember to destroy after playing around to save costs: run terraform destroy.
Best wishes.

How could my Terraform deployment of a static website to an AWS S3 bucket be improved?

It is very basic at the moment.
connection.tf
provider "aws" {
region = "eu-west-2"
}
main.tf
resource "aws_s3_bucket" "bucket" {
bucket = "mybucket"
acl = "public-read"
provisioner "local-exec" {
command = "aws s3 sync static/ s3://${aws_s3_bucket.bucket.bucket} --acl public-read --delete"
}
website {
index_document = "index.html"
}
}
I have a GitHub Action (CI/CD) that rebuilds static/ and its website contents when updates are pushed to the main branch.
So at the moment the Terraform files (I think) just provision the bucket and push the initial contents of static/,
but is there anything else that can be done with Terraform?
Or how can the initial deployment scripts be improved?
I'm new to Terraform but the static website is up and running on AWS S3.
I've researched online for the best way to use Terraform (this is a requirement of the task) to deploy a static website to S3. Having the GitHub Action (CI/CD) update the website was the main suggestion, but not much was mentioned about how the Terraform aspect could be improved or optimised.
It seems very short and I expect there are many other configurations that should be included.
Some suggestions:
Every name that could be a variable should go into a variables.tf file (see the sketch after this list).
Store the Terraform state file in a backend, preferably an S3 bucket or Terraform Cloud (fairly new).
Create a folder named 'scripts' and write some bash scripts (or PowerShell if on Windows) to do the plan, apply, destroy, etc. Why? Because once you go beyond the basics it's not just terraform apply; you may, and most certainly will, have to do more things and pass more parameters. So write some scripts, run them, and let them do the heavy lifting.
Here's an example situation: terraform destroy cannot delete a bucket that still has objects in it, and that failure will throw an error in the CI/CD pipeline. What you can do is remove the resource from the state file and then perform the destroy; before the next apply you can import it again.
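As a rough illustration of the first two suggestions only (the variable defaults and the state bucket name below are placeholders, not part of the original configuration):

# variables.tf -- pull names out of the resources so they can be reused
variable "bucket_name" {
  description = "Name of the S3 bucket that hosts the static website"
  type        = string
  default     = "mybucket"
}

variable "region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "eu-west-2"
}

# backend.tf -- keep the state file remote instead of on your machine
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket" # placeholder name
    key    = "static-website/terraform.tfstate"
    region = "eu-west-2"
  }
}

The bucket resource would then use bucket = var.bucket_name and the provider would use region = var.region.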
I also have a repo on static websites using Terraform and AWS; feel free to check it out.
Best wishes.

Can terraform duplicate the content of an s3 bucket?

I am using Terraform to manage AWS environments for our application. The environments have S3 buckets for various things, and when setting up a new environment I just want to copy the buckets from a base source bucket, or from an existing environment.
But I can't find anything that will provision a copy. The AWS console lets you duplicate the settings when creating a bucket (which I don't need), but not the objects, so it may not be something Terraform can do directly.
If so, how about indirectly?
There is no resource that enables the copying of objects from one S3 bucket to another. If you want to include this in your Terraform setup then you would need to use a local-exec provisioner.
It would need to execute the command below, with the support of the AWS CLI to run aws s3 cp.
resource "null_resource" "s3_objects" {
provisioner "local-exec" {
command = "aws s3 cp s3://bucket1 s3://bucket2 --recursive"
}
}
For this to run the local server would need to have the AWS CLI installed with a role (or valid credentials) to enable the copy.
Generally-speaking, Terraform providers reflect operations that are natively supported by the underlying APIs, but in some cases we can use various Terraform resource types together to achieve functionality that the underlying provider lacks.
I believe there's no native S3 operation for bulk-copying objects from one bucket to another, so solving this with Terraform requires decomposing the problem into smaller steps, which I think in this case would be:
Declare a new bucket, the target
List all of the objects in the source bucket
Declare one object in the new bucket per object in the source bucket.
The AWS provider can in principle do all three of these operations: it has managed resource types for both buckets and bucket objects, and it has a data source aws_s3_bucket_objects which can enumerate some or all of the objects in a bucket.
We can combine those pieces together in a Terraform configuration like this:
resource "aws_s3_bucket" "target" {
bucket = "copy-example-target"
}
data "aws_s3_bucket_objects" "source" {
bucket = "copy-example-source"
}
data "aws_s3_bucket_object" "source" {
for_each = toset(data.aws_s3_bucket_objects.source.keys)
bucket = data.aws_s3_bucket_objects.source.bucket
key = each.key
}
resource "aws_s3_bucket_object" "target" {
for_each = aws_s3_bucket_object.source
bucket = aws_s3_bucket.target.bucket
key = each.key
content = each.value.body
}
With that said, Terraform is likely not the best tool for this situation, for the following reasons:
The above configuration will cause Terraform to read all of the objects in the bucket into memory, which would be time consuming and use lots of RAM for larger buckets, and then ultimately store all of them in the Terraform state, which would make the state itself very large.
Because the aws_s3_bucket_object data source is intended mainly for retrieving small text-based objects, the above will work only if everything in the bucket meets the limitations described in the aws_s3_bucket_object documentation: the objects must all have text-indicating MIME types and they must all contain UTF-8 encoded text.
In this case then, I would prefer to use a specialized tool for the job which is designed to exploit all of the features of the S3 API to make the copy as efficient as possible, such as streaming the list of objects and streaming the contents of each object in chunks to avoid the need to have all of the data in memory at once. One such tool is in the AWS CLI itself, in the form of the aws s3 cp command with the --recursive option.

Terraform for AWS: How to have multiple events per S3 bucket filtered on object path?

My understanding is that when configuring an S3 bucket notification with Terraform we can only configure a single notification per S3 bucket:
NOTE: S3 Buckets only support a single notification configuration. Declaring multiple aws_s3_bucket_notification resources to the same S3 Bucket will cause a perpetual difference in configuration. See the example "Trigger multiple Lambda functions" for an option.
The application uses a single S3 bucket as a data repository, i.e. when JSON files land there they trigger a lambda which submits a corresponding batch job to ingest from the file into a database.
This works well when we have a single developer deploying the infrastructure, but with multiple developers, each time one of us runs terraform apply it updates the single notification for the bucket, overwriting the resource's previous settings.
What is the best practice for utilizing S3 buckets for notifications? Are they best configured/created per Terraform workspace, and/or how are the buckets managed to allow for simultaneous developers standing up/down infrastructure resources using a common S3 bucket via terraform apply, etc.? Must you use one bucket per workspace for this use case, as suggested by the docs?
The current Terraform I have for the S3 notification (the code that allows for overwriting with the latest configuration):
data "aws_s3_bucket" "default" {
bucket = var.bucket
}
resource "aws_lambda_permission" "allow_bucket_execution" {
statement_id = "AllowExecutionFromS3Bucket"
action = "lambda:InvokeFunction"
function_name = var.lambda_function_name
principal = "s3.amazonaws.com"
source_arn = data.aws_s3_bucket.default.arn
}
resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = data.aws_s3_bucket.default.bucket
lambda_function {
lambda_function_arn = var.lambda_function_arn
events = ["s3:ObjectCreated:*"]
filter_prefix = var.namespace
filter_suffix = ".json"
}
}
The namespace variable is passed in as "${local.env}-${terraform.workspace}", with local.env as "dev", "uat", "prod", etc.
How can we modify the Terraform code above to allow for multiple notifications per S3 bucket (essentially one per Terraform workspace), or can it just not be done? If not then how is this best handled? Should I just use a bucket per workspace using a namespace variable like above as the S3 bucket name, and have it updated accordingly to the production bucket at deployment?
There are several options you have depending on your needs:
Create one bucket per env and workspace. The mentioned limitation of Terraform's aws_s3_bucket_notification is then no longer an issue. I could imagine, though, that the process you use to write to your bucket will still only write to the one bucket you specify; to solve that, you could think about forwarding any objects uploaded to one "master" bucket to all other buckets (either with a Lambda function, itself triggered by an aws_s3_bucket_notification, or probably by bucket replication).
Create one bucket per env and deploy the aws_s3_bucket_notification without workspaces. Then you no longer have the advantages of workspaces, but this might be a reasonable compromise between the number of buckets and usability.
Keep just this one bucket, keep envs and workspaces, but deploy the aws_s3_bucket_notification resource only once (probably together with the bucket). That one aws_s3_bucket_notification resource would then need to include the rules for all environments and workspaces (see the sketch after this list).
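For the last option, the single shared notification resource might look roughly like this. This is only a sketch: the per-environment variables (var.dev_lambda_function_arn, var.prod_lambda_function_arn) and the prefixes are assumptions, not part of the original configuration.

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = data.aws_s3_bucket.default.bucket

  # One lambda_function block per environment/workspace, all declared in
  # the one-and-only notification resource for this bucket.
  lambda_function {
    lambda_function_arn = var.dev_lambda_function_arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "dev-default"
    filter_suffix       = ".json"
  }

  lambda_function {
    lambda_function_arn = var.prod_lambda_function_arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "prod-default"
    filter_suffix       = ".json"
  }
}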
What fits best really depends on your situation. If the aws_s3_bucket_notification rules rarely change at all, and most changes are done in the Lambda function, the last option might be the best. If you regularly want to change the aws_s3_bucket_notification and the events to listen on, one of the other options might be more suitable.

How to share a terraform script without module dependencies

I want to share a terraform script that will be used across different projects. I know how to create and share modules, but this setup has a big annoyance: when I reference a module in a script and perform a terraform apply, if the module resource does not exist it will be created, but also if I perform a terraform destroy this resource will be destroyed.
If I have two projects dependent on the same module, and in one of them I call a terraform destroy, it may lead to an inconsistent state, since the module is being used by another project. The script can either fail because it cannot destroy the resource, or it will destroy the resource and affect the other project.
In my scenario, I want to share network scripts between two projects and I want the network resources to never be destroyed. I cannot create a project only for this resource because I need to reference it somehow in my projects, and the only way to do it is via its ID, which I have no idea what is going to be.
prevent_destroy is also not an option, since I do need to destroy all the other resources except the shared one, and that setting makes terraform destroy fail.
Is there any way to reference the resource, like by its name, or is there any other better approach to accomplish what I want?
If I understand you correctly, you have some resource R that is a "singleton". That is, only one instance of R can ever exist in your AWS account. For example, you can only ever have one aws_route53_zone with the name "foo.com". If you include R as a module in two different places, then either one may create it when you run terraform apply and either one may delete it when you run terraform destroy. You'd like to avoid that, but you still need some way to get an output attribute from R (e.g. the zone_id for an aws_route53_zone resource is generated by AWS, so you can't guess it).
If that's the case, then instead of using R as a module, you should:
Create R by itself in its own set of Terraform templates. Let's say those are under /terraform/R.
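For example, /terraform/R might contain nothing more than the singleton resource itself. Using the aws_route53_zone example from above purely as an illustration:

# /terraform/R/main.tf -- the singleton resource, declared exactly once
resource "aws_route53_zone" "example" {
  name = "foo.com"
}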
Configure /terraform/R to use Remote State. For example, here is how you can configure those templates to store their remote state in an S3 bucket (you'll need to fill in the bucket name/region as indicated):
terraform remote config \
    -backend=s3 \
    -backend-config="bucket=(YOUR BUCKET NAME)" \
    -backend-config="key=terraform.tfstate" \
    -backend-config="region=(YOUR BUCKET REGION)" \
    -backend-config="encrypt=true"
Define any output attributes you need from R as output variables. For example:
output "zone_id" {
value = "${aws_route_53.example.zone_id}"
}
When you run terraform apply in /terraform/R, it will store its Terraform state, including that output, in an S3 bucket.
Now, in all other Terraform templates that need that output attribute from R, you can pull it in from the S3 bucket using the terraform_remote_state data source. For example, let's say you had some template /terraform/foo that needed that zone_id parameter to create an aws_route53_record (you'll need to fill in the bucket name/region as indicated):
data "terraform_remote_state" "r" {
backend = "s3"
config {
bucket = "(YOUR BUCKET NAME)"
key = "terraform.tfstate"
region = "(YOUR BUCKET REGION)"
}
}
resource "aws_route53_record" "www" {
zone_id = "${data.terraform_remote_state.r.zone_id}"
name = "www.foo.com"
type = "A"
ttl = "300"
records = ["${aws_eip.lb.public_ip}"]
}
Note that terraform_remote_state is a read-only data source. That means when you run terraform apply or terraform destroy on any templates that use that resource, they will not have any effect in R.
For more info, check out How to manage terraform state and Terraform: Up & Running.