I am looking for the best practice for creating and storing my state file in an S3 bucket.
Should I include the creation of the S3 bucket along with the rest of the infrastructure, or
create a separate state file for the S3 bucket and a different one for the resources?
If it is a separate file, I also need to store the state file of the S3 bucket that was created; in that case I would be creating two S3 buckets, one for the infrastructure state and another for the state bucket's own state file.
Secondly, if remote configuration is set and running 'terraform destroy' throws the error "failed to upload state file: no such bucket found" because the bucket has been destroyed, should I first run 'terraform remote config -disable' and then run 'terraform destroy'?
What's the best practice I should be following?
Personally I use a Terraform base stack to effectively bootstrap an AWS account for use with Terraform. This stack just stores its state file locally, which is then committed to version control. This stack should only ever have to be run once, so I see no problem with it not using a remote backend.
My Terraform base stack creates:
An IAM user for Terraform to run as in future
An S3 bucket for storing state
A KMS CMK for encrypting/decrypting state
A bucket policy statement to enforce encryption
A bucket policy statement to prevent the Terraform user from doing anything but s3:PutObject & s3:GetObject with state
A KMS key policy statement to prevent the Terraform user from doing anything but kms:GenerateDataKey* & kms:Decrypt
A DynamoDB table for state locking
This can be expanded to include Roles, especially if your Terraform user will be deploying across multiple accounts.
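Once the base stack has run, every later stack points its backend at the resources it created. A minimal sketch of such a backend block, with placeholder names standing in for the bucket, KMS key, and DynamoDB table that the base stack actually outputs:

terraform {
  backend "s3" {
    # Placeholder values: substitute the names/ARNs created by the base stack.
    bucket         = "my-terraform-state-bucket"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:111111111111:key/REPLACE-ME"
    dynamodb_table = "terraform-state-locks"
  }
}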
You have a chicken and egg problem here if you want to store the state of the thing that will store the state.
Creating an S3 bucket outside of Terraform is easy, so I would never bother managing the actual state bucket in Terraform; create it by hand once and then use Terraform to create absolutely everything else.
The ease of creating an S3 bucket (or one of the other S3-compatible storage options now covered by remote state) is one of the main benefits of using S3 to back your state files rather than, say, Consul, which would require you to build and configure a cluster of instances before you could store any state files.
I want to copy S3 bucket objects to a different account, but the requirement is that I can't use a bucket policy.
Is it possible to copy content from one bucket to another without using a bucket policy?
You cannot use native S3 object replication between different accounts without using a bucket policy. As stated in the permissions documentation:
When the source and destination buckets aren't owned by the same accounts, the owner of the destination bucket must also add a bucket policy to grant the owner of the source bucket permissions to perform replication actions
You could write a custom application that uses IAM roles to replicate objects, but this will likely be quite involved as you'll need to track the state of the bucket and all of the objects written to it.
Install the AWS CLI,
run aws configure and set the source bucket credentials as the default profile, and
visit https://github.com/Shi191099/S3-Copy-old-data-without-Policy.git
I have a requirement to access an S3 bucket on the AWS ParallelCluster nodes. I did explore the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how the bucket can be accessed. For example, will it be mounted on the nodes, or will users be able to access it by default? I tested the latter by trying to access a bucket I declared using the s3_read_write_resource option in the config file, but was not able to access it (aws s3 ls s3://<name-of-the-bucket>).
I did go through this GitHub issue about mounting an S3 bucket using s3fs. In my experience it is very slow to access objects using s3fs.
So, my question is:
How can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?
These parameters are used by ParallelCluster to include S3 permissions in the instance role that is created for cluster instances. They're mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource and then used in the CloudFormation template. There's no special way of accessing S3 objects.
To access S3 from a cluster instance, use the AWS CLI or any SDK. Credentials are obtained automatically from the instance role via the instance metadata service.
Please note that ParallelCluster doesn't grant permissions to list S3 objects.
Retrieving existing objects from S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to S3 bucket defined in s3_read_write_resource should work.
However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" need additional permissions. See https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/.
I wouldn't use s3fs, as it's not supported by AWS, it's been reported to be slow (as you've already noticed), and for other reasons.
You might want to check the FSx section instead. It can create and attach an FSx for Lustre filesystem, which can import/export files to/from S3 natively; you just need to set import_path and export_path in that section.
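For reference, a rough sketch of the relevant pieces of a ParallelCluster config (v2-style syntax, with placeholder bucket and section names; other required cluster settings are omitted):

[cluster default]
# key_name, vpc_settings, scheduler, etc. omitted for brevity
s3_read_write_resource = arn:aws:s3:::my-parallelcluster-bucket/*
fsx_settings = myfsx

[fsx myfsx]
shared_dir = /fsx
storage_capacity = 1200
# Link the filesystem to the same bucket so files can be imported/exported natively.
import_path = s3://my-parallelcluster-bucket
export_path = s3://my-parallelcluster-bucket/export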
I am using Terraform to manage AWS environments for our application. The environments have S3 buckets for various things, and when setting up a new environment I just want to copy the buckets from a base source bucket, or from an existing environment.
But I can't find anything that will provision a copy. The AWS console lets you duplicate the settings when creating a bucket (which I don't need), but not the objects, so this may not be something Terraform can do directly.
If so, how about indirectly?
There is no Terraform resource that enables copying objects from one S3 bucket to another. If you want to include this in your Terraform setup then you would need to use a local-exec provisioner.
It would need to execute the command below, which relies on the AWS CLI to run aws s3 cp.
resource "null_resource" "s3_objects" {
provisioner "local-exec" {
command = "aws s3 cp s3://bucket1 s3://bucket2 --recursive"
}
}
For this to run, the machine running Terraform would need the AWS CLI installed and a role (or valid credentials) that allows the copy.
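As a variation, you can tie the provisioner to the bucket resources so the copy only runs once both buckets exist and re-runs if the target bucket is recreated. A sketch, assuming hypothetical resource names aws_s3_bucket.source and aws_s3_bucket.target managed in the same configuration:

resource "null_resource" "s3_objects" {
  # Re-run the copy whenever the target bucket is replaced.
  triggers = {
    target_bucket = aws_s3_bucket.target.id
  }

  provisioner "local-exec" {
    command = "aws s3 cp s3://${aws_s3_bucket.source.bucket} s3://${aws_s3_bucket.target.bucket} --recursive"
  }
}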
Generally speaking, Terraform providers reflect operations that are natively supported by the underlying APIs, but in some cases we can use various Terraform resource types together to achieve functionality that the underlying provider lacks.
I believe there's no native S3 operation for bulk-copying objects from one bucket to another, so solving this with Terraform requires decomposing the problem into smaller steps, which I think in this case would be:
Declare a new bucket, the target
List all of the objects in the source bucket
Declare one object in the new bucket per object in the source bucket.
The AWS provider can in principle do all three of these operations: it has managed resource types for both buckets and bucket objects, and it has a data source aws_s3_bucket_objects which can enumerate some or all of the objects in a bucket.
We can combine those pieces together in a Terraform configuration like this:
resource "aws_s3_bucket" "target" {
bucket = "copy-example-target"
}
data "aws_s3_bucket_objects" "source" {
bucket = "copy-example-source"
}
data "aws_s3_bucket_object" "source" {
for_each = toset(data.aws_s3_bucket_objects.source.keys)
bucket = data.aws_s3_bucket_objects.source.bucket
key = each.key
}
resource "aws_s3_bucket_object" "target" {
for_each = aws_s3_bucket_object.source
bucket = aws_s3_bucket.target.bucket
key = each.key
content = each.value.body
}
With that said, Terraform is likely not the best tool for this situation, for the following reasons:
The above configuration will cause Terraform to read all of the objects in the bucket into memory, which would be time consuming and use lots of RAM for larger buckets, and then ultimately store all of them in the Terraform state, which would make the state itself very large.
Because the aws_s3_bucket_object data source is intended mainly for retrieving small text-based objects, the above will work only if everything in the bucket meets the limitations described in the aws_s3_bucket_object documentation: the objects must all have text-indicating MIME types and they must all contain UTF-8 encoded text.
In this case then, I would prefer to use a specialized tool for the job which is designed to exploit all of the features of the S3 API to make the copy as efficient as possible, such as streaming the list of objects and streaming the contents of each object in chunks to avoid the need to have all of the data in memory at once. One such tool is in the AWS CLI itself, in the form of the aws s3 cp command with the --recursive option.
My understanding is that when configuring an S3 bucket notification with Terraform we can only configure a single notification per S3 bucket:
NOTE: S3 Buckets only support a single notification configuration. Declaring multiple
aws_s3_bucket_notification resources to the same S3 Bucket will cause a perpetual difference in
configuration. See the example "Trigger multiple Lambda functions" for an option.
The application uses a single S3 bucket as a data repository, i.e. when JSON files land there they trigger a lambda which submits a corresponding batch job to ingest from the file into a database.
This works well when we have a single developer deploying the infrastructure, but with multiple developers, each time one of us runs terraform apply it updates the single notification for the bucket, overwriting the resource's previous settings.
What is the best practice for utilizing S3 buckets for notifications? Are they best configured/created per Terraform workspace, and/or how are the buckets managed to allow for simultaneous developers standing up/down infrastructure resources using a common S3 bucket via terraform apply, etc.? Must you use one bucket per workspace for this use case, as suggested by the docs?
The current Terraform I have for the S3 notification (the code that allows for overwriting with the latest configuration):
data "aws_s3_bucket" "default" {
bucket = var.bucket
}
resource "aws_lambda_permission" "allow_bucket_execution" {
statement_id = "AllowExecutionFromS3Bucket"
action = "lambda:InvokeFunction"
function_name = var.lambda_function_name
principal = "s3.amazonaws.com"
source_arn = data.aws_s3_bucket.default.arn
}
resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = data.aws_s3_bucket.default.bucket
lambda_function {
lambda_function_arn = var.lambda_function_arn
events = ["s3:ObjectCreated:*"]
filter_prefix = var.namespace
filter_suffix = ".json"
}
}
The namespace variable is passed in as "${local.env}-${terraform.workspace}", with local.env as "dev", "uat", "prod", etc.
How can we modify the Terraform code above to allow for multiple notifications per S3 bucket (essentially one per Terraform workspace), or can it just not be done? If not then how is this best handled? Should I just use a bucket per workspace using a namespace variable like above as the S3 bucket name, and have it updated accordingly to the production bucket at deployment?
There are several options, depending on your needs:
Create one bucket per env and workspace. The mentioned limitation of Terraform's aws_s3_bucket_notification is then no longer an issue. I could imagine that the process that writes to your bucket would still only write to the one bucket you specify; to solve this you could forward any object uploaded to one "master" bucket to all the other buckets (either with a Lambda, itself triggered by an aws_s3_bucket_notification, or probably with bucket replication).
Create one bucket per env and deploy the aws_s3_bucket_notification without workspaces. You then lose the advantages of workspaces, but this might be a reasonable compromise between the number of buckets and usability.
Keep just this one bucket, keep envs and workspaces, but deploy the aws_s3_bucket_notification resource only once (probably together with the bucket). That single aws_s3_bucket_notification resource would then need to include the rules for all environments and workspaces (see the sketch after this answer).
It really depends on your situation which option fits best. If these aws_s3_bucket_notification rules rarely change and most changes happen in the Lambda function, the last option might be the best. If you regularly want to change the aws_s3_bucket_notification and the events to listen for, one of the other options might be more suitable.
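For that last option, a single notification resource can contain one lambda_function block per environment/workspace. A minimal sketch, assuming hypothetical variables for the per-environment Lambda ARNs and using fixed prefixes in place of var.namespace:

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = data.aws_s3_bucket.default.bucket

  # One block per environment/workspace; non-overlapping prefixes keep the triggers separate.
  lambda_function {
    lambda_function_arn = var.dev_lambda_function_arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "dev-"
    filter_suffix       = ".json"
  }

  lambda_function {
    lambda_function_arn = var.uat_lambda_function_arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "uat-"
    filter_suffix       = ".json"
  }
}

A corresponding aws_lambda_permission would still be needed for each function.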
I have an S3 bucket which I want to move from one stack to another. I was able to move the bucket using the instructions provided by Amazon; however, I wasn't able to move the bucket policy attached to it. What I tried was:
Make the bucket and the policy retainable
Remove the resources from the origin stack
Import the bucket into the target stack (I cannot import Bucket Policy, it's not supported)
After these actions, the bucket still has the policy, but it's not managed by CloudFormation. If I try to add an AWS::S3::BucketPolicy definition for the bucket to the target stack, the deploy fails with "The bucket policy already exists on bucket". I'm looking for a way to overcome this. I understand that I could simply remove the policy using the console and the error would disappear, but in that case the resources in the bucket would be unavailable for some time.
The only solution I see is to leave it as-is for production. For the other environments (testing, development, etc.) I want CloudFormation to correctly create and remove the bucket and its policy, so the workaround is to define the bucket policy for all environments except production.
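A sketch of that workaround in the template, assuming a hypothetical Environment parameter and placeholder resource names and policy statement; the AWS::S3::BucketPolicy is only created when the environment is not production:

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, test, prod]

Conditions:
  CreateBucketPolicy: !Not [!Equals [!Ref Environment, prod]]

Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

  DataBucketPolicy:
    Type: AWS::S3::BucketPolicy
    # Skipped in production, where the existing, unmanaged policy stays in place.
    Condition: CreateBucketPolicy
    Properties:
      Bucket: !Ref DataBucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Placeholder statement: deny non-TLS access.
          - Sid: EnforceTls
            Effect: Deny
            Principal: "*"
            Action: "s3:*"
            Resource:
              - !GetAtt DataBucket.Arn
              - !Sub "${DataBucket.Arn}/*"
            Condition:
              Bool:
                "aws:SecureTransport": "false"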