I am trying to create an inventory list for all the buckets in an AWS account. I am using a Terraform data source block to fetch the S3 buckets, but I can't figure out how to get all the buckets in my account, or which expression to use to get all of them, so that I can set up an inventory on each one. Please find my code below.
data "aws_s3_bucket" "select_bucket" {
  bucket = "????"
}

resource "aws_s3_bucket" "inventory" {
  bucket = "x-bucket"
}

resource "aws_s3_bucket_inventory" "inventory_list" {
  for_each                 = toset([data.aws_s3_bucket.select_bucket.id])
  bucket                   = each.key
  name                     = "lifecycle_analysis_bucket"
  included_object_versions = "All"

  schedule {
    frequency = "Daily"
  }

  destination {
    bucket {
      format     = "CSV"
      bucket_arn = aws_s3_bucket.inventory.arn
    }
  }
}
which expression to use to get all the buckets,
There is no such expression. You have to prepare the list of all your buckets beforehand, and then you can iterate over them in your code. The other option is to develop your own custom data source which uses the AWS CLI or SDK to get the list of your buckets and return it to Terraform for further processing.
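For example, with a hand-maintained list of bucket names (the names below are placeholders), the inventory configuration from the question could be driven like this:

```hcl
locals {
  # Known bucket names, maintained by hand (placeholder values)
  monitored_buckets = ["bucket-a", "bucket-b", "bucket-c"]
}

data "aws_s3_bucket" "select_bucket" {
  for_each = toset(local.monitored_buckets)
  bucket   = each.value
}

resource "aws_s3_bucket_inventory" "inventory_list" {
  # Iterate over the data sources; each.value is one aws_s3_bucket result
  for_each                 = data.aws_s3_bucket.select_bucket
  bucket                   = each.value.id
  name                     = "lifecycle_analysis_bucket"
  included_object_versions = "All"

  schedule {
    frequency = "Daily"
  }

  destination {
    bucket {
      format     = "CSV"
      bucket_arn = aws_s3_bucket.inventory.arn
    }
  }
}
```

The inventory name only has to be unique per source bucket, so reusing the same name across buckets is fine here.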
I am writing a Terraform script to set up an event notification on multiple S3 buckets whose names start with a given prefix.
For example, I want to set up notifications for buckets starting with finance-data. With the help of the aws_s3_bucket data source, we can reference multiple S3 buckets which already exist, and later use them in the aws_s3_bucket_notification resource. Example:
data "aws_s3_bucket" "source_bucket" {
  # set of buckets on which event notification will be set
  # finance-data-1 and finance-data-2 are actual bucket ids
  for_each = toset(["finance-data-1", "finance-data-2"])
  bucket   = each.value
}
resource "aws_s3_bucket_notification" "bucket_notification_to_lambda" {
  for_each = data.aws_s3_bucket.source_bucket
  bucket   = each.value.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.s3_event_lambda.arn
    events = [
      "s3:ObjectCreated:*",
      "s3:ObjectRemoved:*"
    ]
  }
}
In the aws_s3_bucket data source, I am not able to find an option to give a prefix for the bucket name; instead I have to enter the bucket ID for every bucket. Is there any way to achieve this?
Is there any way to achieve this?
No, there is not. You have to explicitly specify the buckets that you want.
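To keep the explicit list manageable, one option is to move the bucket names into a variable so they only need to be maintained in one place (a sketch; the variable name and defaults are placeholders):

```hcl
variable "finance_buckets" {
  type        = list(string)
  description = "IDs of the finance-data buckets to configure notifications on"
  default     = ["finance-data-1", "finance-data-2"]
}

data "aws_s3_bucket" "source_bucket" {
  # One data source instance per bucket name in the variable
  for_each = toset(var.finance_buckets)
  bucket   = each.value
}
```

The aws_s3_bucket_notification resource from the question then works unchanged, since it iterates over data.aws_s3_bucket.source_bucket.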
I am enabling AWS Macie 2 using Terraform, and I am defining a default classification job as follows:
resource "aws_macie2_account" "member" {}

resource "aws_macie2_classification_job" "member" {
  job_type = "ONE_TIME"
  name     = "S3 PHI Discovery default"

  s3_job_definition {
    bucket_definitions {
      account_id = var.account_id
      buckets    = ["S3 BUCKET NAME 1", "S3 BUCKET NAME 2"]
    }
  }

  depends_on = [aws_macie2_account.member]
}
AWS Macie needs a list of S3 buckets to analyze. I am wondering if there is a way to select all buckets in an account, using a wildcard or some other method. Our production accounts contain hundreds of S3 buckets and hard-coding each value in the s3_job_definition is not feasible.
Any ideas?
The Terraform AWS provider does not support a data source for listing S3 buckets at this time, unfortunately. For things like this (data sources that Terraform doesn't support), the common approach is to use the AWS CLI through an external data source.
These are modules that I like to use for CLI/shell commands:
- As a data source (re-runs each time)
- As a resource (re-runs only on resource recreate or on a change to a trigger)
Using the data source version, it would look something like:
module "list_buckets" {
  source  = "Invicton-Labs/shell-data/external"
  version = "0.1.6"

  // Since the command is the same on both Unix and Windows, it's ok to just
  // specify one and not use the `command_windows` input arg
  command_unix = "aws s3api list-buckets --output json"

  // You want Terraform to fail if it can't get the list of buckets for some reason
  fail_on_error = true

  // Specify your AWS credentials as environment variables
  environment = {
    AWS_PROFILE = "myprofilename"
    // Alternatively, although not recommended:
    // AWS_ACCESS_KEY_ID = "..."
    // AWS_SECRET_ACCESS_KEY = "..."
  }
}

output "buckets" {
  // We specified JSON format for the output, so decode it to get a list
  value = jsondecode(module.list_buckets.stdout).Buckets
}
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

buckets = [
  {
    "CreationDate" = "2021-07-15T18:10:20+00:00"
    "Name" = "bucket-foo"
  },
  {
    "CreationDate" = "2021-07-15T18:11:10+00:00"
    "Name" = "bucket-bar"
  },
]
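From there, the decoded output can be fed straight into the Macie job instead of the hard-coded list. A rough sketch, assuming the module output above (untested):

```hcl
locals {
  # Extract just the bucket names from the decoded list-buckets output
  all_bucket_names = [
    for b in jsondecode(module.list_buckets.stdout).Buckets : b.Name
  ]
}

resource "aws_macie2_classification_job" "member" {
  job_type = "ONE_TIME"
  name     = "S3 PHI Discovery default"

  s3_job_definition {
    bucket_definitions {
      account_id = var.account_id
      buckets    = local.all_bucket_names
    }
  }

  depends_on = [aws_macie2_account.member]
}
```

Note that because the data source re-runs on every plan, newly created buckets are picked up automatically, but the plan will also show a diff whenever the bucket list changes.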
New to Terraform. I'm trying to apply a lifecycle rule to an existing S3 bucket declared as a data source, but apparently I can't do that with a data source; it throws an error. Here's the gist of what I'm trying to achieve:
data "aws_s3_bucket" "test-bucket" {
  bucket = "bucket_name"

  lifecycle_rule {
    id      = "Expiration Rule"
    enabled = true
    prefix  = "reports/"

    expiration {
      days = 30
    }
  }
}
...and if this were a resource, not a data source, it would work. How can I apply a lifecycle rule to an S3 bucket declared as a data source? Google-fu has yielded little in the way of results. Thanks!
The best way to solve this is to import your bucket into the Terraform state instead of using it as a data source.
To do that, put this in your Terraform code:
resource "aws_s3_bucket" "test-bucket" {
  bucket = "bucket_name"

  lifecycle_rule {
    id      = "Expiration Rule"
    enabled = true
    prefix  = "reports/"

    expiration {
      days = 30
    }
  }
}
And then run on the terminal:
terraform import aws_s3_bucket.test-bucket bucket_name
This will import the bucket into your state, and now you can make changes or add new things to your bucket using Terraform.
As a last step, just run terraform apply and the lifecycle rule will be added.
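As a side note, on Terraform 1.5 or later the same import can be expressed declaratively with an import block, so terraform apply performs the import itself and no separate CLI command is needed (a sketch; bucket_name is the placeholder from above):

```hcl
import {
  to = aws_s3_bucket.test-bucket
  id = "bucket_name"
}
```

On older Terraform versions, the terraform import CLI command shown above is the only option.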
I have some Terraform code like this:
resource "aws_s3_bucket_object" "file1" {
  key    = "someobject1"
  bucket = "${aws_s3_bucket.examplebucket.id}"
  source = "./src/index.php"
}

resource "aws_s3_bucket_object" "file2" {
  key    = "someobject2"
  bucket = "${aws_s3_bucket.examplebucket.id}"
  source = "./src/main.php"
}

# same code here, 10 files more
# ...
Is there a simpler way to do this?
Terraform supports loops via the count meta parameter on resources and data sources.
So, for a slightly simpler example, if you wanted to loop over a well known list of files you could do something like the following:
locals {
  files = [
    "index.php",
    "main.php",
  ]
}

resource "aws_s3_bucket_object" "files" {
  count  = "${length(local.files)}"
  key    = "${local.files[count.index]}"
  bucket = "${aws_s3_bucket.examplebucket.id}"
  source = "./src/${local.files[count.index]}"
}
Unfortunately Terraform's AWS provider doesn't have support for the equivalent of aws s3 sync or aws s3 cp --recursive although there is an issue tracking the feature request.
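On Terraform 0.12.8 and later, the built-in fileset function gets close to a recursive copy by enumerating files on disk and creating one object per file (a sketch, assuming the same ./src directory as above):

```hcl
resource "aws_s3_bucket_object" "files" {
  # One instance per .php file under ./src, at any depth
  for_each = fileset("./src", "**/*.php")

  bucket = aws_s3_bucket.examplebucket.id
  key    = each.value
  source = "./src/${each.value}"

  # etag forces a re-upload when the file contents change
  etag = filemd5("./src/${each.value}")
}
```

Unlike count, for_each keys instances by file path, so adding or removing one file doesn't shift the addresses of the others.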
Amazon Athena reads data from input Amazon S3 buckets using the IAM credentials of the user who submitted the query; query results are stored in a separate S3 bucket.
Here is the script from the HashiCorp site: https://www.terraform.io/docs/providers/aws/r/athena_database.html
resource "aws_s3_bucket" "hoge" {
  bucket = "hoge"
}

resource "aws_athena_database" "hoge" {
  name   = "database_name"
  bucket = "${aws_s3_bucket.hoge.bucket}"
}
Where it says
bucket - (Required) Name of s3 bucket to save the results of the query execution.
How can I specify the input S3 bucket in the terraform script?
You would use the storage_descriptor argument in the aws_glue_catalog_table resource:
https://www.terraform.io/docs/providers/aws/r/glue_catalog_table.html#parquet-table-for-athena
Here is an example of creating a table using CSV file(s):
resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
  name          = "your_table_name"
  database_name = "${aws_athena_database.your_athena_database.name}"
  table_type    = "EXTERNAL_TABLE"

  parameters = {
    EXTERNAL = "TRUE"
  }

  storage_descriptor {
    location      = "s3://<your-s3-bucket>/your/file/location/"
    input_format  = "org.apache.hadoop.mapred.TextInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

    ser_de_info {
      name                  = "my-serde"
      serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

      parameters = {
        "field.delim"            = ","
        "skip.header.line.count" = "1"
      }
    }

    columns {
      name = "column1"
      type = "string"
    }

    columns {
      name = "column2"
      type = "string"
    }
  }
}
The input S3 bucket is specified in each table you create in the database, as such, there's no global definition for it.
As of today, the AWS API doesn't have much provision for Athena management; as such, neither does the aws CLI command, nor does Terraform. There's no 'proper' way to create a table via these means.
In theory, you could create a named query to create your table, and then execute that query (for which there is API functionality, but not yet Terraform). It seems a bit messy to me, but it would probably work if/when TF gets the StartQueryExecution functionality. The asynchronous nature of Athena makes it tricky to know when that table has actually been created though, and so I can imagine TF won't fully support table creation directly.
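For reference, the named-query half of that workaround is already expressible in Terraform via the aws_athena_named_query resource; only the execution step is missing. A sketch, reusing the hoge database from above (the table name, columns, and S3 location are placeholders):

```hcl
resource "aws_athena_named_query" "create_table" {
  name     = "create-my-table"
  database = aws_athena_database.hoge.name

  # DDL that would have to be executed manually (or via the API) for now
  query = <<-EOT
    CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
      column1 string,
      column2 string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://<your-s3-bucket>/your/file/location/'
  EOT
}
```

The saved query then shows up in the Athena console, where someone still has to run it by hand until Terraform can start query executions itself.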
TF code that covers the currently available functionality is here: https://github.com/terraform-providers/terraform-provider-aws/tree/master/aws
API doco for Athena functions is here: https://docs.aws.amazon.com/athena/latest/APIReference/API_Operations.html