Is there a way to configure date-partitioned folders for AWS DMS endpoint target S3? - amazon-web-services

I'm using terraform in order to configure this DMS migration task that migrates (full-load+cdc) the data from a MySQL instance to a S3 bucket.
The problem is that the configuration seems not to take effect and no partition-folder is created. All the migrated files are created in the same directory within the bucket.
In the documentation they say the endpoint s3 setting DatePartitionEnabled, introduced in the version 3.4.2, is supported both for CDC and FullLoad+CDC.
My terraform configuration spec:
resource "aws_dms_endpoint" "example" {
endpoint_id = "example"
endpoint_type = "target"
engine_name = "s3"
s3_settings {
bucket_name = "example"
bucket_folder = "example-folder"
compression_type = "GZIP"
data_format = "parquet"
parquet_version = "parquet-2-0"
service_access_role_arn = var.service_access_role_arn
date_partition_enabled = true
}
tags = {
Name = "example"
}
}
But in the respective s3 bucket I get no folders, but sequential files as if this option wasn't there.
LOAD00000001.parquet
LOAD00000002.parquet
...
I'm using terraform 1.0.7, aws provider 3.66.0 and a DMS Replication Instance 3.4.6.
Does anyone know what could be this issue?

Related

Terraform Provider issue: registry.terraform.io/hashicorp/s3

I current have code that I have been using for quiet sometime that calls a custom S3 module. Today I tried to run the same code and I started getting an error regarding the provider.
╷ │ Error: Failed to query available provider packages │ │ Could not
retrieve the list of available versions for provider hashicorp/s3:
provider registry registry.terraform.io does not have a provider named
│ registry.terraform.io/hashicorp/s3 │ │ All modules should specify
their required_providers so that external consumers will get the
correct providers when using a module. To see which modules │ are
currently depending on hashicorp/s3, run the following command: │
terraform providers
Doing some digging seems that terraform is looking for a module registry.terraform.io/hashicorp/s3, which doesn't exist.
So far, I have tried the following things:
Validated that the S3 Resource code meets the standards of the upgrade Hashicorp did to 4.x this year. Plus I have been using it for a couple of months with no issues.
Delete .terraform directory and rerun terraform init (No success same error)
Delete .terraform directory and .terraform.hcl lock and run terraform init -upgrade (No Success)
I have tried to update my provider's file to try to force an upgrade (no Success)
I tried to change the provider to >= current version to pull the latest version with no success
Reading further, it refers to a caching problem of the terraform modules. I tried to run terraform providers lock and received this error.
Error: Could not retrieve providers for locking │ │ Terraform failed
to fetch the requested providers for darwin_amd64 in order to
calculate their checksums: some providers could not be installed: │ -
registry.terraform.io/hashicorp/s3: provider registry
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/s3.
Kind of at my wits with what could be wrong. below is a copy of my version.tf which I changed from providers.tf based on another post I was following:
version.tf
# Configure the AWS Provider
provider "aws" {
region = "us-east-1"
use_fips_endpoint = true
}
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 4.9.0"
}
local = {
source = "hashicorp/local"
version = "~> 2.2.1"
}
}
required_version = ">= 1.2.0" #required terraform version
}
S3 Module
I did not include locals, outputs, or variables unless someone thinks we need to see them. As I said before, the module was running correctly until today. Hopefully, this is all you need for the provider's issue. Let me know if other files are needed.
resource "aws_s3_bucket" "buckets" {
count = length(var.bucket_names)
bucket = lower(replace(replace("${var.bucket_names[count.index]}-s3", " ", "-"), "_", "-"))
force_destroy = var.bucket_destroy
tags = local.all_tags
}
# Set Public Access Block for each bucket
resource "aws_s3_bucket_public_access_block" "bucket_public_access_block" {
count = length(var.bucket_names)
bucket = aws_s3_bucket.buckets[count.index].id
block_public_acls = var.bucket_block_public_acls
ignore_public_acls = var.bucket_ignore_public_acls
block_public_policy = var.bucket_block_public_policy
restrict_public_buckets = var.bucket_restrict_public_buckets
}
resource "aws_s3_bucket_acl" "bucket_acl" {
count = length(var.bucket_names)
bucket = aws_s3_bucket.buckets[count.index].id
acl = var.bucket_acl
}
resource "aws_s3_bucket_versioning" "bucket_versioning" {
count = length(var.bucket_names)
bucket = aws_s3_bucket.buckets[count.index].id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_lifecycle_configuration" "bucket_lifecycle_rule" {
count = length(var.bucket_names)
bucket = aws_s3_bucket.buckets[count.index].id
rule {
id = "${var.bucket_names[count.index]}-lifecycle-${count.index}"
status = "Enabled"
expiration {
days = var.bucket_backup_expiration_days
}
transition {
days = var.bucket_backup_days
storage_class = "GLACIER"
}
}
}
# AWS KMS Key Server Encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "bucket_encryption" {
count = length(var.bucket_names)
bucket = aws_s3_bucket.buckets[count.index].id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.bucket_key[count.index].arn
sse_algorithm = var.bucket_sse
}
}
}
Looking for any other ideas I can use to fix this issue. thank you!!
Although you haven't included it in your question, I'm guessing that somewhere else in this Terraform module you have a block like this:
resource "s3_bucket" "example" {
}
For backward compatibility with modules written for older versions of Terraform, terraform init has some heuristics to guess what provider was intended whenever it encounters a resource that doesn't belong to one of the providers in the module's required_providers block. By default, a resource "belongs to" a provider by matching the prefix of its resource type name -- s3 in this case -- to the local names chosen in the required_providers block.
Given a resource block like the above, terraform init would notice that required_providers doesn't have an entry s3 = { ... } and so will guess that this is an older module trying to use a hypothetical legacy official provider called "s3" (which would now be called hashicorp/s3, because official providers always belong to the hashicorp/ namespace).
The correct name for this resource type is aws_s3_bucket, and so it's important to include the aws_ prefix when you declare a resource of this type:
resource "aws_s3_bucket" "example" {
}
This resource is now by default associated with the provider local name "aws", which does match one of the entries in your required_providers block and so terraform init will see that you intend to use hashicorp/aws to handle this resource.
My colleague and I finally found the problem. Turns out that we had a data call to the S3 bucket. Nothing was wrong with the module but the place I was calling the module had a local.tf action where I was calling s3 in a legacy format see the change below:
WAS
data "s3_bucket" "MyResource" {}
TO
data "aws_s3_bucket" "MyResource" {}
Appreciate the responses from everyone. Resource was the root of the problem but forgot that data is also a resource to check.

Error: Error creating CloudTrail: InvalidCloudWatchLogsLogGroupArnException

I am trying to create cloudtrail for an organization in AWS. When I try to run the plan on a targeted apply for
resource "aws_cloudtrail" "nfcisbenchmark" {
name = "nf-cisbenchmark-${terraform.workspace}"
s3_bucket_name = aws_s3_bucket.nfcisbenchmark_cloudtrail.id
enable_logging = var.enable_logging
# 3.2 Ensure CloudTrail log file validation is enabled (Automated)
enable_log_file_validation = var.enable_log_file_validation
# 3.1 Ensure CloudTrail is enabled in all regions (Automated)
is_multi_region_trail = var.is_multi_region_trail
include_global_service_events = var.include_global_service_events
is_organization_trail = "${local.environments[terraform.workspace] == "origin"? true : var.is_organization_trail}"
# 3.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs (Automated)
kms_key_id = aws_kms_key.nfcisbenchmark.arn
depends_on = [aws_s3_bucket.nfcisbenchmark_cloudtrail]
cloud_watch_logs_role_arn = aws_iam_role.cloudwatch.arn
cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.nfcisbenchmark.arn}:*"
event_selector {
# 3.11 Ensure that Object-level logging for read events is enabled for S3 bucket (Automated)
read_write_type = "All"
include_management_events = true
}
}
I get Error: Error creating CloudTrail: InvalidCloudWatchLogsLogGroupArnException: Access denied. Check the permissions for your role. Any help with this issue would be greatly appreciated.
Version 3.0.0 of the AWS provider bundled a breaking change to the aws_cloudwatch_log_group resource's ARN output by stripping the :* suffix returned previously. Instead you now have to explicitly add this in places where the AWS API wants the :* suffix. All of the documentation was then updated to follow this pattern as well which is why you see this in the aws_cloudtrail resource documentation:
resource "aws_cloudwatch_log_group" "example" {
name = "Example"
}
resource "aws_cloudtrail" "example" {
# ... other configuration ...
cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.example.arn}:*" # CloudTrail requires the Log Stream wildcard
}
For you though, on v2.6.0, your ARN already includes this :* so you don't need to add it an extra time but you do need to remember to strip the :* suffix on resources where the AWS API doesn't want that suffix (by the looks of this issue then the aws_datasync_task resource is one of those).
Alternatively you could update your AWS provider to > v3.0.0 and keep the suffix there which will help you with a lot of other potential issues in the future.

AWS Macie & Terraform - Select all S3 buckets in account

I am enabling AWS Macie 2 using terraform and I am defining a default classification job as following:
resource "aws_macie2_account" "member" {}
resource "aws_macie2_classification_job" "member" {
job_type = "ONE_TIME"
name = "S3 PHI Discovery default"
s3_job_definition {
bucket_definitions {
account_id = var.account_id
buckets = ["S3 BUCKET NAME 1", "S3 BUCKET NAME 2"]
}
}
depends_on = [aws_macie2_account.member]
}
AWS Macie needs a list of S3 buckets to analyze. I am wondering if there is a way to select all buckets in an account, using a wildcard or some other method. Our production accounts contain hundreds of S3 buckets and hard-coding each value in the s3_job_definition is not feasible.
Any ideas?
The Terraform AWS provider does not support a data source for listing S3 buckets at this time, unfortunately. For things like this (data sources that Terraform doesn't support), the common approach is to use the AWS CLI through an external data source.
These are modules that I like to use for CLI/shell commands:
As a data source (re-runs each time)
As a resource (re-runs only on resource recreate or on a change to a trigger)
Using the data source version, it would look something like:
module "list_buckets" {
source = "Invicton-Labs/shell-data/external"
version = "0.1.6"
// Since the command is the same on both Unix and Windows, it's ok to just
// specify one and not use the `command_windows` input arg
command_unix = "aws s3api list-buckets --output json"
// You want Terraform to fail if it can't get the list of buckets for some reason
fail_on_error = true
// Specify your AWS credentials as environment variables
environment = {
AWS_PROFILE = "myprofilename"
// Alternatively, although not recommended:
// AWS_ACCESS_KEY_ID = "..."
// AWS_SECRET_ACCESS_KEY = "..."
}
}
output "buckets" {
// We specified JSON format for the output, so decode it to get a list
value = jsondecode(module.list_buckets.stdout).Buckets
}
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
buckets = [
{
"CreationDate" = "2021-07-15T18:10:20+00:00"
"Name" = "bucket-foo"
},
{
"CreationDate" = "2021-07-15T18:11:10+00:00"
"Name" = "bucket-bar"
},
]

issue while deploying gcp cloud function deployment

I have following issue while deploying the cloud function (I am completely new to gcp and terraform )
I am trying to deploy a cloud function through the terraform; but the issue is that when I am deploying its destroying an existing cloud function which was already deployed in gcp (deployed by other colleague) even though cloud function name , bucket object name and archive file name are different (only bucket name and project id are same)
looks like its taking the state of existing cloud function which is already deployed
is there any way to keep the existing state unaffected?
code Snippet(as mentioned above there is already one cloud function deployed with same project id and bucket)
main.tf:
provider "google" {
project = "peoject_id"
credentials = "cdenetialfile"
region = "some-region"
}
locals {
timestamp = formatdate("YYMMDDhhmmss", timestamp())
root_dir = abspath("./app/")
}
data "archive_file" "archive" {
type = "zip"
output_path = "/tmp/function-${local.timestamp}.zip"
source_dir = local.root_dir
}
resource "google_storage_bucket_object" "object_archive" {
name = "archive-${local.timestamp}.zip"
bucket = "dev-bucket-tfstate"
source = "/tmp/function-${local.timestamp}.zip"
depends_on = [data.archive_file.archive]
}
resource "google_cloudfunctions_function" "translator_function" {
name = "Cloud_functionname"
available_memory_mb = 256
timeout = 61
runtime = "java11"
source_archive_bucket = "dev-bucket-tfstate"
source_archive_object = google_storage_bucket_object.object_archive.name
entry_point = "com.test.controller.myController"
event_trigger {
event_type = "google.pubsub.topic.publish"
resource = "topic_name"
}
}
backend.tf
terraform {
backend "gcs" {
bucket = "dev-bucket-tfstate"
credentials = "cdenetialfile"
}
}

How to use secret manager in creating DMS target endpoint for RDS

How do we create a DMS endpoint for RDS using Terraform by providing the Secret Manager ARN to fetch the credentials? I looked at the documentation but I couldn't find anything.
There's currently an open feature request for DMS to natively use secrets manager to connect to your RDS instance. This has a linked pull request that initially adds support for PostgreSQL and Oracle RDS instances for now but is currently unreviewed so it's hard to know when that functionality may be released.
If you aren't using automatic secret rotation (or can rerun Terraform after the rotation) and don't mind the password being stored in the state file but still want to use the secrets stored in AWS Secrets Manager then you could have Terraform retrieve the secret from Secrets Manager at apply time and use that to configure the DMS endpoint using the username and password combination instead.
A basic example would look something like this:
data "aws_secretsmanager_secret" "example" {
name = "example"
}
data "aws_secretsmanager_secret_version" "example" {
secret_id = data.aws_secretsmanager_secret.example.id
}
resource "aws_dms_endpoint" "example" {
certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012"
database_name = "test"
endpoint_id = "test-dms-endpoint-tf"
endpoint_type = "source"
engine_name = "aurora"
extra_connection_attributes = ""
kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
password = jsondecode(data.aws_secretsmanager_secret_version.example.secret_string)["password"]
port = 3306
server_name = "test"
ssl_mode = "none"
tags = {
Name = "test"
}
username = "test"
}