I have terraformed a stack from DynamoDB -> AWS Glue -> Athena. I can see that all the columns have been created in AWS Glue and that the table exists there, but in Athena it seems only the database is present; even though the table's schema and columns show up when I inspect the database, queries against it do not work.
SELECT tenant, COUNT(DISTINCT id) counts
FROM "account-profiles-glue-db"."account_profiles"
group by tenant
The above query fails.
My Terraform looks like this:
locals {
table-name = var.table-name
athena-results-s3-name = "${local.table-name}-analytics"
athena-workgroup-name = "${local.table-name}"
glue-db-name = "${local.table-name}-glue-db"
glue-crawler-name = "${local.table-name}-crawler"
glue-crawler-role-name = "${local.table-name}-crawler-role"
glue-crawler-policy-name = "${local.table-name}-crawler"
}
resource "aws_kms_key" "aws_kms_key" {
description = "KMS key for whole project"
deletion_window_in_days = 10
}
##################################################################
# glue
##################################################################
resource "aws_glue_catalog_database" "aws_glue_catalog_database" {
name = local.glue-db-name
}
resource "aws_glue_crawler" "aws_glue_crawler" {
database_name = aws_glue_catalog_database.aws_glue_catalog_database.name
name = local.glue-crawler-name
role = aws_iam_role.aws_iam_role_glue_crawler.arn
configuration = jsonencode(
{
"Version" : 1.0
CrawlerOutput = {
Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
}
}
)
dynamodb_target {
path = local.table-name
}
}
resource "aws_iam_role" "aws_iam_role_glue_crawler" {
name = local.glue-crawler-role-name
assume_role_policy = jsonencode(
{
"Version" : "2012-10-17",
"Statement" : [
{
"Action" : "sts:AssumeRole",
"Principal" : {
"Service" : "glue.amazonaws.com"
},
"Effect" : "Allow",
"Sid" : ""
}
]
}
)
}
resource "aws_iam_role_policy" "aws_iam_role_policy_glue_crawler" {
name = local.glue-crawler-policy-name
role = aws_iam_role.aws_iam_role_glue_crawler.id
policy = jsonencode(
{
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"*"
],
"Resource" : [
"*"
]
}
]
}
)
}
##################################################################
# athena
##################################################################
resource "aws_s3_bucket" "aws_s3_bucket_analytics" {
bucket = local.athena-results-s3-name
acl = "private"
versioning {
enabled = true
}
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.aws_kms_key.arn
sse_algorithm = "aws:kms"
}
}
}
}
resource "aws_athena_workgroup" "aws_athena_workgroup" {
name = local.athena-workgroup-name
configuration {
enforce_workgroup_configuration = true
publish_cloudwatch_metrics_enabled = true
result_configuration {
output_location = "s3://${aws_s3_bucket.aws_s3_bucket_analytics.bucket}/output/"
encryption_configuration {
encryption_option = "SSE_KMS"
kms_key_arn = aws_kms_key.aws_kms_key.arn
}
}
}
}
Looking at the Terraform you provided and at the AWS Glue documentation, you are only crawling the DynamoDB table; you aren't running any Glue jobs against it. Glue jobs are where you run your business logic to transform and load the data, and that is where you would send your source data to S3 so it can be read by Athena.
If you need help generating the code for your Glue job, I would recommend Glue Studio, which has a visual editor that will also generate the script for you. You can select your source, destination, and any transforms you need. At that point you can use the Terraform aws_glue_job resource and reference the script you generated in Glue Studio, roughly as sketched below.
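As a rough illustration only, here is a minimal aws_glue_job sketch, assuming you upload the Glue Studio script to a scripts bucket of your own; the bucket, script path, Glue version, and worker settings are assumptions, not values from your configuration, and the job role needs the appropriate Glue/S3/DynamoDB permissions:
resource "aws_s3_bucket" "glue_scripts" {
  # assumed bucket that holds the script exported from Glue Studio
  bucket = "${local.table-name}-glue-scripts"
}

resource "aws_glue_job" "dynamodb_to_s3" {
  name     = "${local.table-name}-dynamodb-to-s3"
  role_arn = aws_iam_role.aws_iam_role_glue_crawler.arn # or a dedicated job role

  command {
    name            = "glueetl"
    python_version  = "3"
    script_location = "s3://${aws_s3_bucket.glue_scripts.bucket}/scripts/dynamodb_to_s3.py"
  }

  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2
}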
Unless you need to perform some ETL on the data, consider either connecting Athena directly to DynamoDB with the Athena DynamoDB connector provided in the AWSLabs GitHub, or exporting your DynamoDB data to S3 and pointing Athena at that S3 bucket.
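If you go the connector route and the connector Lambda is already deployed (for example from the Serverless Application Repository), registering it as an Athena data catalog could look roughly like this; the catalog name and Lambda ARN are placeholders, not resources from your stack:
resource "aws_athena_data_catalog" "dynamodb" {
  name        = "dynamodb-catalog" # assumed catalog name
  description = "Athena DynamoDB connector catalog"
  type        = "LAMBDA"

  parameters = {
    # ARN of the already-deployed athena-dynamodb connector Lambda (placeholder)
    "function" = "arn:aws:lambda:us-east-1:111122223333:function:athena-dynamodb-connector"
  }
}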
I have an application load balancer and I'm trying to enable access logging; my Terraform code is below:
resource "aws_s3_bucket" "lb-logs" {
bucket = "yeo-messaging-${var.environment}-lb-logs"
}
resource "aws_s3_bucket_acl" "lb-logs-acl" {
bucket = aws_s3_bucket.lb-logs.id
acl = "private"
}
resource "aws_lb" "main" {
name = "main"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.public.id]
enable_deletion_protection = false
subnets = [aws_subnet.public.id, aws_subnet.public-backup.id]
access_logs {
bucket = aws_s3_bucket.lb-logs.bucket
prefix = "main-lb"
enabled = true
}
}
unfortunately I can't apply this due to:
Error: failure configuring LB attributes: InvalidConfigurationRequest: Access Denied for bucket: xxx-lb-logs. Please check S3bucket permission
│ status code: 400, request id: xx
I've seen a few SO threads and some documentation, but unfortunately it all applies to the classic load balancer, particularly the aws_elb_service_account data source that lets you look up the load balancer's service account.
I have found some policy info on how to grant the right permissions to a service account, but I can't seem to find how to apply the service account to the LB itself.
Example:
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "AWS"
identifiers = [data.aws_elb_service_account.main.arn]
}
actions = [
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject"
]
resources = [
aws_s3_bucket.lb-logs.arn,
"${aws_s3_bucket.lb-logs.arn}/*",
]
}
}
resource "aws_s3_bucket_policy" "allow-lb" {
bucket = aws_s3_bucket.lb-logs.id
policy = data.aws_iam_policy_document.allow-lb.json
}
But this is all moot because data.aws_elb_service_account.main.arn is only for classic LB.
EDIT:
Full code with attempt from answer below:
resource "aws_s3_bucket" "lb-logs" {
bucket = "yeo-messaging-${var.environment}-lb-logs"
}
resource "aws_s3_bucket_acl" "lb-logs-acl" {
bucket = aws_s3_bucket.lb-logs.id
acl = "private"
}
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "Service"
identifiers = ["logdelivery.elb.amazonaws.com"]
}
actions = [
"s3:PutObject"
]
resources = [
"${aws_s3_bucket.lb-logs.arn}/*"
]
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = [
"bucket-owner-full-control"
]
}
}
}
resource "aws_s3_bucket_policy" "allow-lb" {
bucket = aws_s3_bucket.lb-logs.id
policy = data.aws_iam_policy_document.allow-lb.json
}
resource "aws_lb" "main" {
name = "main"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.public.id]
enable_deletion_protection = false
subnets = [aws_subnet.public.id, aws_subnet.public-backup.id]
access_logs {
bucket = aws_s3_bucket.lb-logs.bucket
prefix = "main-lb"
enabled = true
}
}
The bucket policy you need to use is provided in the official documentation for access logs on Application Load Balancers.
{
"Effect": "Allow",
"Principal": {
"Service": "logdelivery.elb.amazonaws.com"
},
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::bucket-name/prefix/AWSLogs/your-aws-account-id/*",
"Condition": {
"StringEquals": {
"s3:x-amz-acl": "bucket-owner-full-control"
}
}
}
Notice that bucket-name, prefix, and your-aws-account-id need to be replaced in that policy with your actual values.
In Terraform:
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "Service"
identifiers = ["logdelivery.elb.amazonaws.com"]
}
actions = [
"s3:PutObject"
]
resources = [
"${aws_s3_bucket.lb-logs.arn}/*"
]
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = [
"bucket-owner-full-control"
]
}
}
}
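To make sure the policy is actually in place before the load balancer tries to configure access logging, it may also help to keep your existing aws_s3_bucket_policy resource and have the load balancer depend on it explicitly; a sketch using the resource names from your question:
resource "aws_s3_bucket_policy" "allow-lb" {
  bucket = aws_s3_bucket.lb-logs.id
  policy = data.aws_iam_policy_document.allow-lb.json
}

resource "aws_lb" "main" {
  name                       = "main"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.public.id]
  enable_deletion_protection = false
  subnets                    = [aws_subnet.public.id, aws_subnet.public-backup.id]

  access_logs {
    bucket  = aws_s3_bucket.lb-logs.bucket
    prefix  = "main-lb"
    enabled = true
  }

  # ensure the bucket policy exists before ELB validates write access to the bucket
  depends_on = [aws_s3_bucket_policy.allow-lb]
}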
I'm using AWS Backup to back up several EC2 instances. I have Terraform that seems to report correctly when there is a backup failure, but I am also interested in the case where the disks back up correctly but Windows VSS fails. Ultimately, the failed events are going to be published to Opsgenie. Is there a way to accomplish this? I have tried capturing all events with the aws_backup_vault_notifications resource, and I have tried a filter as described in this AWS knowledge-center article: https://aws.amazon.com/premiumsupport/knowledge-center/aws-backup-failed-job-notification/
I have included most of my Terraform below, minus the Opsgenie module; I can get fully successful or fully failed events published to Opsgenie just fine if I include those events:
locals {
backup_vault_events = toset(["BACKUP_JOB_FAILED", "COPY_JOB_FAILED"])
}
resource "aws_backup_region_settings" "legacy" {
resource_type_opt_in_preference = {
"Aurora" = false
"DynamoDB" = false
"EFS" = false
"FSx" = false
"RDS" = false
"Storage Gateway" = false
"EBS" = true
"EC2" = true
"DocumentDB" = false
"Neptune" = false
"VirtualMachine" = false
}
}
resource "aws_backup_vault" "legacy" {
name = "Legacy${var.environment_tag}"
kms_key_arn = aws_kms_key.key.arn
}
resource "aws_iam_role" "legacy_backup" {
name = "AWSBackupService"
permissions_boundary = data.aws_iam_policy.role_permissions_boundary.arn
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Action": ["sts:AssumeRole"],
"Effect": "allow",
"Principal": {
"Service": ["backup.amazonaws.com"]
}
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "legacy_backup" {
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
role = aws_iam_role.legacy_backup.name
}
###############################################################################
## Second Region Backup
###############################################################################
resource "aws_backup_vault" "secondary" {
provider = aws.secondary
name = "Legacy${var.environment_tag}SecondaryRegion"
kms_key_arn = aws_kms_replica_key.secondary_region.arn
tags = merge(
local.tags, {
name = "Legacy${var.environment_tag}SecondaryRegion"
}
)
}
data "aws_iam_policy_document" "backups" {
policy_id = "__default_policy_ID"
statement {
actions = [
"SNS:Publish",
]
effect = "Allow"
principals {
type = "Service"
identifiers = ["backup.amazonaws.com"]
}
resources = [
aws_sns_topic.backup_alerts.arn
]
sid = "__default_statement_ID"
}
}
###############################################################################
# SNS
###############################################################################
resource "aws_sns_topic_policy" "backup_alerts" {
arn = aws_sns_topic.backup_alerts.arn
policy = data.aws_iam_policy_document.backups.json
}
resource "aws_backup_vault_notifications" "backup_alerts" {
backup_vault_name = aws_backup_vault.legacy.id
sns_topic_arn = aws_sns_topic.backup_alerts.arn
backup_vault_events = local.backup_vault_events
}
resource "aws_sns_topic_subscription" "backup_alerts_opsgenie_target" {
topic_arn = aws_sns_topic.backup_alerts.arn
protocol = "https"
endpoint = module.opsgenie_team.sns_integration_sns_endpoint
confirmation_timeout_in_minutes = 1
endpoint_auto_confirms = true
}
I'm trying to create an S3 bucket policy that provides access to a number of other accounts, but I can't figure out how to do it in Terraform, either with a for expression or with dynamic blocks.
locals {
account_ids = [
987654321098,
765432109876,
432109876543
]
}
resource "aws_s3_bucket_policy" "bucket" {
bucket = aws_s3_bucket.bucket.id
policy = jsonencode({
Statement = [
for account in local.account_ids : {
Effect = "Allow"
Action = [ ... ]
Principal = { AWS = [ "arn:aws:iam::${account}:root" ] }
Resource = "${aws_s3_bucket.bucket.arn}/states/${account}/*"
}
]
}
})
}
This fails with: Error: Missing argument separator / A comma is required to separate each function argument from the next.
If I try a dynamic block it's a similar issue.
Ultimately I want the Statement block to contain a list of 3 blocks, one for each account.
Any ideas?
You have an extra closing brace after the Statement list. It should be:
resource "aws_s3_bucket_policy" "bucket" {
bucket = aws_s3_bucket.bucket.id
policy = jsonencode({
Statement = [
for account in local.account_ids : {
Effect = "Allow"
Action = [ ... ]
Principal = { AWS = [ "arn:aws:iam::${account}:root" ] }
Resource = "${aws_s3_bucket.bucket.arn}/states/${account}/*"
}
]
})
}
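If you would rather build the policy with the aws_iam_policy_document data source and a dynamic block (since you mentioned trying dynamic blocks), here is a sketch that produces an equivalent policy; the actions listed are placeholders for whatever you elided:
data "aws_iam_policy_document" "bucket" {
  dynamic "statement" {
    for_each = local.account_ids

    content {
      effect    = "Allow"
      actions   = ["s3:GetObject", "s3:PutObject"] # placeholder actions
      resources = ["${aws_s3_bucket.bucket.arn}/states/${statement.value}/*"]

      principals {
        type        = "AWS"
        identifiers = ["arn:aws:iam::${statement.value}:root"]
      }
    }
  }
}

resource "aws_s3_bucket_policy" "bucket" {
  bucket = aws_s3_bucket.bucket.id
  policy = data.aws_iam_policy_document.bucket.json
}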
I am learning about Terraform modules, and my objective is to build a module that takes in a collection of S3 buckets and then creates and applies some IAM policies to them.
What I have tried so far is some sort of for loop, where I generate the policies and attach them to the buckets. For reference, my code looks something like this:
data "aws_iam_policy_document" "foo_iam_policy" {
statement {
sid = ""
effect = "Allow"
resources = [
for arn in var.s3_buckets_arn :
"${arn}/*"
]
actions = [
"s3:GetObject",
"s3:GetObjectVersion",
]
}
statement {
sid = ""
effect = "Allow"
resources = var.s3_buckets_arn
actions = ["s3:*"]
}
}
resource "aws_iam_policy" "foo_iam_policy" {
name = "foo-iam-policy"
path = "/"
description = "IAM policy for foo to access S3"
policy = data.aws_iam_policy_document.foo_iam_policy.json
}
data "aws_iam_policy_document" "foo_assume_rule_policy" {
statement {
effect = "Allow"
actions = [
"sts:AssumeRole"]
principals {
type = "AWS"
identifiers = [
var.foo_iam_user_arn]
}
condition {
test = "StringEquals"
values = var.foo_external_ids
variable = "sts:ExternalId"
}
}
}
resource "aws_iam_role" "foo_role" {
name = "foo-role"
assume_role_policy = data.aws_iam_policy_document.foo_assume_rule_policy.json
}
resource "aws_iam_role_policy_attachment" "foo_attach_s3_policy" {
role = aws_iam_role.foo_role.name
policy_arn = aws_iam_policy.foo_iam_policy.arn
}
data "aws_iam_policy_document" "foo_policy_source" {
for_each = toset(var.s3_buckets_arn)
// arn = each.key
statement {
sid = "VPCAllow"
effect = "Allow"
resources = [
each.key,
"${each.key}/*",
]
actions = [
"s3:*"]
condition {
test = "StringEquals"
variable = "aws:SourceVpc"
values = [
"vpc-01010101"]
}
principals {
type = "*"
identifiers = [
"*"]
}
}
}
I don't know if what I have tried makes much sense, or if there is a better way to loop through buckets and generate policies. My question is: what is the best practice for such cases where one wants to provide a list of buckets and loop through them to attach policies?
On a side note, I have encountered an error with my approach:
The “for_each” value depends on resource attributes that cannot be determined until apply (Terraform)
To attach a bucket policy to a bucket you should use aws_s3_bucket_policy; an aws_iam_policy_document on its own only renders the policy JSON and does not attach anything. Also, if the buckets already exist, it would probably be better to fetch their data first using the aws_s3_bucket data source:
data "aws_s3_bucket" "selected" {
# s3_buckets_names is easier to use here than s3_buckets_arns
for_each = toset(var.s3_buckets_names)
bucket = each.value
}
Then you can iterate over the selected buckets and attach your policy to each of them:
resource "aws_s3_bucket_policy" "bucket_policie" {
for_each = data.aws_s3_bucket.selected
bucket = each.key
policy = "your policy document"
}
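If the policy itself should be generated per bucket (for example the VPC-restricted statement from your foo_policy_source attempt), one option is a sketch like the following, which gives the policy document the same for_each and looks it up by key; the VPC ID is the placeholder value from your question:
data "aws_iam_policy_document" "per_bucket" {
  for_each = data.aws_s3_bucket.selected

  statement {
    sid       = "VPCAllow"
    effect    = "Allow"
    actions   = ["s3:*"]
    resources = [each.value.arn, "${each.value.arn}/*"]

    condition {
      test     = "StringEquals"
      variable = "aws:SourceVpc"
      values   = ["vpc-01010101"]
    }

    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}

resource "aws_s3_bucket_policy" "per_bucket" {
  for_each = data.aws_s3_bucket.selected

  bucket = each.value.id
  policy = data.aws_iam_policy_document.per_bucket[each.key].json
}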
I want to create a Kinesis data stream and a Kinesis Data Firehose delivery stream using Terraform and connect them as a pipeline. In the console, I can go to Firehose, choose Source -> Kinesis stream, and pick the Kinesis stream I created, but I want to do it with Terraform.
This is the code to create the Kinesis stream (I took it from the official Kinesis docs):
resource "aws_kinesis_stream" "test_stream" {
name = "terraform-kinesis-test"
shard_count = 1
retention_period = 30
shard_level_metrics = [
"IncomingBytes",
"OutgoingBytes",
]
tags = {
Environment = "test"
}
}
And this is the code for data firehose:
resource "aws_elasticsearch_domain" "test_cluster" {
domain_name = "firehose-es-test"
elasticsearch_version = "6.4"
cluster_config {
instance_type = "t2.small.elasticsearch"
}
ebs_options{
ebs_enabled = true
volume_size = 10
}
}
resource "aws_iam_role" "firehose_role" {
name = "firehose_test_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "firehose.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
name = "terraform-kinesis-firehose-test-stream"
destination = "elasticsearch"
/*
s3_configuration {
role_arn = "${aws_iam_role.firehose_role.arn}"
bucket_arn = "${aws_s3_bucket.bucket.arn}"
buffer_size = 10
buffer_interval = 400
compression_format = "GZIP"
}
*/
elasticsearch_configuration {
domain_arn = "${aws_elasticsearch_domain.test_cluster.arn}"
role_arn = "${aws_iam_role.firehose_role.arn}"
index_name = "test"
type_name = "test"
processing_configuration {
enabled = "true"
}
}
}
So how can I connect them? Is there something like ${aws_kinesis_stream.test_stream.arn}, or something similar?
I used the official docs of aws_kinesis_stream and aws_kinesis_firehose_delivery_stream (elasticsearch destination).
This is in the kinesis_firehose_delivery_stream documentation. Scroll past the examples to the Argument Reference section, and you'll see this:
The kinesis_source_configuration object supports the following:
kinesis_stream_arn (Required) The kinesis stream used as the source of the firehose delivery stream.
role_arn (Required) The ARN of the role that provides access to the source Kinesis stream.
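Putting that together with the resources from your question, a minimal sketch: the kinesis_source_configuration block below points the delivery stream at your stream. The role passed there must be allowed to read from the source stream (kinesis:DescribeStream, GetShardIterator, GetRecords, ListShards); reusing firehose_role as shown is an assumption and only works if you attach such a policy to it.
resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "elasticsearch"

  kinesis_source_configuration {
    kinesis_stream_arn = aws_kinesis_stream.test_stream.arn
    role_arn           = aws_iam_role.firehose_role.arn # assumes this role can read the stream
  }

  elasticsearch_configuration {
    domain_arn = aws_elasticsearch_domain.test_cluster.arn
    role_arn   = aws_iam_role.firehose_role.arn
    index_name = "test"
    type_name  = "test"

    processing_configuration {
      enabled = "true"
    }
  }
}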