AWS Kinesis Terraform - How to connect Data Streams to Data Firehose - amazon-web-services

I want to create a Kinesis Data Stream and a Kinesis Data Firehose delivery stream using Terraform, and connect them as a pipeline. In the AWS console, I can go to the Firehose source -> Kinesis stream and pick the Kinesis stream I created, but I want to do the same thing using Terraform.
This is the code to create the Kinesis stream (I took it from the official kinesis docs):
resource "aws_kinesis_stream" "test_stream" {
name = "terraform-kinesis-test"
shard_count = 1
retention_period = 30
shard_level_metrics = [
"IncomingBytes",
"OutgoingBytes",
]
tags = {
Environment = "test"
}
And this is the code for the Data Firehose delivery stream:
resource "aws_elasticsearch_domain" "test_cluster" {
domain_name = "firehose-es-test"
elasticsearch_version = "6.4"
cluster_config {
instance_type = "t2.small.elasticsearch"
}
ebs_options{
ebs_enabled = true
volume_size = 10
}
}
resource "aws_iam_role" "firehose_role" {
name = "firehose_test_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "firehose.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
name = "terraform-kinesis-firehose-test-stream"
destination = "elasticsearch"
/*
s3_configuration {
role_arn = "${aws_iam_role.firehose_role.arn}"
bucket_arn = "${aws_s3_bucket.bucket.arn}"
buffer_size = 10
buffer_interval = 400
compression_format = "GZIP"
}
*/
elasticsearch_configuration {
domain_arn = "${aws_elasticsearch_domain.test_cluster.arn}"
role_arn = "${aws_iam_role.firehose_role.arn}"
index_name = "test"
type_name = "test"
processing_configuration {
enabled = "true"
}
}
}
So how can I connect them? Is there something like ${aws_kinesis_stream.test_stream.arn}, or something similar?
I used the official docs for aws_kinesis_stream and aws_kinesis_firehose_delivery_stream (Elasticsearch destination).

This is in the kinesis_firehose_delivery_stream documentation. Scroll past the examples to the Argument Reference section, and you'll see this:
The kinesis_source_configuration object supports the following:
- kinesis_stream_arn (Required) - The Kinesis stream used as the source of the Firehose delivery stream.
- role_arn (Required) - The ARN of the role that provides access to the source Kinesis stream.
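In other words, connecting them is just a matter of adding a kinesis_source_configuration block to your delivery stream. A minimal sketch using the resources from the question (note that aws_iam_role.firehose_role would also need a policy granting read access to the source stream, which isn't shown here):

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "elasticsearch"

  kinesis_source_configuration {
    kinesis_stream_arn = aws_kinesis_stream.test_stream.arn
    role_arn           = aws_iam_role.firehose_role.arn # must be allowed to read the stream
  }

  elasticsearch_configuration {
    domain_arn = aws_elasticsearch_domain.test_cluster.arn
    role_arn   = aws_iam_role.firehose_role.arn
    index_name = "test"
    type_name  = "test"
  }
}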

Related

cannot run queries against aws athena and glue database

I have terraformed a stack from DynamoDB -> AWS Glue -> Athena. I can see that all the columns have been created in AWS Glue and the table exists there, but in Athena it seems only the database is there; even though the table schema and columns show up when querying the database, the queries do not work.
SELECT tenant, COUNT(DISTINCT id) counts
FROM "account-profiles-glue-db"."account_profiles"
group by tenant
The above query fails.
My Terraform looks like:
locals {
  table-name               = var.table-name
  athena-results-s3-name   = "${local.table-name}-analytics"
  athena-workgroup-name    = local.table-name
  glue-db-name             = "${local.table-name}-glue-db"
  glue-crawler-name        = "${local.table-name}-crawler"
  glue-crawler-role-name   = "${local.table-name}-crawler-role"
  glue-crawler-policy-name = "${local.table-name}-crawler"
}

resource "aws_kms_key" "aws_kms_key" {
  description             = "KMS key for whole project"
  deletion_window_in_days = 10
}

##################################################################
# glue
##################################################################
resource "aws_glue_catalog_database" "aws_glue_catalog_database" {
  name = local.glue-db-name
}

resource "aws_glue_crawler" "aws_glue_crawler" {
  database_name = aws_glue_catalog_database.aws_glue_catalog_database.name
  name          = local.glue-crawler-name
  role          = aws_iam_role.aws_iam_role_glue_crawler.arn

  configuration = jsonencode(
    {
      Version = 1.0
      CrawlerOutput = {
        Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
      }
    }
  )

  dynamodb_target {
    path = local.table-name
  }
}

resource "aws_iam_role" "aws_iam_role_glue_crawler" {
  name = local.glue-crawler-role-name
  assume_role_policy = jsonencode(
    {
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Action" : "sts:AssumeRole",
          "Principal" : {
            "Service" : "glue.amazonaws.com"
          },
          "Effect" : "Allow",
          "Sid" : ""
        }
      ]
    }
  )
}

resource "aws_iam_role_policy" "aws_iam_role_policy_glue_crawler" {
  name = local.glue-crawler-policy-name
  role = aws_iam_role.aws_iam_role_glue_crawler.id
  policy = jsonencode(
    {
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Effect" : "Allow",
          "Action" : ["*"],
          "Resource" : ["*"]
        }
      ]
    }
  )
}

##################################################################
# athena
##################################################################
resource "aws_s3_bucket" "aws_s3_bucket_analytics" {
  bucket = local.athena-results-s3-name
  acl    = "private"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = aws_kms_key.aws_kms_key.arn
        sse_algorithm     = "aws:kms"
      }
    }
  }
}

resource "aws_athena_workgroup" "aws_athena_workgroup" {
  name = local.athena-workgroup-name

  configuration {
    enforce_workgroup_configuration    = true
    publish_cloudwatch_metrics_enabled = true

    result_configuration {
      output_location = "s3://${aws_s3_bucket.aws_s3_bucket_analytics.bucket}/output/"

      encryption_configuration {
        encryption_option = "SSE_KMS"
        kms_key_arn       = aws_kms_key.aws_kms_key.arn
      }
    }
  }
}
Looking at the Terraform you provided and the Glue documentation on AWS, you are only crawling the DynamoDB table; you aren't triggering any jobs for it. The Glue jobs are where you run your business logic to transform and load the data. This is where you would declare sending your source data to S3 to be read by Athena.
If you need help generating the code for your Glue job, I would recommend using Glue Studio, which has a visual editor that will also generate your code. You can select your source, destination, and any transforms you need. At that point, you can use the Terraform glue_job resource and reference the script that you generated in Glue Studio, as in the sketch below.
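A minimal sketch of wiring such a script into Terraform, assuming the script has already been uploaded to S3 (the job name, bucket, and script path are hypothetical, and the role must have Glue and S3 permissions):

resource "aws_glue_job" "account_profiles_etl" {
  name     = "account-profiles-etl"                     # hypothetical job name
  role_arn = aws_iam_role.aws_iam_role_glue_crawler.arn # or a dedicated role with Glue/S3 access

  command {
    name            = "glueetl"
    script_location = "s3://my-glue-scripts/account-profiles-etl.py" # script generated in Glue Studio (hypothetical path)
  }
}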
Unless you need to perform some ETL on the data, consider connecting Athena directly to DynamoDB with the Athena-DynamoDB-Connector provided in the AWSLabs GitHub, or exporting your DynamoDB data to S3 and then connecting Athena to that S3 bucket.

How to configure an S3 bucket to allow an AWS application load balancer (not classic) to use it? Currently throws 'access denied'

I have an application load balancer and I'm trying to enable logging; Terraform code below:
resource "aws_s3_bucket" "lb-logs" {
bucket = "yeo-messaging-${var.environment}-lb-logs"
}
resource "aws_s3_bucket_acl" "lb-logs-acl" {
bucket = aws_s3_bucket.lb-logs.id
acl = "private"
}
resource "aws_lb" "main" {
name = "main"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.public.id]
enable_deletion_protection = false
subnets = [aws_subnet.public.id, aws_subnet.public-backup.id]
access_logs {
bucket = aws_s3_bucket.lb-logs.bucket
prefix = "main-lb"
enabled = true
}
}
Unfortunately I can't apply this due to:
Error: failure configuring LB attributes: InvalidConfigurationRequest: Access Denied for bucket: xxx-lb-logs. Please check S3bucket permission
│ status code: 400, request id: xx
I've seen a few SO threads and documentation, but unfortunately it all applies to the classic load balancer, particularly the 'data' source that allows you to get the service account of the load balancer.
I have found some policy info on how to apply the right permissions to a service account, but I can't seem to find how to apply the service account to the LB itself.
Example:
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "AWS"
identifiers = [data.aws_elb_service_account.main.arn]
}
actions = [
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject"
]
resources = [
aws_s3_bucket.lb-logs.arn,
"${aws_s3_bucket.lb-logs.arn}/*",
]
}
}
resource "aws_s3_bucket_policy" "allow-lb" {
bucket = aws_s3_bucket.lb-logs.id
policy = data.aws_iam_policy_document.allow-lb.json
}
But this is all moot because data.aws_elb_service_account.main.arn is only for classic LB.
EDIT:
Full code with attempt from answer below:
resource "aws_s3_bucket" "lb-logs" {
bucket = "yeo-messaging-${var.environment}-lb-logs"
}
resource "aws_s3_bucket_acl" "lb-logs-acl" {
bucket = aws_s3_bucket.lb-logs.id
acl = "private"
}
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "Service"
identifiers = ["logdelivery.elb.amazonaws.com"]
}
actions = [
"s3:PutObject"
]
resources = [
"${aws_s3_bucket.lb-logs.arn}/*"
]
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = [
"bucket-owner-full-control"
]
}
}
}
resource "aws_s3_bucket_policy" "allow-lb" {
bucket = aws_s3_bucket.lb-logs.id
policy = data.aws_iam_policy_document.allow-lb.json
}
resource "aws_lb" "main" {
name = "main"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.public.id]
enable_deletion_protection = false
subnets = [aws_subnet.public.id, aws_subnet.public-backup.id]
access_logs {
bucket = aws_s3_bucket.lb-logs.bucket
prefix = "main-lb"
enabled = true
}
}
The bucket policy you need to use is provided in the official documentation for access logs on Application Load Balancers.
{
  "Effect": "Allow",
  "Principal": {
    "Service": "logdelivery.elb.amazonaws.com"
  },
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::bucket-name/prefix/AWSLogs/your-aws-account-id/*",
  "Condition": {
    "StringEquals": {
      "s3:x-amz-acl": "bucket-owner-full-control"
    }
  }
}
Notice that bucket-name, prefix, and your-aws-account-id need to be replaced in that policy with your actual values.
In Terraform:
data "aws_iam_policy_document" "allow-lb" {
statement {
principals {
type = "Service"
identifiers = ["logdelivery.elb.amazonaws.com"]
}
actions = [
"s3:PutObject"
]
resources = [
"${aws_s3_bucket.lb-logs.arn}/*"
]
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = [
"bucket-owner-full-control"
]
}
}
}
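To take effect, the document still has to be attached to the bucket with an aws_s3_bucket_policy resource, exactly as in the question's edit:

resource "aws_s3_bucket_policy" "allow-lb" {
  bucket = aws_s3_bucket.lb-logs.id
  policy = data.aws_iam_policy_document.allow-lb.json
}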

How do I capture AWS Backup failures in Terraform when Windows VSS fails?

I'm using AWS Backup to back up several EC2 instances. I have Terraform that seems to report correctly when there is a backup failure, but I am also interested in the case where the disks have backed up correctly but Windows VSS fails. Ultimately, the failed events are going to be published to Opsgenie. Is there a way to accomplish this? I have tried capturing all events with the aws_backup_vault_notifications resource, and I have tried a filter as described in this AWS blog: https://aws.amazon.com/premiumsupport/knowledge-center/aws-backup-failed-job-notification/
I have included most of my Terraform below, minus the Opsgenie module; I can get successful or fully failing events published to Opsgenie just fine if I include those events:
locals {
  backup_vault_events = toset(["BACKUP_JOB_FAILED", "COPY_JOB_FAILED"])
}

resource "aws_backup_region_settings" "legacy" {
  resource_type_opt_in_preference = {
    "Aurora"          = false
    "DynamoDB"        = false
    "EFS"             = false
    "FSx"             = false
    "RDS"             = false
    "Storage Gateway" = false
    "EBS"             = true
    "EC2"             = true
    "DocumentDB"      = false
    "Neptune"         = false
    "VirtualMachine"  = false
  }
}

resource "aws_backup_vault" "legacy" {
  name        = "Legacy${var.environment_tag}"
  kms_key_arn = aws_kms_key.key.arn
}

resource "aws_iam_role" "legacy_backup" {
  name                 = "AWSBackupService"
  permissions_boundary = data.aws_iam_policy.role_permissions_boundary.arn

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["sts:AssumeRole"],
      "Effect": "Allow",
      "Principal": {
        "Service": ["backup.amazonaws.com"]
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "legacy_backup" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
  role       = aws_iam_role.legacy_backup.name
}

###############################################################################
## Second Region Backup
###############################################################################
resource "aws_backup_vault" "secondary" {
  provider    = aws.secondary
  name        = "Legacy${var.environment_tag}SecondaryRegion"
  kms_key_arn = aws_kms_replica_key.secondary_region.arn

  tags = merge(
    local.tags, {
      name = "Legacy${var.environment_tag}SecondaryRegion"
    }
  )
}

data "aws_iam_policy_document" "backups" {
  policy_id = "__default_policy_ID"

  statement {
    actions = [
      "SNS:Publish",
    ]
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["backup.amazonaws.com"]
    }

    resources = [
      aws_sns_topic.backup_alerts.arn
    ]

    sid = "__default_statement_ID"
  }
}

###############################################################################
# SNS
###############################################################################
resource "aws_sns_topic_policy" "backup_alerts" {
  arn    = aws_sns_topic.backup_alerts.arn
  policy = data.aws_iam_policy_document.backups.json
}

resource "aws_backup_vault_notifications" "backup_alerts" {
  backup_vault_name   = aws_backup_vault.legacy.id
  sns_topic_arn       = aws_sns_topic.backup_alerts.arn
  backup_vault_events = local.backup_vault_events
}

resource "aws_sns_topic_subscription" "backup_alerts_opsgenie_target" {
  topic_arn                       = aws_sns_topic.backup_alerts.arn
  protocol                        = "https"
  endpoint                        = module.opsgenie_team.sns_integration_sns_endpoint
  confirmation_timeout_in_minutes = 1
  endpoint_auto_confirms          = true
}

creating multiple SQS from main.tf configuration

I have just updated my question to include my terragrunt.hcl, which calls my main.tf to create the resources in different environments. I don't know how to replace the resources section of the policy that has ${aws_sqs_queue.Trail_SQS.arn}, because I need different names in there based on the environment I am working in, and I also don't know how to represent the redrive_policy in my terragrunt.hcl. Any help is appreciated; thanks in advance.
main.tf:
resource "aws_sqs_queue" "Trail_SQS"{
name = var.aws_sqs
visibility_timeout_seconds = var.visibility_timeout_seconds
max_message_size = var.max_message_size
message_retention_seconds = var.message_retention_seconds
delay_seconds = var.delay_seconds
receive_wait_time_seconds = var.receive_wait_time_seconds
redrive_policy = jsonencode({
deadLetterTargetArn = aws_sqs_queue.Trail_SQS_DLQ.arn
maxReceiveCount = var.max_receive_count
})
}
resource "aws_sqs_queue" "Trail_SQS_DLQ"{
name = var.dead_letter_queue
visibility_timeout_seconds = var.visibility_timeout_seconds
max_message_size = var.max_message_size
message_retention_seconds = var.message_retention_seconds
delay_seconds = var.delay_seconds
receive_wait_time_seconds = var.receive_wait_time_seconds
}
resource "aws_iam_role" "ronix_access_role" {
name = var.role_name
description = var.description
assume_role_policy = data.aws_iam_policy_document.trust_relationship.json
}
data "aws_iam_policy_document" "ronix_policy_document"{
statement{
actions = [
"sqs:DeleteMessage",
"sqs:GetQueueUrl",
"sqs:ReceiveMessage",
"sqs:SendMessage",
"sqs:SetQueueAttributes"
]
effect = "Allow"
resources =[
"${aws_sqs_queue.Trail_SQS.arn}"
] }
resource "aws_iam_policy" "ronix_policy" {
name = "ronix_access_policy"
description = "ronix policy to access SQS"
policy = data.aws_iam_policy_document.securonix_policy_document.json
resource "aws_iam_role_policy_attachment" "ronix_policy_attachment" {
policy_arn = aws_iam_policy.ronix_policy.arn
role = aws_iam_role.ronix_access_role.id
}
resource "aws_sqs_queue_policy" "trail_SQS_Policy" {
queue_url = aws_sqs_queue.Trail_SQS.id
policy = <<POLICY
{ "Version": "2012-10-17",
"Id": "sqspolicy",
"Statement": [
{
"Sid": "AllowSQSInvocation",
"Effect": "Allow",
"Principal": {"AWS":"*"},
"Action": "sqs:*",
"Resource": "${aws_sqs_queue.Trail_SQS.arn}"
terragrunt.hcl to call main.tf:
terraform {
  source = "../../../../..//module"
}

include {
  path = find_in_parent_folders()
}

inputs = {
  event_log_bucket_name  = "trailbucket-sqs-logs"
  aws_sqs_queue_name     = "Trail_SQS"
  dead_letter_queue_name = "Trail_SQS_DLQ"
  role_name              = "ronix_access_role"
  description            = "Role for ronix access"
  kms_key_arn            = "ARN of the key"
}
I don't know your setup, but there are a few ways to do it.
1 - Using workspaces
If you are using workspaces in Terraform, and let's say you have dev and prod as workspaces, you can simply do this:
locals.tf:
locals {
  env = terraform.workspace
}
sqs.tf:
resource "aws_sqs_queue" "my_sqs" {
  name = "${local.env}-sqs"
  ...
}
It will create two SQS queues, dev-sqs and prod-sqs, depending on which workspace you are in.
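For example, using the standard Terraform workspace commands:

terraform workspace new dev   # create and switch to the dev workspace
terraform apply               # creates dev-sqs
terraform workspace new prod  # create and switch to the prod workspace
terraform apply               # creates prod-sqs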
2 - Using environment variables
If you are using environment variables in your setup, you need to pass them to Terraform like:
export TF_VAR_ENV=prod
Then your setup will be something like:
variables.tf:
variable "ENV" {
  type = string
}
sqs.tf:
resource "aws_sqs_queue" "my_sqs" {
  name = "${var.ENV}-sqs"
  ...
}
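The environment is then selected at deploy time by exporting the variable before running Terraform:

export TF_VAR_ENV=dev
terraform apply  # creates dev-sqs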

CloudWatch custom events SQS fails to work

I am using Terraform to create the queues while also creating the CloudWatch event rules and setting one of the queues as the target for the rules.
In summary, I have a single queue that is the target for 3 separate CloudWatch events. The problem is that even though the CloudWatch event rules are identical, only one of them works when created through Terraform; the others end up as failed invocations in the console with no log or any sort of debuggable information. If the custom events are created from the AWS console, all works well.
The creation of the queue in Terraform:
resource "aws_sqs_queue" "queue_cron" {
name = "cron"
visibility_timeout_seconds = 300 # 5 minutes
delay_seconds = 0
message_retention_seconds = 1800 # 30 minutes
receive_wait_time_seconds = 20
}
The only working block
resource "aws_cloudwatch_event_rule" "eve_vendors_bot_sync" {
name = "vendors-bot-sync"
schedule_expression = "rate(1 minute)"
description = "Notify cron queue for vendors bot sync"
is_enabled = true
}
resource "aws_cloudwatch_event_target" "sqs_cron_vendors_bot_sync" {
rule = aws_cloudwatch_event_rule.eve_vendors_bot_sync.name
arn = var.queue_cron_arn
target_id = "sqsCronVendorBotSync"
input_transformer {
input_template = <<EOF
{
"messageType":"cron",
"cronType":"vendors-bot-sync"
}
EOF
}
}
Doesn't work even though it's identical in structure to the one above.
resource "aws_cloudwatch_event_rule" "eve_restos_sync" {
name = "restos-sync"
schedule_expression = "rate(1 minute)"
description = "Notify cron queue for restos sync"
is_enabled = true
}
resource "aws_cloudwatch_event_target" "sqs_cron_restos_sync" {
rule = aws_cloudwatch_event_rule.eve_restos_sync.name
arn = var.queue_cron_arn
target_id = "sqsCronRestosSync"
input_transformer {
input_template = <<EOF
{
"messageType":"cron",
"cronType":"restaurant-hours-open-close-management"
}
EOF
}
}
Similar to the one above, this does not work either:
resource "aws_cloudwatch_event_rule" "eve_vendors_orders_sync" {
name = "vendors-orders-sync"
schedule_expression = "rate(1 minute)"
description = "Notify cron queue for vendors orders sync"
is_enabled = true
}
resource "aws_cloudwatch_event_target" "target_cron_vendors_sync" {
rule = aws_cloudwatch_event_rule.eve_vendors_orders_sync.name
arn = var.queue_cron_arn
target_id = "sqsCronVendorsOrderSync"
input_transformer {
input_template = <<EOF
{
"messageType":"cron",
"cronType":"vendors-orders-sync"
}
EOF
}
}
Answer
The missing piece in the puzzle, as rightfully pointed out by @Marchin, was indeed the policy that was preventing CloudWatch from sending a message to SQS.
Here is the updated config that got it working:
1. Create the queue.
2. Create a policy that allows CloudWatch to send messages to the queue.
3. Attach the policy to the queue.
resource "aws_sqs_queue" "queue_cron" {
name = "cron"
visibility_timeout_seconds = 300 # 5 minutes
delay_seconds = 0
message_retention_seconds = 1800 # 30 minutes
receive_wait_time_seconds = 20
}
data "aws_iam_policy_document" "policy_sqs" {
statement {
sid = "AWSEvents_"
effect = "Allow"
actions = [
"sqs:SendMessage",
]
principals {
type = "Service"
identifiers = ["events.amazonaws.com"]
}
resources = [aws_sqs_queue.queue_cron.arn]
}
}
resource "aws_sqs_queue_policy" "cron_sqs_policy" {
queue_url = aws_sqs_queue.queue_cron.id
policy = data.aws_iam_policy_document.policy_sqs.json
}
I think your permissions on the SQS queue are missing or incorrect. Assuming that you are creating your queue_cron in Terraform (not shown in the question), the queue and its policy allowing CloudWatch Events to send messages to it would be:
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
resource "aws_sqs_queue" "queue_cron" {
name = "queue_cron"
}
resource "aws_sqs_queue_policy" "test" {
queue_url = aws_sqs_queue.queue_cron.id
policy = <<POLICY
{
"Version": "2012-10-17",
"Id": "sqspolicy",
"Statement": [
{
"Sid": "First",
"Effect": "Allow",
"Principal": {
"AWS": "${data.aws_caller_identity.current.account_id}"
},
"Action": "sqs:*",
"Resource": "${aws_sqs_queue.queue_cron.arn}"
},
{
"Sid": "AWSEvents_",
"Effect": "Allow",
"Principal": {
"Service": "events.amazonaws.com"
},
"Action": "sqs:SendMessage",
"Resource": "${aws_sqs_queue.queue_cron.arn}",
"Condition": {
"ArnEquals": {
"aws:SourceArn": "arn:aws:events:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:rule/*"
}
}
}
]
}
POLICY
}