Value for Terraform Composer airflow_config_override secrets-backend_kwargs

Using Terraform, I need to change the default project_id in my Composer environment so that I can access secrets from another project. According to the Terraform documentation, I need the airflow_config_overrides argument. I guess I should have something like this:
resource "google_composer_environment" "test" {
# ...
config {
software_config {
airflow_config_overrides = {
secrets-backend = "airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend",
secrets-backend_kwargs = {"project_id":"9999999999999"}
}
}
}
}
The secrets-backend section-key seems to be working. On the other hand, secrets-backend_kwargs is returning the following error:
Inappropriate value for attribute "airflow_config_overrides": element "secrets-backend_kwargs": string required
It seems that the problem is that GCP expects a JSON format and Terraform requires a string. How can I get Terraform to provide it in the format needed?

You can convert a map such as {"project_id":"9999999999999"} into a JSON-encoded string by using the jsonencode function.
Merging the example given in the google_composer_environment resource documentation with the config in your question, you can do something like this:
resource "google_composer_environment" "test" {
name = "mycomposer"
region = "us-central1"
config {
software_config {
airflow_config_overrides = {
secrets-backend = "airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend",
secrets-backend_kwargs = jsonencode({"project_id":"9999999999999"})
}
pypi_packages = {
numpy = ""
scipy = "==1.1.0"
}
env_variables = {
FOO = "bar"
}
}
}
}
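If you want to see exactly what string ends up in the override, terraform console shows the result of jsonencode (a quick illustrative check, not part of the original answer):

> jsonencode({"project_id":"9999999999999"})
"{\"project_id\":\"9999999999999\"}"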


Terraform value missing when it should be passed in from a tfvars file

I have a terraform project which contains a main module and a submodule. Something like this:
\modules
  dev.tfvars
  \app
    \main
      main.tf, output.tf, providers.tf, variables.tf
    \sub-modules
      \eventbridge
        main.tf, output.tf, variables.tf
Both variables.tf files have this variable defined:
variable "secrets" {
description = "map for secret manager"
type = map(string)
}
The top level main.tf has this defined:
module "a-eventbridge-trigger" {
source = "../sub-modules/eventbridge"
secrets = var.secrets
}
The submodule main.tf has this:
resource "aws_cloudwatch_event_connection" "auth" {
name = "request-token"
description = "Gets token"
authorization_type = "OAUTH_CLIENT_CREDENTIALS"
auth_parameters {
oauth {
authorization_endpoint = "${var.apiurl}"
http_method = "POST"
oauth_http_parameters {
body {
key = "grant_type"
value = "client_credentials"
is_value_secret = true
}
body {
key = "client_id"
value = var.secrets.Client_Id
is_value_secret = true
}
body {
key = "client_secret"
value = var.secrets.Client_Secret
is_value_secret = true
}
}
}
}
}
However, when run it throws this error:
Error: error creating EventBridge connection (request-token): InvalidParameter: 2 validation error(s) found.
- missing required field, CreateConnectionInput.AuthParameters.OAuthParameters.ClientParameters.ClientID.
- missing required field, CreateConnectionInput.AuthParameters.OAuthParameters.ClientParameters.ClientSecret.
A file dump ahead of the terraform apply command successfully dumps out the contents of the tfvars file, so I know it exists at time of execution.
The top level output.tf successfully writes out the complete values of the secrets variable after execution, so I know the top level module receives the variables.
In the submodule, the resources defined after the aws_cloudwatch_event_connection block do get created and they also use variables received from the same tfvars file.
Is this a problem with how I am providing the variables or with my definition of the resources itself? (Or something else?)
client_parameters is missing from your configuration; you need to set it inside auth_parameters.oauth:
resource "aws_cloudwatch_event_connection" "auth" {
name = "request-token"
description = "Gets token"
authorization_type = "OAUTH_CLIENT_CREDENTIALS"
auth_parameters {
oauth {
authorization_endpoint = "${var.apiurl}"
http_method = "POST"
client_parameters {
client_id = var.secrets.Client_Id
client_secret = var.secrets.Client_Secret
}
oauth_http_parameters {
body {
key = "grant_type"
value = "client_credentials"
is_value_secret = true
}
}
}
}
}
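For completeness, the map(string) that both variables.tf files declare would be supplied from the tfvars file in roughly this shape (key names taken from the question; the values here are placeholders):

# dev.tfvars (illustrative values only)
secrets = {
  Client_Id     = "example-client-id"
  Client_Secret = "example-client-secret"
}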

Using Terraform, how can I pass in a list of users from module to module

I am trying to use Terraform's public modules and I created a lot of IAM users like so:
module "iam_finance_users" {
source = "terraform-aws-modules/iam/aws//modules/iam-user"
version = "5.9.1"
for_each = var.finance_users
name = each.value
force_destroy = true
create_iam_access_key = false
password_reset_required = true
}
My variables.tf file has the following:
variable "finance_users" {
type = set(string)
default = ["mary.johnson#domain.com","mike.smith#domain.com"....]
}
Now I am trying to add these users to a group like so
module "iam_group_finance" {
source = "terraform-aws-modules/iam/aws//modules/iam-group-with-policies"
version = "5.9.1"
name = "finance"
group_users = ["${module.iam_finance_users.iam_user_name}"]
attach_iam_self_management_policy = true
custom_group_policy_arns = [
"${module.iam_read_write_policy.arn}",
]
}
But no matter what I try I still keep getting errors. I have a list of users in a variable and I want to add all those users created in that module to the group module. I know I'm close but I can't quite seem to close this.
Since you used for_each in the module, you have to use values to access elements of all instances of the module created:
group_users = values(module.iam_finance_users)[*].iam_user_name
It's often helpful to know what the error is. In versions of Terraform v1.0 or newer, the errors become fairly explanatory. The error you were likely getting was something similar to:
│ Error: Unsupported attribute
│
│ on main.tf line 14, in output "iam_finance_users":
│ 14: value = module.iam_finance_users.iam_user_name
│ ├────────────────
│ │ module.iam_finance_users is object with 2 attributes
│
│ This object does not have an attribute named "iam_user_name".
The solutions, which both work, can be tested using the following example code:
Module 1: modules/iam-user
variable "name" {
type = string
}
output "iam_user_name" {
value = "IAM-${var.name}"
}
Module 2: modules/iam-group-with-policies
variable "group_users" {
type = set(string)
}
output "iam-users" {
value = var.group_users
}
Root module (your code):
variable "finance_users" {
type = set(string)
default = ["mary.johnson#domain.com", "mike.smith#domain.com"]
}
module "iam_finance_users" {
source = "../modules/iam-user"
for_each = var.finance_users
name = each.value
}
module "iam_group_finance" {
source = "../modules/iam-group-with-policies"
# This works...
group_users = values(module.iam_finance_users)[*].iam_user_name
# This also works...
group_users = [for o in module.iam_finance_users : o.iam_user_name]
}
output "group_users" {
value = module.iam_group_finance.iam-users
}
You can easily test each module along the way using terraform console.
E.g.:
terraform console
> module.iam_finance_users
{
  "mary.johnson#domain.com" = {
    "iam_user_name" = "IAM-mary.johnson#domain.com"
  }
  "mike.smith#domain.com" = {
    "iam_user_name" = "IAM-mike.smith#domain.com"
  }
}
Here you can see why it didn't work. The module doesn't spit out a list in that way, so you have to iterate through the module to get your variable. This is why both the for expression and the values method work.
Look at how differently a single module is handled:
module "iam_finance_users" {
source = "../modules/iam-user"
name = tolist(var.finance_users)[1]
}
terraform console
> module.iam_finance_users
{
  "iam_user_name" = "IAM-mike.smith#domain.com"
}
Notice that when the module is not created with for_each, the single iam_user_name is accessible directly.
You could try something like the below. This is untested, so you might need to tweak it a bit. Reference thread: https://discuss.hashicorp.com/t/for-each-objects-to-list/36609/2
module "iam_group_finance" {
source = "terraform-aws-modules/iam/aws//modules/iam-group-with-policies"
version = "5.9.1"
name = "finance"
group_users = [ for o in module.iam_finance_users : o.iam_user_name ]
attach_iam_self_management_policy = true
custom_group_policy_arns = [
module.iam_read_write_policy.arn
]
}

How to skip declaring values in root module (for_each loop)

I am trying to build a reusable module that creates multiple S3 buckets. Based on a condition, some buckets may have lifecycle rules, others do not. I am using a for expression in the lifecycle rule resource and have mostly managed to do it, but not completely.
My var:
variable "bucket_details" {
type = map(object({
bucket_name = string
enable_lifecycle = bool
glacier_ir_days = number
glacier_days = number
}))
}
How I go through the map on the lifecycle resource:
resource "aws_s3_bucket_lifecycle_configuration" "compliant_s3_bucket_lifecycle_rule" {
for_each = { for bucket, values in var.bucket_details : bucket => values if values.enable_lifecycle }
depends_on = [aws_s3_bucket_versioning.compliant_s3_bucket_versioning]
bucket = aws_s3_bucket.compliant_s3_bucket[each.key].bucket
rule {
id = "basic_config"
status = "Enabled"
abort_incomplete_multipart_upload {
days_after_initiation = 7
}
transition {
days = each.value["glacier_ir_days"]
storage_class = "GLACIER_IR"
}
transition {
days = each.value["glacier_days"]
storage_class = "GLACIER"
}
expiration {
days = 2555
}
noncurrent_version_transition {
noncurrent_days = each.value["glacier_ir_days"]
storage_class = "GLACIER_IR"
}
noncurrent_version_transition {
noncurrent_days = each.value["glacier_days"]
storage_class = "GLACIER"
}
noncurrent_version_expiration {
noncurrent_days = 2555
}
}
}
How I WOULD love to reference it in the root module:
module "s3_buckets" {
source = "./modules/aws-s3-compliance"
#
bucket_details = {
"fisrtbucketname" = {
bucket_name = "onlythefisrtbuckettesting"
enable_lifecycle = true
glacier_ir_days = 555
glacier_days = 888
}
"secondbuckdetname" = {
bucket_name = "onlythesecondbuckettesting"
enable_lifecycle = false
}
}
}
When I reference it like that, validation fails because I am not setting values for both glacier_ir_days and glacier_days - understandable.
My question is: when enable_lifecycle is set to false, is there a way to not require values for these?
Currently, as a workaround, I am just setting zeroes for those and since the resource is not created if enable_lifecycle is false, it does not matter, but I would love it to be cleaner.
Thank you in advance.
The forthcoming Terraform v1.3 release will include a new feature for declaring optional attributes in an object type constraint, with the option of declaring a default value to use when the attribute isn't set.
At the time I'm writing this the v1.3 release is still under development and so not available for general use, but I'm going to answer this with an example that should work with Terraform v1.3 once it's released. If you wish to try it in the meantime you can experiment with the most recent v1.3 alpha release which includes this feature, though of course I would not recommend using it in production until it's in a final release.
It seems that your glacier_ir_days and glacier_days attributes are, from a modeling perspective, attributes that are required when the lifecycle is enabled and not required when it is disabled.
I would suggest modelling that by placing these attributes in a nested object called lifecycle and implementing it such that the lifecycle resource is enabled when that attribute is set, and disabled when it is left unset.
The declaration would therefore look like this:
variable "s3_buckets" {
type = map(object({
bucket_name = string
lifecycle = optional(object({
glacier_ir_days = number
glacier_days = number
}))
}))
}
When an attribute is marked as optional(...) like this, Terraform will allow omitting it in the calling module block and then will quietly set the attribute to null when it performs the type conversion to make the given value match the type constraint. This particular declaration doesn't have a default value, but it's also possible to pass a second argument in the optional(...) syntax which Terraform will then use instead of null as the placeholder value when the attribute isn't specified.
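As a small illustrative sketch (the 365-day fallback is made up, not part of the question), that second argument to optional(...) can also be used on an individual attribute inside the nested object:

variable "bucket_details" {
  type = map(object({
    bucket_name = string
    lifecycle = optional(object({
      glacier_ir_days = optional(number, 365) # falls back to 365 days if omitted
      glacier_days    = number
    }))
  }))
}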
The calling module block would therefore look like this:
module "s3_buckets" {
source = "./modules/aws-s3-compliance"
#
bucket_details = {
"fisrtbucketname" = {
bucket_name = "onlythefisrtbuckettesting"
lifecycle = {
glacier_ir_days = 555
glacier_days = 888
}
}
"secondbuckdetname" = {
bucket_name = "onlythesecondbuckettesting"
}
}
}
Your resource block inside the module will remain similar to what you showed, but the if clause of the for expression will test if the lifecycle object is non-null instead:
resource "aws_s3_bucket_lifecycle_configuration" "compliant_s3_bucket_lifecycle_rule" {
for_each = {
for bucket, values in var.bucket_details : bucket => values
if values.lifecycle != null
}
# ...
}
Finally, the references to the attributes would be slightly different to traverse through the lifecycle object:
transition {
  days          = each.value.lifecycle.glacier_days
  storage_class = "GLACIER"
}

Invalid Schema error in AWS Glue created via Terraform

I have a Kinesis Firehose configuration in Terraform which reads data from a Kinesis stream as JSON, converts it to Parquet using Glue, and writes it to S3.
There is something wrong with the data format conversion and I am getting the below error (with some details removed):
{"attemptsMade":1,"arrivalTimestamp":1624541721545,"lastErrorCode":"DataFormatConversion.InvalidSchema","lastErrorMessage":"The
schema is invalid. The specified table has no columns.","attemptEndingTimestamp":1624542026951,"rawData":"xx","sequenceNumber":"xx","subSequenceNumber":null,"dataCatalogTable":{"catalogId":null,"databaseName":"db_name","tableName":"table_name","region":null,"versionId":"LATEST","roleArn":"xx"}}
The Terraform configuration for the Glue table I am using is as follows:
resource "aws_glue_catalog_table" "stream_format_conversion_table" {
name = "${var.resource_prefix}-parquet-conversion-table"
database_name = aws_glue_catalog_database.stream_format_conversion_db.name
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
"parquet.compression" = "SNAPPY"
}
storage_descriptor {
location = "s3://${element(split(":", var.bucket_arn), 5)}/"
input_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
ser_de_info {
name = "my-stream"
serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
parameters = {
"serialization.format" = 1
}
}
columns {
name = "metadata"
type = "struct<tenantId:string,env:string,eventType:string,eventTimeStamp:timestamp>"
}
columns {
name = "eventpayload"
type = "struct<operation:string,timestamp:timestamp,user_name:string,user_id:int,user_email:string,batch_id:string,initiator_id:string,initiator_email:string,payload:string>"
}
}
}
What needs to change here?
I faced the "The schema is invalid. The specified table has no columns" with the following combination:
avro schema in Glue schema registry,
glue table created through console using "Add table from existing schema"
kinesis data firehose configured with Parquet conversion and referencing the glue table created from the schema registry.
It turns out that KDF is unable to read table's schema if table is created from existing schema. Table have to be created from scratch (in opposition to "Add table from existing schema") This isn't documented ... for now.
In addition to the answer from mberchon I found that the default generated policy for the Kinesis Delivery Stream did not include the necessary IAM permissions to actually read the schema.
I had to manually modify the IAM policy to include glue:GetSchema and glue:GetSchemaVersion.
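As a rough sketch (the resource and policy names here are illustrative, not from the original setup), those extra permissions could be attached to the delivery stream's role like this:

resource "aws_iam_role_policy" "firehose_glue_schema_read" {
  name = "allow-glue-schema-read"
  role = aws_iam_role.firehose_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["glue:GetSchema", "glue:GetSchemaVersion"]
        Resource = "*"
      }
    ]
  })
}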
Frustrated by having to manually define columns, I wrote a little Python tool that takes a pydantic class (it could be made to work with JSON Schema too) and generates a JSON file that can be used with Terraform to create the table:
https://github.com/nanit/j2g
from pydantic import BaseModel
from typing import List

class Bar(BaseModel):
    name: str
    age: int

class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str
gets converted to
{
  "nums": "array<int>",
  "bars": "array<struct<name:string,age:int>>",
  "other": "string"
}
and can be used in Terraform like so:
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json"))
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns
      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
Thought I'd post here as I was facing the same problem and found a workaround that appears to work.
As stated above, AWS does not allow you to use tables generated from an existing schema for data type conversion with Firehose. That said, if you are using Terraform, you can create the table from the existing schema, then use the columns attribute of that first table to create a second table, and then use that second table for data type conversion in the Firehose config. I can confirm this works.
Terraform for the tables:
resource "aws_glue_catalog_table" "aws_glue_catalog_table_from_schema" {
name = "first_table"
database_name = "foo"
storage_descriptor {
schema_reference {
schema_id {
schema_arn = aws_glue_schema.your_glue_schema.arn
}
schema_version_number = aws_glue_schema.your_glue_schema.latest_schema_version
}
}
}
resource "aws_glue_catalog_table" "aws_glue_catalog_table_from_first_table" {
name = "second_table"
database_name = "foo"
storage_descriptor {
dynamic "columns" {
for_each = aws_glue_catalog_table.aws_glue_catalog_table_from_schema.storage_descriptor[0].columns
content {
name = columns.value.name
type = columns.value.type
}
}
}
}
firehose data format conversion configuration:
data_format_conversion_configuration {
  output_format_configuration {
    serializer {
      parquet_ser_de {}
    }
  }
  input_format_configuration {
    deserializer {
      hive_json_ser_de {}
    }
  }
  schema_configuration {
    database_name = aws_glue_catalog_table.aws_glue_catalog_table_from_first_table.database_name
    role_arn      = aws_iam_role.firehose_role.arn
    table_name    = aws_glue_catalog_table.aws_glue_catalog_table_from_first_table.name
  }
}

Interpreting AWS secrets in Terraform

I have the following code:
data "aws_secretsmanager_secret" "db_password" {
name = "${var.db_secret}"
}
data "aws_secretsmanager_secret_version" "db_password" {
secret_id = "${data.aws_secretsmanager_secret.db_password.id}"
}
master_password = "${data.aws_secretsmanager_secret_version.db_password.secret_string}"
which in this case returns the secret_string
secret_string = {"Test":"TestPassword"}
How do I cut out the TestPassword part of the secret and use it as my master_password?
I had to fake up your Secrets Manager endpoint, but this test endpoint returns the same JSON.
So in Terraform...
data "external" "secret_string" {
program = ["curl", "http://echo.jsontest.com/Test/Testpassword"]
}
output "json_data_key" {
value = "${data.external.secret_string.result}"
}
output "PASSWORD" {
value = "${lookup(data.external.secret_string.result, "Test")}"
}
that last output is what you were after?
${lookup(data.external.secret_string.result, "Test")}
Which gives you:
data.external.secret_string: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

PASSWORD = Testpassword
json_data_key = {
  Test = Testpassword
}
So it is certainly possible to parse JSON before 0.12...
Considering this is JSON, you probably need to wait for jsondecode in Terraform v0.12 to solve this properly (see the jsondecode function GitHub issue).
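Once v0.12's jsondecode is available, a minimal sketch using the question's existing data source might look like this (the "Test" key comes from the example secret above):

locals {
  # Decode the JSON secret string into a map (requires Terraform v0.12+)
  db_secret = jsondecode(data.aws_secretsmanager_secret_version.db_password.secret_string)
}

# then, wherever master_password is set:
# master_password = local.db_secret["Test"]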