Glue Virtual View (Terraform-created) not appearing in Athena

I am trying to use the Terraform aws_glue_catalog_table resource to create a VIRTUAL_VIEW, which, as I understand it, should appear in Athena as a view.
So far my code seems to create a catalog table in Glue, but nothing appears in the list of views in Athena.
It is hard to know exactly which part is the issue.
I have tried comparing the generated Glue table to an existing one, created manually with the same specification, but no differences appear in the information shown. The view original text is hard to compare, though, because it is base64-encoded.
I have also tried removing the ser_de_info section, but it doesn't seem to make any difference.
Grateful for any hints here!
I'm really not sure why Terraform doesn't just let us submit a simple SQL DDL statement to create these. This Glue method is too convoluted to make practical sense: the columns are declared twice in two different formats, and the view script has to be encoded. Both are just bad.
resource "aws_glue_catalog_table" "aws_gluetable_getresources_vw" {
  name               = "getresources_vw"
  database_name      = "mydatabase"
  table_type         = "VIRTUAL_VIEW"
  view_original_text = "/* Presto View: ${base64encode(file("${path.module}/originaltexts/getresources.txt"))} */"
  view_expanded_text = "/* Presto View */"
  parameters = {
    presto_view = "true"
    comment     = "Presto View"
  }
  storage_descriptor {
    ser_de_info {
      name                  = " "
      serialization_library = " "
    }
    columns {
      name = "key"
      type = "string"
    }
    columns {
      name = "value"
      type = "string"
    }
    columns {
      name = "resourcearn"
      type = "string"
    }
    columns {
      name = "tags"
      type = "array<struct<key:string,value:string>>"
    }
    .... more
  }
}
getresources.txt
{
  "catalog":"awsdatacatalog",
  "schema":"mydatabase",
  "columns":[
    {"name":"key","type":"varchar"},
    {"name":"value","type":"varchar"},
    {"name":"resourcearn","type":"varchar"},
    {"name":"tags","type":"array(row(key varchar,value varchar))"},
    {"name":"arn1","type":"varchar"},
    {"name":"arn2","type":"varchar"},
    {"name":"arn3","type":"varchar"},
    {"name":"arn4","type":"varchar"}
  ],
  "originalSql":"SELECT g.tag.key, g.tag.value, t.resource.resourcearn, t.resource.tags, split_part(t.resource.resourcearn, ':', 1) arn1, split_part(t.resource.resourcearn, ':', 2) arn2, split_part(t.resource.resourcearn, ':', 3) arn3, split_part(t.resource.resourcearn, ':', 6) arn4 FROM ((ap_ath_meta_use_sbx.getresources h CROSS JOIN UNNEST(h.resourcetagmappinglist) t (resource)) CROSS JOIN UNNEST(t.resource.tags) g (tag))"
}

Creating an Athena-compatible view using the Glue APIs is difficult. I don't know how the Terraform provider does it, but I assume it's missing one of the many details that are necessary to get right. I wrote up some unofficial documentation in this answer: https://stackoverflow.com/a/56347331/1109
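One way to at least avoid declaring the columns twice is to keep a single column list and derive both formats from it. This is a minimal sketch, not verified against Athena, assuming the Presto view JSON layout used in getresources.txt: `view_columns` and its `hive`/`presto` keys are hypothetical names, and `originaltexts/getresources.sql` is a hypothetical file holding only the SELECT statement. The view definition is built with jsonencode and the Glue columns with a dynamic block.

```hcl
locals {
  # Hypothetical single source of truth for the view's columns:
  # each entry carries the Glue (Hive) type and the Presto type Athena expects.
  view_columns = [
    { name = "key", hive = "string", presto = "varchar" },
    { name = "value", hive = "string", presto = "varchar" },
    { name = "resourcearn", hive = "string", presto = "varchar" },
    { name = "tags", hive = "array<struct<key:string,value:string>>", presto = "array(row(key varchar,value varchar))" },
    # add arn1 to arn4 the same way (string / varchar)
  ]

  # Presto view definition built with jsonencode instead of a hand-written JSON file.
  presto_view = jsonencode({
    originalSql = trimspace(file("${path.module}/originaltexts/getresources.sql")) # hypothetical file
    catalog     = "awsdatacatalog"
    schema      = "mydatabase"
    columns     = [for c in local.view_columns : { name = c.name, type = c.presto }]
  })
}

resource "aws_glue_catalog_table" "getresources_vw" {
  name          = "getresources_vw"
  database_name = "mydatabase"
  table_type    = "VIRTUAL_VIEW" # must be exactly VIRTUAL_VIEW, with no space

  view_original_text = "/* Presto View: ${base64encode(local.presto_view)} */"
  view_expanded_text = "/* Presto View */"

  parameters = {
    presto_view = "true"
    comment     = "Presto View"
  }

  storage_descriptor {
    # Glue columns generated from the same list as the Presto columns above
    dynamic "columns" {
      for_each = local.view_columns
      content {
        name = columns.value.name
        type = columns.value.hive
      }
    }
  }
}
```

The type mapping between Hive and Presto still has to be written by hand, but it now lives in one place, so the two column lists cannot drift apart.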


Terraform: create resources separately for each item in a list (CloudWatch dashboard: create a row of 3 different metric widgets for each Lambda in a list?)

I am fairly new to Terraform, being much more familiar with CloudFormation, but I am enjoying how easily I can create multiple resources from lists of variables.
Any help is really appreciated.
TASK:
Create a CloudWatch dashboard to monitor a list of Lambda functions for error rate, throttle rate and duration anomaly.
I have defined the widget for each of these metrics in Terraform.
I would like the dashboard to show these three widgets in a row, along with a markdown text section, for each Lambda:
Desired outcome:
Lambda_Dashboard
MARKDOWN_TEXT_BLOCK: LAMBDA_1
LAMBDA_1_error_rate_widget - LAMBDA_1_throttle_rate_widget - LAMBDA_1_duration_anomaly_widget
MARKDOWN_TEXT_BLOCK: LAMBDA_2
LAMBDA_2_error_rate_widget - LAMBDA_2_throttle_rate_widget - LAMBDA_2_duration_anomaly_widget
# and so on...
I have a list of Lambda functions, and each widget is defined in Terraform, for example:
error_rate_widget = {
  type   = "metric"
  width  = 8
  height = 6
  y      = 1
  x      = 0
  properties = {
    ...
  }
}
PROBLEM:
I have only been able to create blocks of the same widgets for each lambda.
This link (https://github.com/silinternational/terraform-aws-ecs-service-cloudwatch-dashboard/blob/develop/main.tf) shows what I mean:
resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = var.dashboard_name
  dashboard_body = jsonencode({
    widgets = local.widgets
  })
}
locals {
  widgets = [for service_name in var.service_names : {
    type   = "metric"
    width  = 18
    height = 6
    properties = {
      ...
    }
  }]
}
This syntax only results in resources grouped by metric widget (i.e. all the error_rate_widgets for every Lambda grouped together, same for the throttle_rate_widgets, and so on) - and then only for one lambda.
Has anyone done this before or can anyone shed some light on how I could do this please? Code examples would be really helpful due to my inexperience here.
Thank you!
ATTEMPTED SOLUTIONS:
I thought I might be able to create separate lists of all the metric widget resources like this, but I haven't been able to get it to work and am not sure of the syntax (please excuse the pseudocode):
error_widgets = [all_lambdas]
throttle_widgets = [all_lambdas]
duration_anomaly_widgets = [all_lambdas]
# IN DASHBOARD RESOURCE :
for i in count(lambdas):
Text: title {lambdas[i]}
error_widget[i]
throttle_widgets[i]
duration_anomaly_widgets[i]
Or maybe some ugly nested for loops (again, pseudocode)?
dashboard_body = {
widgets = [for lambda_function in var.lambda_functions : [
for widget in local.widgets : {
lambda_function = lambda_function
}
]
]}
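The nested-loop idea above is close to what works in real Terraform: build one small list of widgets per Lambda (a text widget followed by the three metric widgets), then flatten the list of lists into the single widgets list the dashboard body expects. This is a minimal sketch, assuming a `lambda_functions` list variable and the 24-unit-wide dashboard grid; the region and the raw Errors/Throttles/Duration metrics are placeholder assumptions standing in for the real error-rate, throttle-rate and duration-anomaly definitions.

```hcl
variable "lambda_functions" {
  type    = list(string)
  default = ["LAMBDA_1", "LAMBDA_2"] # hypothetical function names
}

locals {
  row_height = 7 # 1 unit for the markdown header + 6 for the metric widgets

  # One inner list per Lambda: header text, then three metric widgets side by side.
  widget_rows = [
    for i, fn in var.lambda_functions : [
      {
        type   = "text"
        x      = 0
        y      = i * local.row_height
        width  = 24
        height = 1
        properties = {
          markdown = "## ${fn}"
        }
      },
      [
        for j, metric in ["Errors", "Throttles", "Duration"] : {
          type   = "metric"
          x      = j * 8
          y      = i * local.row_height + 1
          width  = 8
          height = 6
          properties = {
            title   = "${fn} ${metric}"
            region  = "us-east-1" # assumption: replace with the real region
            metrics = [["AWS/Lambda", metric, "FunctionName", fn]]
          }
        }
      ]
    ]
  ]

  # flatten turns the nested lists into one list, keeping the row order.
  widgets = flatten(local.widget_rows)
}

resource "aws_cloudwatch_dashboard" "lambda" {
  dashboard_name = "Lambda_Dashboard"
  dashboard_body = jsonencode({
    widgets = local.widgets
  })
}
```

Because each row's `y` is computed from the Lambda's index, the header and its three widgets stay together regardless of how many functions are in the list.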

Terraform: for_each one by one

I have created a Terraform module that creates aws_servicecatalog_provisioned_product resources.
When I call this module from the root, I use for_each to iterate over a list of objects.
The module iterates over this list of objects and creates the aws_servicecatalog_provisioned_product resources in parallel.
Is there a way to create the resources one by one? I want the module to wait for the first iteration to finish and only then create the next one.
Is there a way to create the resources one by one?
Sadly, there is no such way, unless you remove for_each and create all the modules separately with depends_on.
Terraform is not a procedural language, and it will always process for_each and count instances in parallel.
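For a small, fixed number of items, creating the modules separately can be sketched as one module block per item, chained through module-level depends_on (available from Terraform 0.13). The module path `./modules/provisioned_product` and the `account_name` input are hypothetical names standing in for the real module and its variables.

```hcl
module "account_example1" {
  source       = "./modules/provisioned_product" # hypothetical module path
  account_name = "example1"                      # hypothetical module input
}

# Starts only after everything in module.account_example1 has been created.
module "account_example2" {
  source       = "./modules/provisioned_product"
  account_name = "example2"
  depends_on   = [module.account_example1]
}

# Starts only after module.account_example2 has been created.
module "account_example3" {
  source       = "./modules/provisioned_product"
  account_name = "example3"
  depends_on   = [module.account_example2]
}
```

The cost is that every new item needs another hand-written block, which is the repetition for_each was meant to remove.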
I am using Terraform's templatefile function to generate resources that are chained with depends_on, and Terraform then creates them one by one.
Here is the code:
locals {
  expanded_accounts = [
    {
      AccountEmail              = "example1@example.com"
      AccountName               = "example1"
      ManagedOrganizationalUnit = "example_ou1"
      SSOUserEmail              = "example1@example.com"
      SSOUserFirstName          = "Daniel"
      SSOUserLastName           = "Wor"
      ou_id                     = "ou_id1"
    },
    {
      AccountEmail              = "example2@example.com"
      AccountName               = "example2"
      ManagedOrganizationalUnit = "example_ou2"
      SSOUserEmail              = "example2@example.com"
      SSOUserFirstName          = "Ben"
      SSOUserLastName           = "John"
      ou_id                     = "ou_id2"
    }
  ]
  previous_resource = [
    for acc in local.expanded_accounts :
    acc.AccountName
  ]
  # "previous" is shifted by one, so previous[i] is the account created before res[i]
  resources = {
    res      = local.expanded_accounts
    previous = concat([""], local.previous_resource)
  }
}
resource "local_file" "this" {
  content              = templatefile("./provisioned_accounts.tpl", local.resources)
  filename             = "./generated_provisioned_accounts.tf"
  directory_permission = "0777"
  file_permission      = "0777"
  lifecycle {
    ignore_changes = [directory_permission, file_permission, filename]
  }
}
provisioned_accounts.tpl configuration:
%{ for index, acc in res }
resource "aws_servicecatalog_provisioned_product" "${acc.AccountName}" {
  name                     = "${acc.AccountName}"
  product_id               = replace(data.local_file.product_name.content, "\n", "")
  provisioning_artifact_id = replace(data.local_file.pa_name.content, "\n", "")
  provisioning_parameters {
    key   = "SSOUserEmail"
    value = "${acc.SSOUserEmail}"
  }
  provisioning_parameters {
    key   = "AccountEmail"
    value = "${acc.AccountEmail}"
  }
  provisioning_parameters {
    key   = "AccountName"
    value = "${acc.AccountName}"
  }
  provisioning_parameters {
    key   = "ManagedOrganizationalUnit"
    value = "${acc.ManagedOrganizationalUnit} (${acc.ou_id})"
  }
  provisioning_parameters {
    key   = "SSOUserLastName"
    value = "${acc.SSOUserLastName}"
  }
  provisioning_parameters {
    key   = "SSOUserFirstName"
    value = "${acc.SSOUserFirstName}"
  }
  timeouts {
    create = "60m"
  }
  %{ if index != 0 }
  depends_on = [aws_servicecatalog_provisioned_product.${previous[index]}]
  %{ endif }
}
%{~ endfor ~}
Why do you want it to wait for the previous creation? Terraform relies on the provider to know what can happen in parallel and will run in parallel where it can.
Setting the parallelism before the apply operation is how I would limit it artificially if I wanted to, as it's a technical workaround that keeps your Terraform code simple to read:
export TF_CLI_ARGS_apply="-parallelism=1"
terraform apply
If you find this slows down all of your Terraform changes, but you need this particular set of resources to be deployed one at a time, then it might be time to break these resources out into their own Terraform configuration directory and apply it as a separate step from the rest of the resources, again with the parallelism setting.
You have to remove the for_each and use depends_on for every element if you want to make sure that they are created one after another.
If you want only the first resource to be provisioned before other resources:
Separate the first resource only and use the for_each for the remaining resources. You can put an explicit dependency using depends_on for the remaining resources to depend on the first one. Because for_each expects a set or a map, this input would require some modification to be able to exclude the provisioning of the first resource.
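A minimal sketch of splitting off the first resource, assuming the accounts arrive as an ordered list of names. The `account_names`, `product_id` and `provisioning_artifact_id` variables are hypothetical; the real module would also pass its own provisioning parameters.

```hcl
variable "account_names" {
  type    = list(string)
  default = ["example1", "example2", "example3"] # hypothetical
}

variable "product_id" {
  type = string
}

variable "provisioning_artifact_id" {
  type = string
}

locals {
  # for_each needs a set or map, so everything after the first name becomes a set.
  remaining_accounts = toset(slice(var.account_names, 1, length(var.account_names)))
}

# Provisioned on its own, before anything else.
resource "aws_servicecatalog_provisioned_product" "first" {
  name                     = var.account_names[0]
  product_id               = var.product_id
  provisioning_artifact_id = var.provisioning_artifact_id
}

# All remaining products wait for the first one, then run in parallel with each other.
resource "aws_servicecatalog_provisioned_product" "remaining" {
  for_each                 = local.remaining_accounts
  name                     = each.key
  product_id               = var.product_id
  provisioning_artifact_id = var.provisioning_artifact_id

  depends_on = [aws_servicecatalog_provisioned_product.first]
}
```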
A more drastic approach, if you really need to provision resources one by one, would be to run the apply command with -parallelism=1. This limits the number of resources provisioned in parallel to 1, and it applies to the whole project. I would not recommend this, since it would drastically increase the running time of the apply.

Unable to create dynamic terraform outputs for use in terraform_remote_state

I have the following code block for creating various IAM groups:
resource "aws_iam_group" "environment-access" {
  count = "${length(var.environments)}"
  name  = "access-${element(var.environments, count.index)}"
}
variable "environments" {
  default = ["production", "non-production"]
  type    = "list"
}
I want to output the IAM groups that were created, so that I can read the ARN of each group as data via terraform_remote_state. The outputs would look something like this:
Outputs:
access-production = arn:aws:iam::XXXXXXX:group/basepath/access-production
access-non-production = arn:aws:iam::XXXXXXX:group/basepath/access-non-production
I am having trouble creating these outputs dynamically, because I am unsure how to generate the output blocks from the resources created above. The code below yields the error unknown resource 'aws_iam_group.access-production' referenced.
output "access-production" {
  value = "${aws_iam_group.access-production.arn}"
}
output "access-non-production" {
  value = "${aws_iam_group.access-non-production.arn}"
}
An initial problem with this requirement is that it calls for having a single dynamic list of environments but multiple separate output values. In order to make this work, you'll need to either make the environment inputs separate values or produce a single output value describing the environments.
# Variant with a fixed set of environments (v0.11 syntax)
variable "production_environment_name" {
  type    = "string"
  default = "production"
}
variable "non_production_environment_name" {
  type    = "string"
  default = "non-production"
}
resource "aws_iam_group" "production_access" {
  name = "access-${var.production_environment_name}"
}
resource "aws_iam_group" "non_production_access" {
  name = "access-${var.non_production_environment_name}"
}
output "access_production" {
  value = "${aws_iam_group.production_access.arn}"
}
output "access_non_production" {
  value = "${aws_iam_group.non_production_access.arn}"
}
# Variant with dynamic set of environments (v0.11 syntax)
variable "environments" {
  type    = "list"
  default = ["production", "non_production"]
}
resource "aws_iam_group" "access" {
  count = "${length(var.environments)}"
  name  = "access-${var.environments[count.index]}"
}
output "access" {
  value = "${aws_iam_group.access.*.arn}"
}
The key here is that the input variable and the output value must have the same form, so that we can make all of the necessary references between the objects. In the second example, the environment names are provided as a list, and the group ARNs are also provided as a list such that the indices correspond between the two.
You can also use a variant of the output "access" expression to combine the two with zipmap and get a map keyed by the environment names, which will probably be more convenient for the caller to use:
output "access" {
  value = "${zipmap(var.environments, aws_iam_group.access.*.arn)}"
}
The new features in Terraform 0.12 allow tidying this up a bit. Here's an idiomatic Terraform 0.12 equivalent of the version that produces a map as a result:
# Variant with dynamic set of environments (v0.12 syntax)
variable "environments" {
  type    = set(string)
  default = ["production", "non_production"]
}
resource "aws_iam_group" "access" {
  for_each = var.environments
  name     = "access-${each.key}"
}
output "access" {
  value = { for env, group in aws_iam_group.access : env => group.arn }
}
As well as having some slightly different syntax patterns, this 0.12 example has an additional practical advantage: Terraform will track those IAM groups with addresses like aws_iam_group.access["production"] and aws_iam_group.access["non_production"], so the positions of the environment names in the var.environments list are not important and it's possible to add and remove environments without potentially disturbing the groups from other environments due to the list element renumbering.
It achieves that by using resource for_each, which makes aws_iam_group.access appear as a map of objects where the environment names are keys, whereas count makes it a list of objects.
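On the consuming side, a map output like this can be read through terraform_remote_state. A minimal sketch in 0.12 syntax, assuming an S3 backend; the bucket, key and region values are hypothetical placeholders.

```hcl
data "terraform_remote_state" "iam" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "iam/terraform.tfstate"   # hypothetical state key
    region = "us-east-1"               # hypothetical region
  }
}

locals {
  # Root module outputs are exposed under .outputs in Terraform 0.12 and later.
  production_group_arn = data.terraform_remote_state.iam.outputs.access["production"]
}
```

Because the output is keyed by environment name, the consumer does not depend on the order of the environments list.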

Select subnets whose Name tag ends in a or b in a Terraform data lookup

I have 3 subnets. They are named:
test-subnet-az-a test-subnet-az-b test-subnet-az-c
I have a data source like so:
data "aws_subnet_ids" "test" {
  vpc_id = "${module.vpc.id}"
  tags = {
    Name = "test-subnet-az-*"
  }
}
This will return a list including all 3 subnets.
How do I return just the first 2, or those ending in a or b?
Terraform data sources are generally constrained by the capabilities of whatever underlying system they are querying, so the filtering supported by aws_subnet_ids is the same filtering supported by the underlying API, and so reviewing that API (EC2's DescribeSubnets) may show some variants you could try.
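For example, DescribeSubnets treats multiple values within a single filter as alternatives (they are ORed), so one option is to list the exact names in a filter block instead of using a wildcard tag. A sketch, assuming the subnet names shown in the question:

```hcl
data "aws_subnet_ids" "a_and_b" {
  vpc_id = "${module.vpc.id}"

  # Multiple values in one filter match any of the listed names.
  filter {
    name   = "tag:Name"
    values = ["test-subnet-az-a", "test-subnet-az-b"]
  }
}
```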
With that said, if you can use the data source in a way that is close enough to reduce the resultset down to a manageable size (which you seem to have achieved here) then you can filter the rest of the way using a for expression within the Terraform language itself:
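data "aws_subnet_ids" "too_many" {
  vpc_id = "${module.vpc.id}"
  tags = {
    Name = "test-subnet-az-*"
  }
}
# The IDs alone do not contain the Name tag, so look up each subnet to read its tags.
data "aws_subnet" "candidate" {
  count = length(data.aws_subnet_ids.too_many.ids)
  id    = tolist(data.aws_subnet_ids.too_many.ids)[count.index]
}
locals {
  want_suffixes = toset(["a", "b"])
  subnet_ids = toset([
    for s in data.aws_subnet.candidate : s.id
    if contains(local.want_suffixes, substr(s.tags["Name"], length(s.tags["Name"]) - 1, 1))
  ])
}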
You can place any condition expression you like after if in that for expression to apply additional filters to the result, and then use local.subnet_ids elsewhere in the configuration to access that reduced set.
I used toset here to preserve the fact that aws_subnet_ids returns a set of strings value rather than a list of strings, but that's not particularly important unless you intend to use the result with a Terraform feature that requires a set, such as the for_each argument within resource and data blocks (which is not yet released as I write this, but should be released soon.)

How to create an RDS instance from the most recent snapshot or from scratch

In terraform, is there a way to conditionally create an RDS instance from the most recent snapshot of a given database or to create an empty database depending on the value of a parameter?
I tried something like this:
variable "db_snapshot_source" {
  default = ""
}
data "aws_db_snapshot" "last_snap" {
  count                  = "${var.db_snapshot_source == "" ? 0 : 1}"
  most_recent            = true
  db_instance_identifier = "${var.db_snapshot_source}"
}
resource "aws_db_instance" "db" {
  [...]
  snapshot_identifier = "${var.db_snapshot_source == "" ? "" : data.aws_db_snapshot.last_snap.db_snapshot_identifier}"
}
Unfortunately, it does not work, because Terraform seems to dereference data.aws_db_snapshot.last_snap even when the ternary condition is false. I get the following error message: * aws_db_instance.db: Resource 'data.aws_db_snapshot.last_snap' not found for variable 'data.aws_db_snapshot.last_snap.db_snapshot_identifier'.
How can I achieve such behaviour? The only option I see is to declare two aws_db_instance resources, each with the opposite count, which is horrifying.
By defining a count, you are saying that the result of the data source will be a list, even if that list is empty, so its attributes have to be read as a list.
resource "aws_db_instance" "db" {
  [...]
  snapshot_identifier = "${
    var.db_snapshot_source == "" ? "" :
    element(
      concat(data.aws_db_snapshot.last_snap.*.db_snapshot_identifier, list("")), 0)
  }"
}
The concat is required if you expect the list to be empty. Otherwise you get this error:
element: element() may not be used with an empty list...
GitHub issue describing the concat behaviour
The documentation reads as though specifying snapshot_identifier is what triggers restoring from a snapshot, so passing in an empty string is not enough to avoid starting from a snapshot. In that case, you would need two aws_db_instance resources, with a ternary expression for count on each one to decide which to create. As you mentioned, this is horrifying, but it might work OK.
Another way to think about it is to keep a blank snapshot in your inventory to start from. Then it's just a ternary operator away from deciding between the custom snapshot and this blank snapshot. I don't know whether you can create a blank snapshot in Terraform, though; its creation might have to happen out of band.
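On Terraform 0.15 and later, the element/concat workaround can be replaced with the one() function, which returns the single element of a list or null when the list is empty. Assigning null leaves snapshot_identifier unset instead of passing an empty string. A sketch reusing the question's names; the other aws_db_instance arguments are elided as in the question.

```hcl
variable "db_snapshot_source" {
  type    = string
  default = ""
}

# Only looked up when a source instance identifier is given.
data "aws_db_snapshot" "last_snap" {
  count                  = var.db_snapshot_source == "" ? 0 : 1
  most_recent            = true
  db_instance_identifier = var.db_snapshot_source
}

resource "aws_db_instance" "db" {
  # [...] other arguments as in the question

  # null when no snapshot was requested, which leaves the argument unset.
  snapshot_identifier = one(data.aws_db_snapshot.last_snap[*].db_snapshot_identifier)
}
```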