Using Athena Terraform Scripts

Amazon Athena reads data from input Amazon S3 buckets using the IAM credentials of the user who submitted the query; query results are stored in a separate S3 bucket.
Here is the example script from the HashiCorp site: https://www.terraform.io/docs/providers/aws/r/athena_database.html
resource "aws_s3_bucket" "hoge" {
bucket = "hoge"
}
resource "aws_athena_database" "hoge" {
name = "database_name"
bucket = "${aws_s3_bucket.hoge.bucket}"
}
Where it says:
bucket - (Required) Name of s3 bucket to save the results of the query execution.
How can I specify the input S3 bucket in the Terraform script?

You would use the storage_descriptor block of the aws_glue_catalog_table resource:
https://www.terraform.io/docs/providers/aws/r/glue_catalog_table.html#parquet-table-for-athena
Here is an example of creating a table using CSV file(s):
resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
name = "your_table_name"
database_name = "${aws_athena_database.your_athena_database.name}"
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
}
storage_descriptor {
location = "s3://<your-s3-bucket>/your/file/location/"
input_format = "org.apache.hadoop.mapred.TextInputFormat"
output_format = "org.apache.hadoop.mapred.TextInputFormat"
ser_de_info {
name = "my-serde"
serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
parameters = {
"field.delim" = ","
"skip.header.line.count" = "1"
}
}
columns {
name = "column1"
type = "string"
}
columns {
name = "column2"
type = "string"
}
}
}

The input S3 bucket is specified per table you create in the database; there is no global definition for it.
As of this writing, the AWS API doesn't offer much for Athena management, and as a result neither does the aws CLI, nor Terraform. There's no 'proper' way to create a table through these means.
In theory, you could create a named query to create your table, and then execute that query (there is API functionality for this, but not yet Terraform). It seems a bit messy to me, but it would probably work if/when Terraform gets the StartQueryExecution functionality. The asynchronous nature of Athena makes it tricky to know when that table has actually been created, though, so I can imagine Terraform won't fully support table creation directly.
Terraform code that covers the currently available functionality is here: https://github.com/terraform-providers/terraform-provider-aws/tree/master/aws
API documentation for the Athena functions is here: https://docs.aws.amazon.com/athena/latest/APIReference/API_Operations.html
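For reference, the named-query part of that approach can already be expressed with the aws_athena_named_query resource. A minimal sketch (the table name, columns and SQL below are hypothetical, and something outside Terraform still has to execute the query):
resource "aws_athena_named_query" "create_table" {
  name     = "create-my-table"
  database = "${aws_athena_database.hoge.name}"

  # Hypothetical DDL; adjust columns, delimiter and location to your data
  query = <<-SQL
    CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
      column1 string,
      column2 string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://<your-s3-bucket>/your/file/location/';
  SQL
}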

Related

Terraform - Get a value from AWS app config params and pass to resource

How can I get AWS configuration parameters stored in JSON format on S3 into Terraform scripts? I want to use those parameters in other resources.
I just want to externalise all the variable parameters in the script.
For example, there is the data source aws_ssm_parameter to get AWS SSM parameters:
data "aws_ssm_parameter" "foo" {
  name = "foo"
}
Similarly, how can we get AWS app configurations in Terraform scripts?
From my understanding, you need to read an S3 object's value and use it in Terraform.
Use a data block, because it's an external resource that we're referencing.
I would use it like this:
data "aws_s3_object" "obj" {
bucket = "foo"
key = "foo.json"
}
output "s3_json_value" {
value = data.aws_s3_object.obj.body
}
To parse the JSON you can use jsondecode:
locals {
  a_variable = jsondecode(data.aws_s3_object.obj.body)
}

output "Username" {
  value = local.a_variable.name
}
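As a quick illustration of what that assumes (the file contents here are purely hypothetical): if foo.json contained
{
  "name": "some-user"
}
then local.a_variable.name, and therefore the Username output, would evaluate to "some-user".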

Terraform update only a cloud function from a bunch

I have a Terraform project that creates multiple cloud functions.
I know that if I change the name of the google_storage_bucket_object related to the function itself, Terraform will see the difference in the zip name and redeploy the cloud function.
My question is: is there a way to obtain the same behaviour, but only for the cloud functions that have actually changed?
resource "google_storage_bucket_object" "zip_file" {
# Append file MD5 to force bucket to be recreated
name = "${local.filename}#${data.archive_file.source.output_md5}"
bucket = var.bucket.name
source = data.archive_file.source.output_path
}
# Create Java Cloud Function
resource "google_cloudfunctions_function" "java_function" {
name = var.function_name
runtime = var.runtime
available_memory_mb = var.memory
source_archive_bucket = var.bucket.name
source_archive_object = google_storage_bucket_object.zip_file.name
timeout = 120
entry_point = var.function_entry_point
event_trigger {
event_type = var.event_trigger.event_type
resource = var.event_trigger.resource
}
environment_variables = {
PROJECT_ID = var.env_project_id
SECRET_MAIL_PASSWORD = var.env_mail_password
}
timeouts {
create = "60m"
}
}
By appending the MD5, every cloud function ends up with a different zip file name, so Terraform will re-deploy all of them; I found that without the MD5, Terraform will not see any changes to deploy.
If I have changed code only inside one function, how can I tell Terraform to re-deploy only that one (so, for example, to change only its zip file name)?
I hope my question is clear, and I want to thank everyone who tries to help me!
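One rough sketch of the per-function behaviour being asked about (only an illustration; it assumes each function keeps its code in its own source directory, and var.functions, the paths and the build output location are all hypothetical):
# One archive per function, so only the changed function's zip gets a new
# MD5 and therefore a new object name.
data "archive_file" "source" {
  for_each    = var.functions   # e.g. { "func_a" = "src/func_a", "func_b" = "src/func_b" }
  type        = "zip"
  source_dir  = each.value
  output_path = "${path.module}/build/${each.key}.zip"
}

resource "google_storage_bucket_object" "zip_file" {
  for_each = data.archive_file.source
  name     = "${each.key}#${each.value.output_md5}"
  bucket   = var.bucket.name
  source   = each.value.output_path
}
Each google_cloudfunctions_function can then reference its own google_storage_bucket_object.zip_file[...] entry, so only the function whose archive hash changed gets redeployed.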

Terraform Data block, all buckets

I am trying to create an inventory list for all the buckets in an AWS account. I am using the Terraform data source block to fetch the S3 buckets, but I can't figure out how to get all the buckets in my account, or which expression to use to get all of them, so I can set up an inventory on each one. Please find my code below.
data "aws_s3_bucket" "select_bucket" {
bucket = "????"
}
resource "aws_s3_bucket" "inventory" {
bucket = "x-bucket"
}
resource "aws_s3_bucket_inventory" "inventory_list" {
for_each = toset([data.aws_s3_bucket.select_bucket.id])
bucket = each.key
name = "lifecycle_analysis_bucket"
included_object_versions = "All"
schedule {
frequency = "Daily"
}
destination {
bucket {
format = "CSV"
bucket_arn = aws_s3_bucket.inventory.arn
}
}
}
which expression to use to get all the buckets
There is no such expression. You have to prepare the list of all your buckets beforehand, and then you can iterate over them in your code. The other option is to develop your own external data source which uses the AWS CLI or an SDK to get the list of your buckets and return it to Terraform for further processing, as sketched below.
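A minimal sketch of that second option, using the external data source together with the AWS CLI and jq (the bucket-listing command is standard; the wiring around it is only an illustration):
# Requires the hashicorp/external provider, plus the aws CLI and jq on the PATH.
data "external" "all_buckets" {
  # The external data source only accepts a flat map of strings, so jq packs
  # the bucket names into a single JSON-encoded string.
  program = ["bash", "-c", "aws s3api list-buckets | jq '{names: (.Buckets | map(.Name) | tostring)}'"]
}

locals {
  all_bucket_names = jsondecode(data.external.all_buckets.result.names)
}

resource "aws_s3_bucket_inventory" "inventory_list" {
  for_each                 = toset(local.all_bucket_names)
  bucket                   = each.key
  name                     = "lifecycle_analysis_bucket"
  included_object_versions = "All"

  schedule {
    frequency = "Daily"
  }

  destination {
    bucket {
      format     = "CSV"
      bucket_arn = aws_s3_bucket.inventory.arn
    }
  }
}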

AWS Macie & Terraform - Select all S3 buckets in account

I am enabling AWS Macie 2 using Terraform, and I am defining a default classification job as follows:
resource "aws_macie2_account" "member" {}
resource "aws_macie2_classification_job" "member" {
job_type = "ONE_TIME"
name = "S3 PHI Discovery default"
s3_job_definition {
bucket_definitions {
account_id = var.account_id
buckets = ["S3 BUCKET NAME 1", "S3 BUCKET NAME 2"]
}
}
depends_on = [aws_macie2_account.member]
}
AWS Macie needs a list of S3 buckets to analyze. I am wondering if there is a way to select all buckets in an account, using a wildcard or some other method. Our production accounts contain hundreds of S3 buckets and hard-coding each value in the s3_job_definition is not feasible.
Any ideas?
The Terraform AWS provider does not support a data source for listing S3 buckets at this time, unfortunately. For things like this (data sources that Terraform doesn't support), the common approach is to use the AWS CLI through an external data source.
These are modules that I like to use for CLI/shell commands:
As a data source (re-runs each time) - the Invicton-Labs/shell-data/external module used below
As a resource (re-runs only on resource recreate or on a change to a trigger)
Using the data source version, it would look something like this:
module "list_buckets" {
source = "Invicton-Labs/shell-data/external"
version = "0.1.6"
// Since the command is the same on both Unix and Windows, it's ok to just
// specify one and not use the `command_windows` input arg
command_unix = "aws s3api list-buckets --output json"
// You want Terraform to fail if it can't get the list of buckets for some reason
fail_on_error = true
// Specify your AWS credentials as environment variables
environment = {
AWS_PROFILE = "myprofilename"
// Alternatively, although not recommended:
// AWS_ACCESS_KEY_ID = "..."
// AWS_SECRET_ACCESS_KEY = "..."
}
}
output "buckets" {
// We specified JSON format for the output, so decode it to get a list
value = jsondecode(module.list_buckets.stdout).Buckets
}
Running terraform apply then produces output along these lines:
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

buckets = [
  {
    "CreationDate" = "2021-07-15T18:10:20+00:00"
    "Name" = "bucket-foo"
  },
  {
    "CreationDate" = "2021-07-15T18:11:10+00:00"
    "Name" = "bucket-bar"
  },
]
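To tie this back to the Macie job, a sketch of feeding those names into bucket_definitions (assuming the list_buckets module above sits in the same configuration):
resource "aws_macie2_classification_job" "member" {
  job_type = "ONE_TIME"
  name     = "S3 PHI Discovery default"

  s3_job_definition {
    bucket_definitions {
      account_id = var.account_id
      # Keep only the bucket names from the CLI output
      buckets    = [for b in jsondecode(module.list_buckets.stdout).Buckets : b.Name]
    }
  }

  depends_on = [aws_macie2_account.member]
}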

Dumping Terraform output to a local file

I want to create a file (credentials.json) within a directory, say content, using Terraform.
The contents will be the output of a private service account key.
I am using the following code to create the service account and read its key into a data source:
resource "google_service_account" "my-account" {
account_id = "${var.account_id}"
project = "${var.project_id}"
}
resource "google_service_account_key" "my-account" {
service_account_id = "${google_service_account.my-account.name}"
}
data "google_service_account_key" "my-account" {
name = "${google_service_account_key.cd.name}"
public_key_type = "TYPE_X509_PEM_FILE"
}
How can I then dump it to a local file?
My use case is that I want to create the credentials.json to enable periodic backups of Jenkins to a Google Cloud Storage bucket.
You can use the local_file resource to write data to disk in a Terraform run.
So you could do something like the following:
resource "google_service_account" "my-account" {
account_id = "${var.account_id}"
project = "${var.project_id}"
}
resource "google_service_account_key" "my-account" {
service_account_id = "${google_service_account.my-account.name}"
}
resource "local_file" "key" {
filename = "/path/to/key/output"
content = "${base64decode(google_service_account_key.my-account.private_key)}"
}
Note that you should never need a data source to look at the outputs of a resource you are creating in that same Terraform command. In this case you can ditch the google_service_account_key data source because you have the resource available to you.
The benefit of data sources is when you need to look up some generated value of a resource either not created by Terraform or in a different state file.
Your best bet would be to create an output for your service account key:
output "google_service_account_key" {
value = "${base64decode(data.google_service_account_key.my-account.private_key)}"
}
With the terraform output command you can then query specifically for the key, combined with jq (or another JSON parser) to extract the correct value:
terraform output -json google_service_account_key | jq '.value[0]' > local_file.json
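On newer Terraform versions (0.14 and later), the -raw flag can write the string value of the output directly, assuming the output is the decoded key shown above:
terraform output -raw google_service_account_key > credentials.json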