AWS CloudFormation Glue table reusable template - amazon-web-services

I have a lot of resources of type AWS::Glue::Table in my AWS templates, and I do not want to copy-paste the same snippet of code from template to template. So the idea is to create a reusable nested stack that accepts parameters. I did this, but one problem remains: I do not know how to pass columns to this stack via parameters. [{Type: string, Name: type}, {Type: string, Name: timeLogged}] is an array of objects, but parameters accept only string types.
I tried something like this:
!Split [ "," , "{Type: string, Name: type}, {Type: string, Name: timeLogged}"]
but it did not help: !Split produces a list of strings, while Columns needs a list of objects.
AWSTemplateFormatVersion: 2010-09-09
Description: The AWS CloudFormation template for creating a Glue table
Parameters:
  DestinationBucketName:
    Type: String
    Description: Destination Regional Bucket Name
  DestinationBucketPrefix:
    Type: String
    Description: Destination Regional Bucket Prefix
  DatabaseName:
    Type: String
    Description: Database for Kinesis Analytics
  TableName:
    Type: String
    Description: Table for Kinesis Analytics
  InputFormat:
    Type: String
    Description: Input format for data
  OutputFormat:
    Type: String
    Description: Output format for data
  SerializationLibrary:
    Type: String
    Description: Serialization library for converting data
Resources:
  LogsCollectionTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref DatabaseName
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: !Ref TableName
        Description: Table for storing data
        TableType: EXTERNAL_TABLE
        StorageDescriptor:
          Columns: [{Type: string, Name: type}, {Type: string, Name: timeLogged}]
          Location: !Sub s3://${DestinationBucketName}/${DestinationBucketPrefix}
          InputFormat: !Ref InputFormat
          OutputFormat: !Ref OutputFormat
          SerdeInfo:
            SerializationLibrary: !Ref SerializationLibrary

Short answer: you currently cannot. CloudFormation parameters cannot carry a list of objects, so you would need to pass every value manually as its own string parameter.
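What still works is invoking the nested stack and passing each scalar value through explicitly. A minimal sketch of the parent-stack side, assuming a hypothetical TemplateURL and example values; the Columns list itself has to stay hard-coded in the nested template:

Resources:
  GlueTableStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # Placeholder URL; point this at wherever the nested template is stored
      TemplateURL: https://s3.amazonaws.com/my-templates-bucket/glue-table.yaml
      Parameters:
        DestinationBucketName: my-destination-bucket
        DestinationBucketPrefix: logs
        DatabaseName: my_database
        TableName: logs_collection
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        SerializationLibrary: org.openx.data.jsonserde.JsonSerDe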

Related

Add partition projection to AWS Athena table using CloudFormation

I have an Athena table defined with a template specified like so in CloudFormation:
CloudFormation Create
EventsTable:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref DatabaseName
    TableInput:
      Description: "My Table"
      Name: !Ref TableName
      TableType: EXTERNAL_TABLE
      StorageDescriptor:
        Compressed: True
        Columns:
          - Name: account_id
            Type: string
            Comment: "Account Id of the account making the request"
          ...
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        Location: !Sub "s3://${EventsBucketName}/events/"
This works well and deploys. I also found out that I can have partition projection created, as per this doc and this doc, and I can make that work with a direct table creation, roughly:
SQL Create
CREATE EXTERNAL TABLE `performance_data.events`
(
  `account_id` string,
  ...
)
PARTITIONED BY (
  `day` string)
ROW FORMAT SERDE
  'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
  's3://my-bucket/events/'
TBLPROPERTIES (
  'has_encrypted_data' = 'false',
  'projection.enabled' = 'true',
  'projection.day.type' = 'date',
  'projection.day.format' = 'yyyy/MM/dd',
  'projection.day.range' = '2020/01/01,NOW',
  'projection.day.interval' = '1',
  'projection.day.interval.unit' = 'DAYS',
  'storage.location.template' = 's3://my-bucket/events/${day}/'
)
But I can't find the docs for converting this into the CloudFormation structure. So my question is: how can I achieve the partition projection shown in the SQL code in CloudFormation?
I now have a working solution. The missing piece was really just a missing parameter; here is the solution:
MyTableResource:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: MyAccountId
    DatabaseName: MyDatabase
    TableInput:
      Description: "My Table"
      Name: mytable
      TableType: EXTERNAL_TABLE
      PartitionKeys:
        - Name: day
          Type: string
          Comment: Day partition
      Parameters:
        "projection.enabled": "true"
        "projection.day.type": "date"
        "projection.day.format": "yyyy/MM/dd"
        "projection.day.range": "2020/01/01,NOW"
        "projection.day.interval": "1"
        "projection.day.interval.unit": "DAYS"
        "storage.location.template": "s3://my-bucket/events/${day}/"
      StorageDescriptor:
        Compressed: True
        Columns:
          ...
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        SerdeInfo:
          Parameters:
            serialization.format: '1'
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        Location: "s3://my-bucket/events/"
The key addition was:
serialization.format: '1'
This now completely works, and one can run a query that uses the partition, such as:
select * from mytable where day > '2022/05/03'
Referring to the CloudFormation reference for the Glue Table TableInput, you can specify PartitionKeys and Parameters. This is the equivalent of PARTITIONED BY and TBLPROPERTIES in the query.
EDIT
As an example, you can refer to this article. The sample below shows how to define the PartitionKeys and how to define a JSON for the Parameters. In your case, you just have to add the projection keys (such as projection.enabled) and values (such as true); a sketch of that follows the sample.
# Create an Amazon Glue table
CFNTableFlights:
  # Creating the table waits for the database to be created
  DependsOn: CFNDatabaseFlights
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref CFNDatabaseName
    TableInput:
      Name: !Ref CFNTableName1
      Description: Define the first few columns of the flights table
      TableType: EXTERNAL_TABLE
      Parameters: {
        "classification": "csv"
      }
      # ViewExpandedText: String
      PartitionKeys:
        # Data is partitioned by month
        - Name: mon
          Type: bigint
      StorageDescriptor:
        OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        Columns:
          - Name: year
            Type: bigint
          - Name: quarter
            Type: bigint
          - Name: month
            Type: bigint
          - Name: day_of_month
            Type: bigint
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        Location: s3://crawler-public-us-east-1/flight/2016/csv/
        SerdeInfo:
          Parameters:
            field.delim: ","
          SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
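For instance, a minimal sketch of how the flights sample's Parameters block could be extended with partition projection over its mon key; the integer range and the storage.location.template value here are assumptions for illustration:

      Parameters: {
        "classification": "csv",
        "projection.enabled": "true",
        "projection.mon.type": "integer",
        "projection.mon.range": "1,12",
        "storage.location.template": "s3://crawler-public-us-east-1/flight/2016/csv/mon=${mon}/"
      }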

Reference Secrets Manager Parameters to Secret String

Is there any way to reference parameters in the SecretString field in Secrets Manager via CloudFormation?
The way I wrote the template, the !Ref is treated as literal text rather than as a reference to the parameter.
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  Name:
    Type: String
  myuserparameter:
    Type: String
  mypasswordparameter:
    Type: String
Resources:
  SecretsManager:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: !Ref Name
      SecretString: '{"username":"!Ref myuserparameter,"password":"Ref mypasswordparameter"}'
This will work: !Sub substitutes each ${parameterName} with the parameter's value, so the resulting SecretString is valid JSON containing the resolved values:
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  Name:
    Type: String
  myuserparameter:
    Type: String
  mypasswordparameter:
    Type: String
Resources:
  SecretsManager:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: !Ref Name
      SecretString: !Sub '{"username": "${myuserparameter}","password": "${mypasswordparameter}"}'

Some given parameters are not resolved when deploying a stack "not found"

I want to create and deploy a template that itself deploys a product from the AWS Service Catalog. Here is my template:
Parameters:
  ProductId:
    Type: String
  ProvisioningArtifactName:
    Type: String
  Description:
    Type: String
  Region:
    Type: CommaDelimitedList
  VpcSize:
    Type: String
  BastionHostKeyName:
    Type: String
  ProvisioningArtifactName:
    Type: String
Resources:
  VPCAndMore:
    Type: AWS::ServiceCatalog::CloudFormationProvisionedProduct
    Properties:
      ProductId: ProductId
      ProvisioningArtifactName: ProvisioningArtifactName
      ProvisioningParameters:
        - Key: Description
          Value: Description
        - Key: AvailabilityZones
          Value: Region
        - Key: VpcSize
          Value: VpcSize
        - Key: BastionHostKeyName
          Value: BastionHostKeyName
When I try to deploy it manually, I enter all parameter values. They are definitely correct and of the correct type. But once I deploy it, I get an error like this:
Product ProductId not found. (Service: ServiceCatalog, Status Code: 400, Request ID: 35f27a2a-1317-48d0-815e-16ebe949d039, Extended Request ID: null)
For some reason, the ProductId parameter does not seem to be resolved.
What am I missing? Or does CloudFormation not support parameter resolution outside of ProvisioningParameters?
For the intrinsic function Ref, you need to reference the values you defined, as shown below. Without !Ref, CloudFormation passes the literal string "ProductId" to Service Catalog, which is why it reports "Product ProductId not found":
Parameters:
  ProductId:
    Type: String
  ProvisioningArtifactName:
    Type: String
  Description:
    Type: String
  Region:
    Type: CommaDelimitedList
  VpcSize:
    Type: String
  BastionHostKeyName:
    Type: String
Resources:
  VPCAndMore:
    Type: AWS::ServiceCatalog::CloudFormationProvisionedProduct
    Properties:
      ProductId: !Ref ProductId
      ProvisioningArtifactName: !Ref ProvisioningArtifactName
      ProvisioningParameters:
        - Key: Description
          Value: !Ref Description
        - Key: AvailabilityZones
          # Region is a CommaDelimitedList, so join it back into the single
          # string that ProvisioningParameters Value expects
          Value: !Join [ ",", !Ref Region ]
        - Key: VpcSize
          Value: !Ref VpcSize
        - Key: BastionHostKeyName
          Value: !Ref BastionHostKeyName
The problem is that you're only inserting the parameter names without referencing them. You need to use the intrinsic function !Ref on each one, exactly as in the template above.

AWS Glue CloudFormation DB creation error

I am trying to create a database in Glue using CloudFormation, but it fails with the below error. Am I missing something?
Property validation failure: [The property {/DatabaseInput} is required, The property {/CatalogId} is required]
This is what my template code block looks like:
GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput: !Ref TeamName
According to the docs, DatabaseInput should have the following structure:
GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      Description: String
      LocationUri: String
      Name: String
      Parameters: Json
Thus the question is: what is TeamName in your template?
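Assuming TeamName is a String parameter meant to become the database name (the question does not show its definition), a minimal sketch of the fix is to nest it under Name:

GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      # DatabaseInput must be a structure; the bare !Ref string caused the validation failure
      Name: !Ref TeamName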

Connecting Athena and S3 in same Cloudformation Stack

From the AWS::Athena::NamedQuery documentation, it is unclear how to attach Athena to an S3 bucket specified in the same stack.
If I had to guess from the example, I would imagine that you can write a template like this:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
    ... other params ...
  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
        CREATE EXTERNAL TABLE db_name.test_table
        (...) LOCATION s3://.../path/to/folder/
Would a template like the above work? Upon stack creation, will the table db_name.test_table be available to run queries on?
Turns out the way you connect S3 and Athena is to make a Glue table! How silly of me!! Of course Glue is how you connect things!
Sarcasm aside, this is a template that worked for me using AWS::Glue::Table and AWS::Glue::Database:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
  MyGlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: my-glue-database
        Description: "Glue beats tape"
      CatalogId: !Ref AWS::AccountId
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref MyGlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: my-glue-table
        Parameters: { "classification" : "csv" }
        StorageDescriptor:
          Location:
            Fn::Sub: "s3://${MyS3Bucket}/"
          InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
          OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
          SerdeInfo:
            Parameters: { "separatorChar" : "," }
            SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
          StoredAsSubDirectories: false
          Columns:
            - Name: column0
              Type: string
            - Name: column1
              Type: string
After this, the database and table were in the AWS Athena Console!
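To close the loop on the original question, a hypothetical AWS::Athena::NamedQuery in the same stack could then reference these Glue resources (the query name and string below are illustrative):

  MyNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      # !Ref on the Glue database and table resources returns their names
      Database: !Ref MyGlueDatabase
      Name: SelectSampleRows
      Description: Preview rows from the CSV table registered above
      QueryString: !Sub 'SELECT * FROM "${MyGlueTable}" LIMIT 10'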