From the AWS::Athena::NamedQuery documentation, it is unclear how to attach Athena to an S3 bucket specified in the same stack.
If I had to guess from the example, I would imagine you could write a template like this:
Resources:
MyS3Bucket:
Type: AWS::S3::Bucket
... other params ...
AthenaNamedQuery:
Type: AWS::Athena::NamedQuery
Properties:
Database: "db_name"
Name: "MostExpensiveWorkflow"
QueryString: >
CREATE EXTERNAL TABLE db_name.test_table
(...) LOCATION 's3://.../path/to/folder/'
Would a template like the above work? Upon stack creation, will the table db_name.test_table be available to run queries on?
Turns out the way you connect S3 and Athena is to make a Glue table! How silly of me! Of course Glue is how you connect things!
Sarcasm aside, here is a template that worked for me, using AWS::Glue::Table and AWS::Glue::Database:
Resources:
MyS3Bucket:
Type: AWS::S3::Bucket
MyGlueDatabase:
Type: AWS::Glue::Database
Properties:
DatabaseInput:
Name: my-glue-database
Description: "Glue beats tape"
CatalogId: !Ref AWS::AccountId
MyGlueTable:
Type: AWS::Glue::Table
Properties:
DatabaseName: !Ref MyGlueDatabase
CatalogId: !Ref AWS::AccountId
TableInput:
Name: my-glue-table
Parameters: { "classification" : "csv" }
StorageDescriptor:
Location:
Fn::Sub: "s3://${MyS3Bucket}/"
InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
SerdeInfo:
Parameters: { "separatorChar" : "," }
SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
StoredAsSubDirectories: false
Columns:
- Name: column0
Type: string
- Name: column1
Type: string
After this, the database and table were in the AWS Athena Console!
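With the Glue database and table in place, the AWS::Athena::NamedQuery from the original question can simply reference them. Here is a minimal sketch (assuming the database, table, and column names from the template above; the query name is hypothetical):

AthenaNamedQuery:
  Type: AWS::Athena::NamedQuery
  Properties:
    Database: my-glue-database
    Name: SampleSelect
    Description: Example query against the Glue table defined above
    QueryString: >
      SELECT column0, column1
      FROM "my-glue-database"."my-glue-table"
      LIMIT 10;

The double quotes around the database and table names matter in the Athena SQL, because both names contain hyphens.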
Related
I have a lot of resources of type AWS::Glue::Table in my AWS templates, and I do not want to copy-paste the same snippet from template to template. So the idea is to create a reusable nested stack that accepts params. I did it, but one problem remains: I do not know how I can pass columns to this stack via params. [{Type: string, Name: type}, {Type: string, Name: timeLogged}] is an array of objects, but params accept only the string type.
I tried to do something like this:
!Split [ ",", "{Type: string, Name: type}, {Type: string, Name: timeLogged}" ] - but it did not help
AWSTemplateFormatVersion: 2010-09-09
Description: The AWS CloudFormation template for creating a Glue table
Parameters:
DestinationBucketName:
Type: String
Description: Destination Regional Bucket Name
DestinationBucketPrefix:
Type: String
Description: Destination Regional Bucket Prefix
DatabaseName:
Type: String
Description: Database for Kinesis Analytics
TableName:
Type: String
Description: Table for Kinesis Analytics
InputFormat:
Type: String
Description: Input format for data
OutputFormat:
Type: String
Description: Output format for data
SerializationLibrary:
Type: String
Description: Serialization library for converting data
Resources:
LogsCollectionTable:
Type: AWS::Glue::Table
Properties:
DatabaseName: !Ref DatabaseName
CatalogId: !Ref AWS::AccountId
TableInput:
Name: !Ref TableName
Description: Table for storing data
TableType: EXTERNAL_TABLE
StorageDescriptor:
Columns: [{Type: string, Name: type}, {Type: string, Name: timeLogged}]
Location: !Sub s3://${DestinationBucketName}/${DestinationBucketPrefix}
InputFormat: !Ref InputFormat
OutputFormat: !Ref OutputFormat
SerdeInfo:
SerializationLibrary: !Ref SerializationLibrary
Short answer: you currently cannot. You would need to pass every parameter manually. A partial workaround for a fixed column count is sketched below.
Source
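If the number of columns is fixed, one workaround (just a sketch, not an official mechanism; the parameter names here are hypothetical) is to pass the names and types as CommaDelimitedList parameters and pick them apart with Fn::Select inside the nested stack:

Parameters:
  ColumnNames:
    Type: CommaDelimitedList
    Default: "type,timeLogged"
  ColumnTypes:
    Type: CommaDelimitedList
    Default: "string,string"

and in the table definition:

        StorageDescriptor:
          Columns:
            - Name: !Select [0, !Ref ColumnNames]
              Type: !Select [0, !Ref ColumnTypes]
            - Name: !Select [1, !Ref ColumnNames]
              Type: !Select [1, !Ref ColumnTypes]

This still hard-codes the number of columns per stack; for a truly dynamic schema you would need a CloudFormation macro or to generate the template.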
I am trying to create a database in Glue using CloudFormation, but it fails with the error below. Am I missing something?
Property validation failure: [The property {/DatabaseInput} is required, The property {/CatalogId} is required]
This is what my template code block looks like:
GlueDatabase:
Type: AWS::Glue::Database
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseInput: !Ref TeamName
According to the docs the DatabaseInput should have the following structure:
GlueDatabase:
Type: AWS::Glue::Database
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseInput:
Description: String
LocationUri: String
Name: String
Parameters: Json
Thus the question is: what is TeamName in your template?
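If TeamName is a plain String parameter meant to be the database name, it belongs under DatabaseInput.Name rather than replacing the whole DatabaseInput object. A sketch of the likely fix:

GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      Name: !Ref TeamName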
I have been searching for an example of how to set up CloudFormation for a Glue workflow that includes triggers, jobs, and crawlers, but I haven't been able to find much information on it.
This is the only piece of information I am able to find from AWS:
{
"Type" : "AWS::Glue::Workflow",
"Properties" : {
"DefaultRunProperties" : Json,
"Description" : String,
"Name" : String,
"Tags" : Json
}
}
Here's an example of a workflow with one crawler and a job to be run after the crawler finishes.
The workflow structure is defined by tagging the triggers with the WorkflowName.
I believe there can be only one SCHEDULED or ON_DEMAND trigger to start the workflow. All the other triggers in the workflow need to be CONDITIONAL on the jobs / crawlers. That's probably how CloudFormation knows how to build the DAG.
Also note how the workflow parameters are defined as JSON in the DefaultRunProperties.
---
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
BaseBucket:
Description: Bucket used by my workflow jobs
Type: String
Resources:
MyWorkflow:
Type: AWS::Glue::Workflow
Properties:
DefaultRunProperties:
{
"workflowParameter1": "Foo",
"workflowParameter2": "Bar",
"bucket": { "Fn::Sub": "${BaseBucket}" }
}
Description: Workflow for orchestrating my jobs
Name: MyWorkflowName
WorkflowCrawler:
Type: AWS::Glue::Crawler
Properties:
Name: MyCrawler
Role: MyCrawlerRole
Description: A crawler to run as the first step in the workflow
DatabaseName: MyDatabase
Targets:
S3Targets:
- Path: !Sub "s3://${BaseBucket}/"
WorkflowJob:
Type: AWS::Glue::Job
Properties:
Description: Glue job to run after the crawler
Name: MyWorkflowJob
Role: MyJobRole
Command:
Name: pythonshell
PythonVersion: 3
ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"
WorkflowStartTrigger:
Type: AWS::Glue::Trigger
Properties:
Name: StartTrigger
Type: ON_DEMAND
Description: Trigger for starting the workflow
Actions:
- CrawlerName: !Ref WorkflowCrawler
WorkflowName: !Ref MyWorkflow
WorkflowJobTrigger:
Type: AWS::Glue::Trigger
Properties:
Name: CrawlerSuccessfulTrigger
Type: CONDITIONAL
StartOnCreation: True
Description: Trigger to start the glue job
Actions:
- JobName: !Ref WorkflowJob
Predicate:
Conditions:
- LogicalOperator: EQUALS
CrawlerName: !Ref WorkflowCrawler
CrawlState: SUCCEEDED
WorkflowName: !Ref MyWorkflow
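If the workflow should start on a schedule rather than on demand, the single ON_DEMAND trigger above could be swapped for a SCHEDULED one. A sketch (the trigger name and cron expression are just examples):

WorkflowScheduledTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    Name: ScheduledStartTrigger
    Type: SCHEDULED
    Schedule: cron(0 6 * * ? *) # every day at 06:00 UTC
    StartOnCreation: true
    Description: Scheduled alternative to the ON_DEMAND start trigger
    Actions:
      - CrawlerName: !Ref WorkflowCrawler
    WorkflowName: !Ref MyWorkflow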
Here is an example of a Glue workflow using triggers, crawlers and a job to convert JSON to Parquet:
JSONtoParquetWorkflow:
Type: AWS::Glue::Workflow
Properties:
Name: json-to-parquet-workflow
Description: Workflow for orchestrating JSON to Parquet conversion
RawJSONCrawlerTrigger:
Type: AWS::Glue::Trigger
Properties:
WorkflowName: !Ref JSONtoParquetWorkflow
Name: raw-json-crawler-trigger
Description: Start crawler for raw JSON data
Type: ON_DEMAND
Actions:
- CrawlerName: !Ref RawJSONCrawler
JSONToParquetETLJobTrigger:
Type: AWS::Glue::Trigger
Properties:
WorkflowName: !Ref JSONtoParquetWorkflow
Name: json-to-parquet-etl-trigger
Description: Start JSON to Parquet ETL job
Type: CONDITIONAL
StartOnCreation: True
Predicate:
Conditions:
- LogicalOperator: EQUALS
CrawlerName: !Ref RawJSONCrawler
CrawlState: SUCCEEDED
Actions:
- JobName: !Ref JSONToParquetETLJob
RawParquetCrawlerTrigger:
Type: AWS::Glue::Trigger
Properties:
WorkflowName: !Ref JSONtoParquetWorkflow
Name: raw-parquet-crawler-trigger
Description: Start crawler for raw Parquet data
Type: CONDITIONAL
StartOnCreation: True
Predicate:
Conditions:
- LogicalOperator: EQUALS
JobName: !Ref JSONToParquetETLJob
State: SUCCEEDED
Actions:
- CrawlerName: !Ref RawParquetCrawler
There was no simple example to be found, so I created one: AWS Glue Workflow: Getting started, which uses an AWS CloudFormation template. The example is very simple, and it is explained with diagrams.
I have the following CloudFormation template I am trying to deploy via SAM. This template correctly creates the DynamoDB table, an API key, a Lambda function and the API Gateway, but I cannot figure out what I need to specify in the template to associate the API key with the API Gateway.
I have found plenty of snippets showing partial examples, but I am struggling to piece it all together.
Thank you in advance,
Denny
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Parameters:
TableName:
Type: String
Default: 'influencetabletest'
Description: (Required) The name of the new DynamoDB table. Minimum 3 characters.
MinLength: 3
MaxLength: 50
AllowedPattern: ^[A-Za-z-]+$
ConstraintDescription: 'Required parameter. Must be characters only. No numbers allowed.'
CorsOrigin:
Type: String
Default: '*'
Description: (Optional) Cross-origin resource sharing (CORS) origin. You can specify a single origin, "*" for all, or leave it empty to apply no CORS.
MaxLength: 250
Conditions:
IsCorsDefined: !Not [!Equals [!Ref CorsOrigin, '']]
Resources:
ApiKey:
Type: AWS::ApiGateway::ApiKey
DependsOn:
- ApiGetter
Properties:
Name: "TestApiKey"
Description: "CloudFormation API Key V1"
Enabled: "true"
ApiGetter:
Type: AWS::Serverless::Api
Properties:
StageName: prd
DefinitionBody:
swagger: 2.0
info:
title:
Ref: AWS::StackName
paths:
/getdynamicprice:
post:
responses: {}
x-amazon-apigateway-integration:
httpMethod: POST
type: aws_proxy
uri:
Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaGetter.Arn}/invocations
LambdaGetter:
Type: AWS::Serverless::Function
Properties:
CodeUri: ./index.js
Handler: index.handler
Runtime: nodejs8.10
Environment:
Variables:
TABLE_NAME: !Ref TableName
IS_CORS: IsCorsDefined
CORS_ORIGIN: !Ref CorsOrigin
PRIMARY_KEY: !Sub ${TableName}Id
Policies:
- DynamoDBCrudPolicy:
TableName: !Ref TableName
Events:
Api:
Type: Api
Properties:
Path: /getdynamicprice
Method: POST
RestApiId: !Ref ApiGetter
DynamoDBTable:
Type: AWS::DynamoDB::Table
Properties:
TableName: !Ref TableName
AttributeDefinitions:
-
AttributeName: !Sub "${TableName}Id"
AttributeType: "S"
KeySchema:
-
AttributeName: !Sub "${TableName}Id"
KeyType: "HASH"
ProvisionedThroughput:
ReadCapacityUnits: 1
WriteCapacityUnits: 1
StreamSpecification:
StreamViewType: NEW_AND_OLD_IMAGES
Outputs:
ApiKeyID:
Value: !Ref ApiKey
ApiUrl:
Value: !Sub https://${ApiGetter}.execute-api.${AWS::Region}.amazonaws.com/prd/getdynamicprice
Description: The URL of the API Gateway you invoke to get your dynamic pricing result.
DynamoDBTableArn:
Value: !GetAtt DynamoDBTable.Arn
Description: The ARN of your DynamoDB Table
DynamoDBTableStreamArn:
Value: !GetAtt DynamoDBTable.StreamArn
Description: The ARN of your DynamoDB Table Stream
Edit (04/22/2020): there now seems to be a way to do all of this using AWS SAM. Please see the answer below.
Here's a sample template where I have connected my API to an API key. But that has only been possible because I am using usage plans; I believe that is the primary purpose of an API key. See: API Gateway usage plan.
ApiKey:
Type: AWS::ApiGateway::ApiKey
Properties:
Name: !Join ["", [{"Ref": "AWS::StackName"}, "-apikey"]]
Description: "CloudFormation API Key V1"
Enabled: true
GenerateDistinctId: false
ApiUsagePlan:
Type: "AWS::ApiGateway::UsagePlan"
Properties:
ApiStages:
- ApiId: !Ref <API resource name>
Stage: !Ref <stage resource name>
Description: !Join [" ", [{"Ref": "AWS::StackName"}, "usage plan"]]
Quota:
Limit: 2000
Period: MONTH
Throttle:
BurstLimit: 10
RateLimit: 10
UsagePlanName: !Join ["", [{"Ref": "AWS::StackName"}, "-usage-plan"]]
ApiUsagePlanKey:
Type: "AWS::ApiGateway::UsagePlanKey"
Properties:
KeyId: !Ref <API key>
KeyType: API_KEY
UsagePlanId: !Ref ApiUsagePlan
There does not seem to be a way to do this without a usage plan.
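Applied to the template in the question, the wiring might look roughly like this (a sketch; it reuses the ApiKey and ApiGetter resources from the question, and the DependsOn assumes SAM names the implicit stage <ApiLogicalId><StageName>Stage):

ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  DependsOn: ApiGetterprdStage # assumption: the stage SAM creates for StageName prd
  Properties:
    UsagePlanName: !Sub ${AWS::StackName}-usage-plan
    ApiStages:
      - ApiId: !Ref ApiGetter
        Stage: prd
ApiUsagePlanKey:
  Type: AWS::ApiGateway::UsagePlanKey
  Properties:
    KeyId: !Ref ApiKey
    KeyType: API_KEY
    UsagePlanId: !Ref ApiUsagePlan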
I did try the suggestion from ASR but ended up with a simpler approach.
AWS SAM (the Serverless Application Model) has prepackaged handling that removes the need for explicit AWS::ApiGateway::* resources.
To create an API Gateway with a stage that requires an authorization token in the header, the following simplified code should do it for you:
Resources:
ApiGatewayEndpoint:
Type: AWS::Serverless::Api
Properties:
StageName: Prod
Auth:
ApiKeyRequired: true
UsagePlan:
CreateUsagePlan: PER_API
UsagePlanName: GatewayAuthorization # any name you see fit
LambdaFunction:
Type: AWS::Serverless::Function
Properties:
Handler: lambda.handler
Runtime: python3.7
Timeout: 30
CodeUri: .
Events:
PostEvent:
Type: Api
Properties:
Path: /content
Method: POST
RequestParameters:
- method.request.header.Authorization:
Required: true
Caching: true
RestApiId:
Ref: ApiGatewayEndpoint # the logical name of your gateway endpoint above
The elements:
Auth:
ApiKeyRequired: true
UsagePlan:
CreateUsagePlan: PER_API
are what do the trick.
CloudFormation handles the plumbing for you, i.e. the API key, usage plan, and usage plan key are automatically created and bound.
Although the docs are definitely not best in class, they do provide some additional information: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-specification-resources-and-properties.html
I'm having some strange behavior with a CloudFormation template. This is my template, where I create a bucket and want to add a notification configuration depending on a condition:
AWSTemplateFormatVersion: '2010-09-09'
Description: "Setup Artifacts Bucket"
Parameters:
BucketName:
Description: Name of the pipeline setup arctifact bucket
Type: String
Default: "s3-pipeline-setup"
NotificationCondition:
Description: Conditionally add Notification configuration to the artifact bucket
Type: String
Default: false
Conditions:
AddNotificationConfiguration: !Equals [ !Ref NotificationCondition, true ]
Resources:
ArtifactBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Ref BucketName
Fn::If:
- AddNotificationConfiguration
-
NotificationConfiguration:
LambdaConfigurations:
-
Function: "arn:aws:lambda:eu-west-1:341292222222227:function:lambda-ops-trigger-pipeline-setup"
Event: "s3:ObjectCreated:*"
Filter:
S3Key:
Rules:
-
Name: prefix
Value: "appstackcodes/"
-
Name: suffix
Value: "txt"
- !Ref AWS::NoValue
When I try to deploy, it fails with this error:
00:28:10 UTC+0200 CREATE_FAILED AWS::S3::Bucket ArtifactBucket Encountered unsupported property Fn::If
I don't really understand the problem. Can someone take a look and point out the mistake, please?
Thanks
Unfortunately, you cannot do what you intended in CloudFormation.
Fn::If can basically only be used as a ternary expression, e.g.
key: Fn::If: [condition_name, value_if_true, value_if_false]
It can't be used for control flow the way you would in a programming language. There are ways around it, though. You actually already seem to have discovered AWS::NoValue, so it's just a matter of moving the NotificationConfiguration key outside the Fn::If:
Resources:
ArtifactBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Ref BucketName
NotificationConfiguration:
Fn::If:
- AddNotificationConfiguration
- LambdaConfigurations:
-
Function: "arn:aws:lambda:eu-west-1:341294322147:function:lambda-itops-trigger-pipeline-setup"
Event: "s3:ObjectCreated:*"
Filter:
S3Key:
Rules:
-
Name: prefix
Value: "appstackcodes/"
-
Name: suffix
Value: "txt"
- !Ref AWS::NoValue
Effectively you are always assigning something to NotificationConfiguration, but sometimes it's the magic AWS::NoValue. This works in the majority of cases, although there are times when this just isn't sufficient and more creativity is required!
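One example of that extra creativity (a sketch, with hypothetical resource and parameter names): when a property tree cannot be expressed through Fn::If at all, you can define two variants of the resource and attach a Condition to each, inverting it for the second variant:

Conditions:
  AddNotificationConfiguration: !Equals [!Ref NotificationCondition, true]
  SkipNotificationConfiguration: !Not [!Condition AddNotificationConfiguration]
Resources:
  ArtifactBucketWithNotifications:
    Type: AWS::S3::Bucket
    Condition: AddNotificationConfiguration
    Properties:
      BucketName: !Ref BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Function: !Ref NotificationFunctionArn # hypothetical parameter holding the Lambda ARN
            Event: "s3:ObjectCreated:*"
  ArtifactBucketPlain:
    Type: AWS::S3::Bucket
    Condition: SkipNotificationConfiguration
    Properties:
      BucketName: !Ref BucketName

The downside is duplication, and anything that references the bucket elsewhere needs its own Fn::If to pick the variant that was actually created.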