Creating AWS Athena View using Cloud Formation template - amazon-web-services

Is it possible to create an Athena view via a CloudFormation template? I can create the view using the Athena dashboard, but I want to do this programmatically using CF templates. I could not find any details in the AWS docs, so I'm not sure if it is supported.
Thanks.

It is possible to create views with CloudFormation, it's just very, very, complicated. Athena views are stored in the Glue Data Catalog, like databases and tables are. In fact, Athena views are tables in Glue Data Catalog, just with slightly different contents.
See this answer for a full description of how to create a view programmatically, and you'll get an idea of the complexity: Create AWS Athena view programmatically. It is possible to map that to CloudFormation, but I would not recommend it.
If you want to create databases and tables with CloudFormation, the resources are AWS::Glue::Database and AWS::Glue::Table.
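To give an idea of what "slightly different contents" means, here is a minimal sketch that builds the Glue `TableInput` for a view. It assumes the encoding described in the linked answer: a base64-encoded JSON view definition wrapped in a `/* Presto View: ... */` comment, with `TableType` set to `VIRTUAL_VIEW`. The function name and the `(name, presto_type, hive_type)` column tuples are hypothetical. The same structure could be passed to Glue's `create_table` or written out as the `TableInput` of an `AWS::Glue::Table`, but treat it as a starting point rather than a verified recipe.

```python
import base64
import json


def presto_view_table_input(view_name, database, original_sql, columns):
    """Build a Glue TableInput dict that Athena should treat as a view.

    Assumption: mirrors the format from the linked answer. `columns` is a
    list of (name, presto_type, hive_type) tuples, e.g. ("id", "integer", "int").
    """
    # The view definition Athena reads, using Presto type names
    view_definition = {
        "originalSql": original_sql,
        "catalog": "awsdatacatalog",
        "schema": database,
        "columns": [{"name": n, "type": p} for n, p, _ in columns],
    }
    encoded = base64.b64encode(json.dumps(view_definition).encode("utf-8")).decode("ascii")
    return {
        "Name": view_name,
        "TableType": "VIRTUAL_VIEW",
        "Parameters": {"presto_view": "true", "comment": "Presto View"},
        "ViewOriginalText": f"/* Presto View: {encoded} */",
        "ViewExpandedText": "/* Presto View */",
        # The Glue-side column list uses Hive type names
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": h} for n, _, h in columns],
            "SerdeInfo": {},
            "Location": "",
        },
    }
```

The fiddly part is keeping the two column lists (Presto types inside the encoded JSON, Hive types in the `StorageDescriptor`) in sync, which is a large part of why this is hard to maintain by hand in a template.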

In general, CloudFormation is used for deploying infrastructure in a repeatable manner. This doesn't apply much to data inside a database, which typically persists separately from other infrastructure.
For Amazon Athena, AWS CloudFormation only supports:
Data Catalog
Named Query
Workgroup
The closest to your requirements is Named Query, which (I think) could store a query that creates the view (e.g. CREATE VIEW ...). Note that a named query is only saved in Athena, not executed, so something still has to run it.
See: AWS::Athena::NamedQuery - AWS CloudFormation
Update: @Theo points out that AWS CloudFormation also has AWS Glue resources, including:
AWS::Glue::Table
This can apparently be used to create a view. See comments below.

I think for now the best way to create an Athena view from a CloudFormation template is to use a custom resource backed by Lambda. We have to supply handlers for view creation and deletion. For example, using the crhelper library, the Lambda could be defined as:
from crhelper import CfnResource
import logging
import os
import boto3
logger = logging.getLogger(__name__)
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', sleep_on_delete=120)
try:
    client = boto3.client('athena')
    ATHENA_WORKGROUP = os.environ['athena_workgroup']
    DATABASE = os.environ['database']
    QUERY_CREATE = os.environ['query_create']
    QUERY_DROP = os.environ['query_drop']
except Exception as e:
    helper.init_failure(e)
# The decorators register the functions with crhelper; without them nothing is called
@helper.create
@helper.update
def create(event, context):
    logger.info("View creation started")
    execution_response = client.start_query_execution(
        QueryString=QUERY_CREATE,
        QueryExecutionContext={'Database': DATABASE},
        WorkGroup=ATHENA_WORKGROUP
    )
    logger.info(execution_response)
    response = client.get_query_execution(QueryExecutionId=execution_response['QueryExecutionId'])
    logger.info(response)
    if response['QueryExecution']['Status']['State'] == 'FAILED':
        logger.error("Query failed")
        raise ValueError("Query failed")
    # The query runs asynchronously; crhelper passes helper.Data to poll_create
    # as event['CrHelperData'], so the execution id is stored here
    helper.Data['id'] = execution_response['QueryExecutionId']
    helper.Data['message'] = 'query is running'
@helper.delete
def delete(event, context):
    logger.info("View deletion started")
    try:
        execution_response = client.start_query_execution(
            QueryString=QUERY_DROP,
            QueryExecutionContext={'Database': DATABASE},
            WorkGroup=ATHENA_WORKGROUP
        )
        logger.info(execution_response)
    except Exception as e:
        # Don't fail the delete, otherwise the stack can get stuck in DELETE_FAILED
        logger.exception("Dropping the view failed: %s", e)
@helper.poll_create
def poll_create(event, context):
    logger.info("Polling view creation")
    response = client.get_query_execution(QueryExecutionId=event['CrHelperData']['id'])
    logger.info(f"Poll response: {response}")
    state = response['QueryExecution']['Status']['State']
    # If the state is FAILED or CANCELLED, stop and fail the creation
    # If the state is QUEUED or RUNNING, return False so crhelper polls again later
    # If the state is SUCCEEDED, return True: creation is complete and an id is generated
    if state in ('FAILED', 'CANCELLED'):
        logger.error("Query %s", state)
        raise ValueError(f"Query {state}")
    if state == 'SUCCEEDED':
        logger.info("Query SUCCEEDED")
        return True
    logger.info("Query %s", state)
    return False
def handler(event, context):
    helper(event, context)
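The state handling in `poll_create` can be pulled out into a pure function, which makes it testable without calling Athena. This is a sketch; `poll_result` is a hypothetical helper name, and the state names are the ones Athena reports in `QueryExecution.Status.State`.

```python
# Athena query states, as reported in QueryExecution.Status.State
IN_PROGRESS = {"QUEUED", "RUNNING"}
TERMINAL_FAILURES = {"FAILED", "CANCELLED"}


def poll_result(state):
    """Translate an Athena query state into a crhelper poll return value.

    True  -> creation is complete,
    False -> crhelper should poll again later,
    raise -> CloudFormation receives a FAILED response.
    """
    if state == "SUCCEEDED":
        return True
    if state in IN_PROGRESS:
        return False
    if state in TERMINAL_FAILURES:
        raise ValueError(f"Athena query ended in state {state}")
    # Fail loudly instead of polling forever on a state we don't recognise
    raise ValueError(f"Unexpected Athena query state: {state}")
```

Inside the Lambda, `poll_create` would then end with `return poll_result(response['QueryExecution']['Status']['State'])`.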
The Athena queries for view creation/update/deletion are passed to the Lambda as environment variables.
In the CloudFormation template we have to define the Lambda that runs the Python code above and creates/updates/deletes the Athena view, plus the custom resource that invokes it. For example:
AthenaCommonViewLambda:
  Type: 'AWS::Lambda::Function'
  DependsOn: [CreateAthenaViewLayer, CreateAthenaViewLambdaRole]
  Properties:
    Environment:
      Variables:
        athena_workgroup: !Ref AudienceAthenaWorkgroup
        database:
          Ref: DatabaseName
        # The created and dropped view names must match
        query_create: !Sub >-
          CREATE OR REPLACE VIEW ${TableName}_common_view AS
          SELECT field1, field2, ...
          FROM ${DatabaseName}.${TableName}
        query_drop: !Sub DROP VIEW IF EXISTS ${TableName}_common_view
    Code:
      S3Bucket: !Ref SourceS3Bucket
      S3Key: createview.zip
    FunctionName: !Sub '${AWS::StackName}_create_common_view'
    Handler: createview.handler
    MemorySize: 128
    Role: !GetAtt CreateAthenaViewLambdaRole.Arn
    Runtime: python3.8
    Timeout: 60
    Layers:
      - !Ref CreateAthenaViewLayer
AthenaCommonView:
  Type: 'Custom::AthenaCommonView'
  Properties:
    ServiceToken: !GetAtt AthenaCommonViewLambda.Arn

Related

How to create an Athena stack and consume Glue Data catalog?

I have to create an Athena template in CloudFormation; the task is to replicate the following Terraform script using CF:
resource "aws_athena_workgroup" "sample_athena_wg" {
  name = "sample_athena_wg"
}
resource "aws_athena_database" "sample_athena_database" {
  name   = "sample_athena_database"
  bucket = "sample_bucket_id"
}
resource "aws_athena_named_query" "test_query" {
  name      = "Test"
  workgroup = aws_athena_workgroup.sample_athena_wg.id
  database  = aws_athena_database.sample_athena_database.name
  query     = "SELECT * FROM ${aws_athena_database.sample_athena_database.name} limit 10;"
}
The problem is that there is no CF resource called "AWS::ATHENA::DATABASE" or anything like that, and I don't really know what the Terraform resource "aws_athena_database" creates behind the scenes. When I deploy the Terraform script, it seems to create a Glue database, but I don't know what else it creates.
I found that Terraform creates an Athena workgroup and a Glue database behind the scenes when you create an aws_athena_database. I could replicate this in CF like this:
AthenaWorkgroup:
  Type: AWS::Athena::WorkGroup
  Properties:
    Name: ....
    State: ENABLED
    WorkGroupConfiguration:
      BytesScannedCutoffPerQuery: ...
      EnforceWorkGroupConfiguration: ...
      PublishCloudWatchMetricsEnabled: ...
      ResultConfiguration:
        OutputLocation: !Ref S3LocationPath
GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      Name: ...
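For completeness, the Terraform named query maps to `AWS::Athena::NamedQuery`. Below is a sketch that builds the whole equivalent template as a Python dict (CloudFormation accepts JSON templates), with intrinsic functions written in their JSON form. The function name, logical IDs and results location are hypothetical placeholders. Using `Ref` for `WorkGroup` and `Database` also gives CloudFormation the dependency order, so the query is created after the workgroup and database.

```python
import json


def athena_stack_template(workgroup, database, results_s3_uri):
    """Build a CloudFormation template mirroring the Terraform example above."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AthenaWorkgroup": {
                "Type": "AWS::Athena::WorkGroup",
                "Properties": {
                    "Name": workgroup,
                    "State": "ENABLED",
                    "WorkGroupConfiguration": {
                        "ResultConfiguration": {"OutputLocation": results_s3_uri}
                    },
                },
            },
            "GlueDatabase": {
                "Type": "AWS::Glue::Database",
                "Properties": {
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseInput": {"Name": database},
                },
            },
            "TestQuery": {
                "Type": "AWS::Athena::NamedQuery",
                "Properties": {
                    "Name": "Test",
                    # Ref returns the workgroup name / database name
                    "WorkGroup": {"Ref": "AthenaWorkgroup"},
                    "Database": {"Ref": "GlueDatabase"},
                    "QueryString": f"SELECT * FROM {database} limit 10;",
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(athena_stack_template(
        "sample_athena_wg", "sample_athena_database", "s3://sample_bucket_id/results/"), indent=2))
```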

How to read SSM Parameter dynamically from Lambda Environment variable

I am keeping the application endpoint in SSM Parameter Store and am able to access it from the Lambda environment:
Resources:
  M4IAcarsScheduler:
    Type: AWS::Serverless::Function
    Properties:
      Handler: not.used.in.provided.runtime
      Runtime: provided
      CodeUri: target/function.zip
      MemorySize: 512
      Timeout: 900
      FunctionName: Sample
      Environment:
        Variables:
          SamplePath: !Ref sample1path
          SampleId: !Ref sample1pathid
Parameters:
  sample1path:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Select existing security group for lambda function from Parameter Store
    Default: /sample/path
  sample1pathid:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Select existing security group for lambda function from Parameter Store
    Default: /sample/id
My issue is that when I update the SSM parameter, the Lambda environment is not updated dynamically, and I need to restart every time.
Is there any way to handle this dynamically, so that a change in SSM Parameter Store is reflected without restarting the Lambda?
By using SSM parameters in a CloudFormation stack, the parameters get resolved when the CloudFormation stack is deployed. If the value in SSM subsequently changes, there is nothing to update the lambda, so the lambda will still have the value that was pulled from SSM at the moment the CloudFormation stack was deployed. The lambda will not even know that the parameter came from SSM; it only knows that there is a static environment variable configured.
Instead, to use SSM Parameters in your lambda you should change your lambda code so that it fetches the parameter from inside the code. This AWS blog shows a Python lambda example of how to fetch the parameters from the lambda code (when the lambda runs):
import os, traceback, json, configparser, boto3
from aws_xray_sdk.core import patch_all
patch_all()
# Initialize boto3 client at global scope for connection reuse
client = boto3.client('ssm')
env = os.environ['ENV']
app_config_path = os.environ['APP_CONFIG_PATH']
full_config_path = '/' + env + '/' + app_config_path
# Initialize app at global scope for reuse across invocations
app = None
class MyApp:
    def __init__(self, config):
        """
        Construct new MyApp with configuration
        :param config: application configuration
        """
        self.config = config
    def get_config(self):
        return self.config
def load_config(ssm_parameter_path):
    """
    Load configparser from config stored in SSM Parameter Store
    :param ssm_parameter_path: Path to app config in SSM Parameter Store
    :return: ConfigParser holding loaded config
    """
    configuration = configparser.ConfigParser()
    try:
        # Get all parameters for this app
        # (one call returns at most 10 parameters; use NextToken for more)
        param_details = client.get_parameters_by_path(
            Path=ssm_parameter_path,
            Recursive=False,
            WithDecryption=True
        )
        # Loop through the returned parameters and populate the ConfigParser
        if 'Parameters' in param_details and len(param_details.get('Parameters')) > 0:
            for param in param_details.get('Parameters'):
                param_path_array = param.get('Name').split("/")
                section_position = len(param_path_array) - 1
                section_name = param_path_array[section_position]
                config_values = json.loads(param.get('Value'))
                config_dict = {section_name: config_values}
                print("Found configuration: " + str(config_dict))
                configuration.read_dict(config_dict)
    except Exception:
        print("Encountered an error loading config from SSM.")
        traceback.print_exc()
    return configuration
def lambda_handler(event, context):
    global app
    # Initialize app if it doesn't yet exist
    if app is None:
        print("Loading config and creating new MyApp...")
        config = load_config(full_config_path)
        app = MyApp(config)
    return "MyApp config is " + str(app.get_config()._sections)
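Note that this example still loads the configuration only once per execution environment (the `app is None` check), so an updated parameter is only picked up after a cold start. If changes should show up sooner, one option is to re-fetch after a time-to-live. Below is a minimal sketch of such a cache; `TtlCache` and the 60-second TTL are hypothetical choices, and `fetch` would be something like `lambda: load_config(full_config_path)`.

```python
import time


class TtlCache:
    """Cache the result of `fetch` and refresh it once `ttl_seconds` have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Fetch on first use, or when the cached value has expired
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

At global scope you would create `config_cache = TtlCache(lambda: load_config(full_config_path), ttl_seconds=60)` and call `config_cache.get()` inside the handler. The TTL trades freshness against the number of SSM calls (and their cost and rate limits).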
Here is a post with an example in Node, and similar examples exist for other languages too.
// runs inside an async handler; assumes const SSM = new AWS.SSM(); (AWS SDK for JavaScript v2)
// parameter expected by SSM.getParameter
const parameter = {
  Name: "/systems/" + event.Name + "/config"
};
const responseFromSSM = await SSM.getParameter(parameter).promise();
console.log('SUCCESS');
console.log(responseFromSSM);
const value = responseFromSSM.Parameter.Value;

Set up S3 Bucket level Events using AWS CloudFormation

I am trying to get AWS CloudFormation to create a template that will allow me to attach an event to an existing S3 Bucket that will trigger a Lambda Function whenever a new file is put into a specific directory within the bucket. I am using the following YAML as a base for the CloudFormation template but cannot get it working.
---
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  SETRULE:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: bucket-name
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:Put
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: directory/in/bucket
            Function: arn:aws:lambda:us-east-1:XXXXXXXXXX:function:lambda-function-trigger
            Input: '{ CONFIGS_INPUT }'
I have tried rewriting this template a number of different ways to no success.
Since you have mentioned that those buckets already exist, this is not going to work. You can use CloudFormation in this way, but only to create a new bucket, not to modify an existing bucket that was not created via that template in the first place.
If you don't want to recreate your infrastructure, it might be easier to just use some script that will subscribe lambda function to each of the buckets. As long as you have a list of buckets and the lambda function, you are ready to go.
Here is a Python 3 script, assuming that we have:
2 buckets called test-bucket-jkg2 and test-bucket-x1gf
lambda function with arn: arn:aws:lambda:us-east-1:605189564693:function:my_func
There are 2 steps to make this work. First, you need to add a function policy that allows the S3 service to invoke the function. Second, you loop through the buckets one by one, subscribing the lambda function to each of them.
import boto3
s3_client = boto3.client("s3")
lambda_client = boto3.client('lambda')
buckets = ["test-bucket-jkg2", "test-bucket-x1gf"]
lambda_function_arn = "arn:aws:lambda:us-east-1:605189564693:function:my_func"
# create a function policy that will permit s3 service to
# execute this lambda function
# note that you should specify SourceAccount and SourceArn to limit who (which account/bucket) can
# execute this function - you will need to loop through the buckets to achieve
# this, at least you should specify SourceAccount
try:
    response = lambda_client.add_permission(
        FunctionName=lambda_function_arn,
        # StatementId may only contain letters, digits, '-' and '_'
        StatementId="allow-s3-to-execute-this-function",
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        # SourceAccount="your account",
        # SourceArn="bucket's arn"
    )
    print(response)
except Exception as e:
    print(e)
# loop through all buckets and subscribe lambda function
# to each one of them
# note: this call replaces the bucket's entire notification configuration,
# so any notifications already configured on the bucket are overwritten
for bucket in buckets:
    print("putting config to bucket: ", bucket)
    try:
        response = s3_client.put_bucket_notification_configuration(
            Bucket=bucket,
            NotificationConfiguration={
                'LambdaFunctionConfigurations': [
                    {
                        'LambdaFunctionArn': lambda_function_arn,
                        'Events': [
                            's3:ObjectCreated:*'
                        ]
                    }
                ]
            }
        )
        print(response)
    except Exception as e:
        print(e)
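To restrict the permission as the comments suggest, you can add one statement per bucket with `SourceArn` and `SourceAccount`. Here is a sketch of a helper that builds the `add_permission` keyword arguments; the function name and the `s3-invoke-` statement-id prefix are hypothetical. Statement ids must be unique per function and may only contain letters, digits, `-` and `_`, so the dots allowed in bucket names are replaced.

```python
def s3_invoke_permission_kwargs(function_arn, bucket, account_id):
    """Keyword arguments for lambda_client.add_permission, scoped to one bucket."""
    # Bucket names may contain '.', which is not allowed in a StatementId
    safe_bucket = "".join(ch if ch.isalnum() or ch in "-_" else "-" for ch in bucket)
    return {
        "FunctionName": function_arn,
        "StatementId": "s3-invoke-" + safe_bucket,
        "Action": "lambda:InvokeFunction",
        "Principal": "s3.amazonaws.com",
        "SourceArn": f"arn:aws:s3:::{bucket}",
        "SourceAccount": account_id,
    }
```

Inside the loop this would be used as `lambda_client.add_permission(**s3_invoke_permission_kwargs(lambda_function_arn, bucket, "<your account id>"))`, before the notification is put on that bucket.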
You could write a custom resource to do this; in fact, that's what I've ended up doing at work for the same problem. At the simplest level, define a lambda that takes a put-bucket-notification configuration and then just calls the put bucket notification API with the data that was passed to it.
If you want to be able to control different notifications across different cloudformation templates, then it's a bit more complex. Your custom resource lambda will need to read the existing notifications from S3 and then update these based on what data was passed to it from CF.
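The read-and-update step could look like the sketch below. It assumes each stack's notification is identified by its own `Id` (a hypothetical convention), and it takes the dict returned by `get_bucket_notification_configuration`, strips the `ResponseMetadata` key that every boto3 response carries, and adds, replaces or removes only that entry. The result can be passed as `NotificationConfiguration` to `put_bucket_notification_configuration`.

```python
def merge_lambda_notification(existing, config_id, new_config=None):
    """Return a copy of `existing` with the Lambda entry `config_id` replaced.

    Pass new_config=None to remove the entry (e.g. on a Delete request).
    Entries with other Ids, and queue/topic configurations, are kept as-is.
    """
    # ResponseMetadata is part of the boto3 response, not of the configuration
    merged = {k: v for k, v in existing.items() if k != "ResponseMetadata"}
    lambdas = [c for c in merged.get("LambdaFunctionConfigurations", [])
               if c.get("Id") != config_id]
    if new_config is not None:
        lambdas.append(dict(new_config, Id=config_id))
    merged["LambdaFunctionConfigurations"] = lambdas
    return merged
```

Keying on `Id` is what lets several templates manage their own notifications on the same bucket without overwriting each other.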

Latest Lambda Layer ARN

I have a lambda layer which I keep updating, so it has multiple versions. How can I find the ARN of the latest version of the layer using the AWS CLI?
I am able to do this with the command below:
aws lambda list-layer-versions --layer-name <layer name> --region us-east-1 --query 'LayerVersions[0].LayerVersionArn'
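The same lookup from Python (boto3's `list_layer_versions`) can avoid relying on the order of the returned list by picking the highest `Version` explicitly. This is a sketch; `latest_layer_version_arn` is a hypothetical helper, and for layers with many versions the response is paginated (`NextMarker`), so you would need to collect all pages first.

```python
def latest_layer_version_arn(response):
    """Return the LayerVersionArn with the highest Version in a list_layer_versions response."""
    versions = response.get("LayerVersions", [])
    if not versions:
        raise ValueError("layer has no versions")
    return max(versions, key=lambda v: v["Version"])["LayerVersionArn"]
```

Usage: `latest_layer_version_arn(boto3.client("lambda").list_layer_versions(LayerName="<layer name>"))`.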
Unfortunately, it's currently not possible (I have encountered the same issue).
You can keep the latest ARN in your own place (like DynamoDB) and update it whenever you publish a new version of the layer.
You can create a custom macro to get the latest lambda layer version and use that as a reference.
The following macro function reads the latest layer version ARN from an output of the stack that publishes the Lambda layer:
import boto3
def latest_lambdalayer(event, context):
    fragment = get_latestversion(event['fragment'])
    return {
        'requestId': event['requestId'],
        'status': 'success',
        'fragment': fragment
    }
def get_latestversion(fragment):
    cloudformation = boto3.resource('cloudformation')
    stack = cloudformation.Stack('ticketapp-layer-dependencies')
    for o in stack.outputs:
        if o['OutputKey'] == 'TicketAppLambdaDependency':
            # e.g. "arn:aws:lambda:eu-central-1:899885580749:layer:ticketapp-dependencies-layer:16"
            return o['OutputValue']
    # Fail clearly instead of returning None as the template fragment
    raise ValueError("Output TicketAppLambdaDependency not found")
You then use it when defining the Lambda layer, here in the Globals section of the same template:
Globals:
  Function:
    Layers:
      - !Transform { "Name" : "LatestLambdaLayer"}
    Runtime: nodejs12.x
    MemorySize: 128
    Timeout: 101

AWS CloudFormation & Service Catalog - Can I require tags with user values?

Our problem seems very basic, and I would expect it to be common.
We have tags that must always be applied (for billing). However, the tag values are only known at the time the stack is deployed... We don't know what the tag values will be when developing the stack, or when creating the product in the Service Catalog...
We don't want to wait until AFTER the resource is deployed to discover the tag is missing, so as cool as AWS config may be, we don't want to rely on its rules if we don't have to.
So things like TagOptions don't work, because it appears that they expect us to know the tag value months prior to a deployment (which isn't the case).
Is there any way to mandate tags be used for a cloudformation template when it is deployed? Better yet, can we have service catalog query for a tag value when deploying? Tags like "system" or "project", for instance, come and go over time and are not known up-front for many types of cloudformation templates we develop.
Isn't this a common scenario?
I am worried that I am missing something very, very simple and basic which mandates tags be used up-front, but I can't seem to figure out what. Thank you in advance. I really did Google a lot before asking, without finding a satisfying answer.
I don't know anything about Service Catalog, but you can create Conditions and then use them to conditionally create (or even deliberately fail) your resources. See Conditional Resource Creation, e.g.:
Parameters:
  ResourceTag:
    Type: String
    Default: ''
Conditions:
  isTagEmpty:
    !Equals [!Ref ResourceTag, '']
  isTagNotEmpty:
    !Not [!Equals [!Ref ResourceTag, '']]
Resources:
  DBInstance:
    Type: AWS::RDS::DBInstance
    Condition: isTagNotEmpty
    Properties:
      DBInstanceClass: <DB Instance Type>
Here the RDS DB instance will only be created if the tag is non-empty. However, CloudFormation will still report success when the instance is skipped.
Alternatively, you can make the resource creation fail on purpose:
Resources:
  DBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: !If [isTagEmpty, !Ref "AWS::NoValue", <DB instance type>]
I haven't tried this, but it should fail, because the instance class is removed (AWS::NoValue) when the tag is empty.
Edit: You can also create your stack using the CloudFormation createStack API. Write some code that reads and validates the input (e.g. from Service Catalog) and then calls createStack. I am doing the same from a Lambda (Node.js), reading some input from Parameter Store. Sample code:
const AWS = require('aws-sdk');
const ssm = new AWS.SSM();
module.exports.create = async (event, context) => {
  let request = JSON.parse(event.body);
  let subnetids = await ssm.getParameter({
    Name: '/vpc/public-subnets'
  }).promise();
  let securitygroups = await ssm.getParameter({
    Name: '/vpc/lambda-security-group'
  }).promise();
  let params = {
    StackName: request.customerName, /* required */
    Capabilities: [
      'CAPABILITY_IAM',
      'CAPABILITY_NAMED_IAM',
      'CAPABILITY_AUTO_EXPAND',
      /* more items */
    ],
    // unique identifier for this CreateStack request (used to recognise retries)
    ClientRequestToken: 'qwdfghjk3912',
    EnableTerminationProtection: false,
    OnFailure: request.onfailure,
    Parameters: [
      {
        ParameterKey: "SubnetIds",
        ParameterValue: subnetids.Parameter.Value,
      },
      {
        ParameterKey: 'SecurityGroupIds',
        ParameterValue: securitygroups.Parameter.Value,
      },
      {
        ParameterKey: 'OpsPoolArnList',
        ParameterValue: request.userPoolList,
      },
      /* more items */
    ],
    TemplateURL: request.templateUrl,
  };
  // create the client for the requested region
  const cfn = new AWS.CloudFormation({ region: request.region });
  let result = await cfn.createStack(params).promise();
  console.log(result);
  return result;
};
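The validation step is where the required tags can be enforced before anything is deployed. Below is a Python sketch of building the `create_stack` arguments (boto3's CloudFormation client takes the same `StackName`, `TemplateURL`, `Parameters` and `Tags` fields) and refusing to proceed when a required tag is blank. `REQUIRED_TAGS` and the function name are hypothetical; stack-level `Tags` are propagated by CloudFormation to the stack's resources that support tagging.

```python
REQUIRED_TAGS = ("system", "project")  # hypothetical billing tags


def create_stack_kwargs(stack_name, template_url, parameters, tags):
    """Build create_stack keyword arguments, failing fast if a required tag is blank."""
    missing = [k for k in REQUIRED_TAGS if not str(tags.get(k, "")).strip()]
    if missing:
        raise ValueError("missing required tags: " + ", ".join(missing))
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [{"ParameterKey": k, "ParameterValue": v} for k, v in parameters.items()],
        # Stack-level tags, propagated to taggable resources in the stack
        "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
    }
```

Usage: `boto3.client("cloudformation").create_stack(**create_stack_kwargs(name, url, params, tags))`, where `tags` comes from whatever the user entered at deployment time.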
Another option: add an AWS CloudFormation custom resource backed by Lambda. In that Lambda, check the tags and return a failure if they don't satisfy the constraints. Make all other resources depend on this resource, so that they are only created if your checks pass. The link also contains an example. You will also have to handle stack updates and deletes (e.g. return success by default). I think this is your best bet for now.
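The decision logic of such a custom resource could be as small as the sketch below. It assumes the tags are passed to the custom resource as a hypothetical `Tags` property (a map of tag name to value) and that the handler sends back the returned status and reason; `tag_check_outcome` is an illustrative name. Delete requests always succeed so the stack can still be removed.

```python
def tag_check_outcome(request_type, properties, required):
    """Return (status, reason) for a tag-checking custom resource request."""
    # Never block deletion, otherwise the stack cannot be cleaned up
    if request_type == "Delete":
        return "SUCCESS", ""
    tags = properties.get("Tags", {})
    missing = [k for k in required if not str(tags.get(k, "")).strip()]
    if missing:
        return "FAILED", "Missing required tags: " + ", ".join(missing)
    return "SUCCESS", ""
```

A FAILED response on Create makes the stack roll back before the dependent resources are created, which addresses the wish to catch missing tags before deployment rather than afterwards.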