I am trying to create a data quality validation for a set of files in S3. For that I have chosen AWS Glue DataBrew and have created a dataset, data quality rules, and a data profile job via a SAM template.
Once the dataset is created, I have to refer to its ARN while creating the ruleset, and likewise to the ruleset's ARN for the profile job.
Checking the documentation, I can see that the ARN is not part of the return values for the dataset or the data quality ruleset. So is it possible to dynamically refer to these values, or should I create the rulesets separately?
SampleDataSet:
  Type: AWS::DataBrew::Dataset
  Properties:
    Name: SampleDataSet
    Input:
      S3InputDefinition:
        Bucket: *****
        Key: *****

SampleRuleSet:
  Type: AWS::DataBrew::Ruleset
  Properties:
    Name: SampleRuleSet
    Rules:
      - Name: rule1
        Disabled: true
        CheckExpression: "AGG(DUPLICATE_ROWS_COUNT) <= :val1"
        SubstitutionMap:
          - Value: "0"
            ValueReference: ":val1"
    TargetArn: !GetAtt SampleDataSet.Arn
  DependsOn: SampleDataSet

SampleProfileJob:
  Type: AWS::DataBrew::Job
  Properties:
    Name: SampleProfileJob
    Type: PROFILE
    RoleArn: !GetAtt GenericDataBrewDataQualityRole.Arn
    DatasetName: SampleDataSet
    Timeout: 5
    ValidationConfigurations:
      - RulesetArn: !GetAtt SampleRuleSet.Arn
    OutputLocation:
      Bucket: *****
  DependsOn: SampleRuleSet
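For what it's worth, a possible workaround is sketched below. It assumes DataBrew ARNs follow the documented arn:<partition>:databrew:<region>:<account>:dataset/<name> and arn:<partition>:databrew:<region>:<account>:ruleset/<name> formats, and that !Ref on these resources returns the resource name, so the ARNs can be assembled with !Sub instead of !GetAtt (which isn't available, since neither resource exposes an Arn attribute):

SampleRuleSet:
  Type: AWS::DataBrew::Ruleset
  Properties:
    Name: SampleRuleSet
    Rules:
      - Name: rule1
        CheckExpression: "AGG(DUPLICATE_ROWS_COUNT) <= :val1"
        SubstitutionMap:
          - Value: "0"
            ValueReference: ":val1"
    # !Ref on the DataBrew dataset should return its name, so the ARN can be built with !Sub
    TargetArn: !Sub 'arn:${AWS::Partition}:databrew:${AWS::Region}:${AWS::AccountId}:dataset/${SampleDataSet}'
  DependsOn: SampleDataSet

SampleProfileJob:
  Type: AWS::DataBrew::Job
  Properties:
    Name: SampleProfileJob
    Type: PROFILE
    RoleArn: !GetAtt GenericDataBrewDataQualityRole.Arn
    DatasetName: !Ref SampleDataSet
    ValidationConfigurations:
      # Same idea for the ruleset ARN
      - RulesetArn: !Sub 'arn:${AWS::Partition}:databrew:${AWS::Region}:${AWS::AccountId}:ruleset/${SampleRuleSet}'
  DependsOn: SampleRuleSet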
Related
I'm stuck on a weird issue. I have created an AWS S3 bucket using the following CloudFormation template:
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Unlicensed
Description: >
  This template creates a globally unique S3 bucket in a specific region.
  The bucket name is formed from the environment, account id and region
Parameters:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html
  Environment:
    Description: This parameter will accept the environment details from the user
    Type: String
    Default: sbx
    AllowedValues:
      - sbx
      - dev
      - qa
      - e2e
      - prod
    ConstraintDescription: Invalid environment. Please select one of the given environments only
Resources:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket.html
  MyS3Bucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub 'global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}' # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/pseudo-parameter-reference.html
      AccessControl: Private
      LoggingConfiguration:
        DestinationBucketName: !Ref 'LoggingBucket'
        LogFilePrefix: 'access-logs'
      Tags:
        - Key: name
          Value: globalbucket
        - Key: department
          Value: engineering
  LoggingBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub 'global-loggings-${Environment}-${AWS::Region}-${AWS::AccountId}'
      AccessControl: LogDeliveryWrite
Outputs:
  GlobalS3Bucket:
    Description: A private S3 bucket with deletion policy as retain and logging configuration
    Value: !Ref MyS3Bucket
    Export:
      Name: global-bucket
As you can see in the template above, I'm exporting this S3 bucket in the Outputs section under the name global-bucket.
Now, my intention is to refer to this existing bucket going forward in my AWS account whenever a new resource, such as a Lambda function, needs an S3 bucket. Below is an example using AWS SAM (Serverless Application Model): I'm trying to create a Lambda function and refer to this existing S3 bucket using !ImportValue with the export name global-bucket, as shown below:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  hellolambda

  Sample SAM Template for hellolambda

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 3

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      CodeUri: hello-world/
      Handler: app.lambdaHandler
      Runtime: nodejs12.x
      Events:
        HelloLambdaEvent:
          Type: S3
          Properties:
            Bucket: !Ref SrcBucket
            Events: s3:ObjectCreated:*
  SrcBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !ImportValue global-bucket
Now, the problem is that when I run sam build and then sam deploy --guided, and select the same region (where my previous CloudFormation stack's output exists), I get the following error:
global-bucket-sbx-ap-southeast-1-088853283839 already exists in stack arn:aws:cloudformation:ap-southeast-1:088853283839:stack/my-s3-global-bucket/aabd20e0-f57d-11ea-80bf-06f1487f6a64
The problem is that AWS CloudFormation is trying to create the S3 bucket rather than referring to the existing one.
But if I try to update this SAM template and then run sam deploy, I get the following error:
Waiting for changeset to be created..
Error: Failed to create changeset for the stack: my-lambda-stack, ex: Waiter ChangeSetCreateComplete failed: Waiter encountered a terminal failure state Status: FAILED. Reason: Transform AWS::Serverless-2016-10-31 failed with: Invalid Serverless Application Specification document. Number of errors found: 1. Resource with id [HelloWorldFunction] is invalid. Event with id [HelloLambdaEvent] is invalid. S3 events must reference an S3 bucket in the same template.
I'm blocked at both ends. I would really appreciate it if someone could guide me in writing the SAM template correctly so that my Lambda refers to the existing bucket instead of creating a new one.
Thank you
Any items listed under the Resources section refer to the resources the stack is responsible for maintaining.
When you list SrcBucket, you are asking CloudFormation to create a new S3 bucket whose name is the value of !ImportValue global-bucket, which is the name of an S3 bucket you have already created.
Assuming that this is the bucket name you want, you can simply reference it in your template as shown below.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  hellolambda

  Sample SAM Template for hellolambda

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 3

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      CodeUri: hello-world/
      Handler: app.lambdaHandler
      Runtime: nodejs12.x
      Events:
        HelloLambdaEvent:
          Type: S3
          Properties:
            Bucket: !ImportValue global-bucket
            Events: s3:ObjectCreated:*
I have attached a sample to give you some clarity on what I am trying to solve.
AWSTemplateFormatVersion: '2010-09-09'
Description: Project Service Catalog get lambda data
Parameters:
  Environment:
    Type: String
    Description: Environment of the SageMaker
  ProjectId:
    Type: String
    Description: Project ID of the SageMaker
  SsmRoleLambdaArn:
    Type: AWS::SSM::Parameter::Value<String>
    Default: '/data-science/role-lambda/arn'
    Description: ARN used to look up the role of the session using the project id
Resources:
  IdentifyUserRole:
    Type: Custom::GetParam
    Properties:
      ServiceToken: !Ref SsmRoleLambdaArn
      pl_role: !Sub '${Environment}-sso-data-science-${ProjectId}-pl-role'
      ds_role: !Sub '${Environment}-sso-data-science-${ProjectId}-ds-role'
  KmsKey:
    Type: AWS::KMS::Key
    Properties:
      Description: !Sub 'Encryption for ${Environment}-${ProjectId}-${Prid}-${NotebookInstanceNameSuffix}'
      EnableKeyRotation: true
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Owner
          Value: !Ref Owner
        - Key: ProjectId
          Value: !Ref ProjectId
        - Key: PrincipalId
          Value: !Sub
            - "${RoleId}:${Prid}"
            - RoleId:
                Fn::If: [!Equals [!GetAtt IdentifyUserRole.value, true], !GetAtt PORoleId.value, !GetAtt DSRoleId.value]
I am getting an error at the Fn::If condition in the PrincipalId tag. Please help me solve this condition with some sample templates. I can't use !GetAtt in the Conditions block either, because we are not supposed to use attributes there.
Error message (during stack validation):
An error occurred (ValidationError) when calling the ValidateTemplate operation: Template error: Fn::If requires a list argument with the first element being a condition
You can't hard-code the condition in the Fn::If like you are attempting:
Fn::If: [!Equals [!GetAtt IdentifyUserRole.value, true], !GetAtt PORoleId.value, !GetAtt DSRoleId.value]
The first argument must be a condition from the Conditions section (docs):
reference to a condition in the Conditions section.
Consequently, you can't construct conditions based on GetAtt or any other resources from the Resources section.
The same docs also state:
You can only reference other conditions and values from the Parameters and Mappings sections of a template. For example, you can reference a value from an input parameter, but you cannot reference the logical ID of a resource in a condition.
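To make the required shape concrete, here is a minimal sketch. It assumes the platform-lead/data-scientist flag can be supplied as a template parameter (the IsPlRole parameter below is hypothetical) rather than derived from the custom resource, because conditions cannot reference resource attributes; GetAtt is still allowed in the value positions of Fn::If. PORoleId, DSRoleId and Prid stand in for the resources and parameters from the original template.

Parameters:
  IsPlRole:
    Type: String
    AllowedValues: ["true", "false"]
    Default: "false"

Conditions:
  # Conditions may only reference parameters, mappings and other conditions
  UsePlRole: !Equals [!Ref IsPlRole, "true"]

Resources:
  KmsKey:
    Type: AWS::KMS::Key
    Properties:
      EnableKeyRotation: true
      Tags:
        - Key: PrincipalId
          Value: !Sub
            - "${RoleId}:${Prid}"
            # The named condition decides which role id is substituted into the tag value
            - RoleId: !If [UsePlRole, !GetAtt PORoleId.value, !GetAtt DSRoleId.value]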
I created a CloudFormation StackSet that deployed AWS Config rules to two accounts. Now I want to create a StackSet that deploys the remediation. The bottom lines of code work when I have everything in one template, but I want to deploy the detection rules in one script first and the remediation rules second. How can I reference the S3BucketEncryptionEnabled resource from a different script?
---------------------Detection --------------------------------------------------------
S3BucketEncryptionEnabled:
  Type: AWS::Config::ConfigRule
  Properties:
    Description: Checks that your Amazon S3 bucket either has S3 default encryption enabled or that the S3 bucket policy explicitly denies put-object requests without server side encryption.
    Source:
      Owner: AWS
      SourceIdentifier: S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED
    Scope:
      ComplianceResourceTypes:
        - AWS::S3::Bucket
  DependsOn: ConfigRecorder
----------------------Remediation Script-----------------------------------------------
BasicRemediationConfiguration:
  Type: "AWS::Config::RemediationConfiguration"
  Properties:
    Automatic: True
    MaximumAutomaticAttempts: 5
    RetryAttemptSeconds: 60
    ConfigRuleName: !Ref S3BucketEncryptionEnabled
    Parameters:
      AutomationAssumeRole:
        StaticValue:
          Values: [{"Fn::GetAtt": ["S3Role", "Arn"]}]
      BucketName:
        ResourceValue:
          Value: RESOURCE_ID
      SSEAlgorithm:
        StaticValue:
          Values: [AES256]
    TargetId: "AWS-EnableS3BucketEncryption"
    TargetType: "SSM_DOCUMENT"
    TargetVersion: "1"
Normally, in your Detection template you would export the S3BucketEncryptionEnabled in your outputs.
For example:
Outputs:
  S3BucketEncryptionEnabled:
    Value: !Ref S3BucketEncryptionEnabled
    Export:
      Name: MyS3BucketEncryptionEnabled
Then in your Remediation template, you would use ImportValue to import the exported value.
For example:
BasicRemediationConfiguration:
  Type: "AWS::Config::RemediationConfiguration"
  Properties:
    Automatic: True
    MaximumAutomaticAttempts: 5
    RetryAttemptSeconds: 60
    ConfigRuleName: !ImportValue MyS3BucketEncryptionEnabled
    # remaining properties
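One caveat worth noting: the remediation snippet above also uses Fn::GetAtt on S3Role for AutomationAssumeRole, so if S3Role stays in the detection template, its ARN has to be exported and imported the same way. A minimal sketch, assuming S3Role is defined in the detection template and using a hypothetical export name MyS3RoleArn:

# In the Detection template
Outputs:
  S3RoleArn:
    Value: !GetAtt S3Role.Arn
    Export:
      Name: MyS3RoleArn

# In the Remediation template
      AutomationAssumeRole:
        StaticValue:
          Values:
            - !ImportValue MyS3RoleArn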
I have been searching for an example of how to set up CloudFormation for a Glue workflow, including triggers, jobs, and crawlers, but I haven't been able to find much information on it.
This is the only piece of information I was able to find from AWS:
{
  "Type" : "AWS::Glue::Workflow",
  "Properties" : {
    "DefaultRunProperties" : Json,
    "Description" : String,
    "Name" : String,
    "Tags" : Json
  }
}
Here's an example of a workflow with one crawler and a job to be run after the crawler finishes.
The workflow is assembled by tagging the triggers with the WorkflowName.
I believe there can be only one SCHEDULED or ON_DEMAND trigger to start the workflow; all the other triggers in the workflow need to be CONDITIONAL on the jobs/crawlers. That's probably how CloudFormation knows how to build the DAG.
Also note how the workflow parameters are defined as JSON in DefaultRunProperties.
---
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  BaseBucket:
    Description: Bucket used by my workflow jobs
    Type: String
Resources:
  MyWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      DefaultRunProperties:
        {
          "workflowParameter1": "Foo",
          "workflowParameter2": "Bar",
          "bucket": { "Fn::Sub": "${BaseBucket}" }
        }
      Description: Workflow for orchestrating my jobs
      Name: MyWorkflowName
  WorkflowCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: MyCrawler
      Role: MyCrawlerRole
      Description: A crawler to run as the first step in the workflow
      DatabaseName: MyDatabase
      Targets:
        S3Targets:
          - Path: !Sub "s3://${BaseBucket}/"
  WorkflowJob:
    Type: AWS::Glue::Job
    Properties:
      Description: Glue job to run after the crawler
      Name: MyWorkflowJob
      Role: MyJobRole
      Command:
        Name: pythonshell
        PythonVersion: 3
        ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"
  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: ON_DEMAND
      Description: Trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow
  WorkflowJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: CrawlerSuccessfulTrigger
      Type: CONDITIONAL
      StartOnCreation: True
      Description: Trigger to start the glue job
      Actions:
        - JobName: !Ref WorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            CrawlerName: !Ref WorkflowCrawler
            CrawlState: SUCCEEDED
      WorkflowName: !Ref MyWorkflow
Here is an example of a Glue workflow using triggers, crawlers and a job to convert JSON to Parquet:
JSONtoParquetWorkflow:
  Type: AWS::Glue::Workflow
  Properties:
    Name: json-to-parquet-workflow
    Description: Workflow for orchestrating JSON to Parquet conversion

RawJSONCrawlerTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: raw-json-crawler-trigger
    Description: Start crawler for raw JSON data
    Type: ON_DEMAND
    Actions:
      - CrawlerName: !Ref RawJSONCrawler

JSONToParquetETLJobTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: json-to-parquet-etl-trigger
    Description: Start JSON to Parquet ETL job
    Type: CONDITIONAL
    StartOnCreation: True
    Predicate:
      Conditions:
        - LogicalOperator: EQUALS
          CrawlerName: !Ref RawJSONCrawler
          CrawlState: SUCCEEDED
    Actions:
      - JobName: !Ref JSONToParquetETLJob

RawParquetCrawlerTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: raw-parquet-crawler-trigger
    Description: Start crawler for raw Parquet data
    Type: CONDITIONAL
    StartOnCreation: True
    Predicate:
      Conditions:
        - LogicalOperator: EQUALS
          JobName: !Ref JSONToParquetETLJob
          State: SUCCEEDED
    Actions:
      - CrawlerName: !Ref RawParquetCrawler
There was no simple example to be found, so I created one: AWS Glue Workflow: Getting started, which uses an AWS CloudFormation template. The example is basic but explained with diagrams.
I am new to AWS CloudFormation and need to create a Kinesis data stream, then write records to this stream using Python code. I was able to create the data stream through a CloudFormation template, but I was not able to set the permissions. How do I attach a permission that allows a certain user group to write to this Kinesis data stream using the Python library?
My current template code is:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'This template will create an AWS Kinesis DataStream'

Parameters:
  CFNStreamName:
    Description: This will be used to name the Kinesis DataStream
    Type: String
    Default: 'data-stream'
  CFNRetensionHours:
    Description: This will be used to set the retention hours
    Type: Number
    Default: 168
  CFNShardCount:
    Description: This will be used to set the shard count
    Type: Number
    Default: 2

Resources:
  MongoCDCStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: !Ref CFNStreamName
      RetentionPeriodHours: !Ref CFNRetensionHours
      ShardCount: !Ref CFNShardCount
      StreamEncryption:
        EncryptionType: KMS
        KeyId: alias/aws/kinesis

Outputs:
  MongoCDCStream:
    Value: !Ref MongoCDCStream
    Export:
      Name: !Sub ${AWS::StackName}-MongoCDCStream
You will want to pass in (through a CloudFormation parameter) the IAM role or user that your Python code runs as.
Inside the template, create an IAM Policy or ManagedPolicy that attaches to the IAM role/user you passed in and grants the needed permissions.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'This template will create an AWS Kinesis DataStream'

Parameters:
  CFNStreamName:
    Description: This will be used to name the Kinesis DataStream
    Type: String
    Default: 'data-stream'
  CFNRetensionHours:
    Description: This will be used to set the retention hours
    Type: Number
    Default: 168
  CFNShardCount:
    Description: This will be used to set the shard count
    Type: Number
    Default: 2
  PythonCodeRole:
    Type: String
    # ^- Pass in the role here.

Resources:
  # Assign the permissions here.
  PythonCodePolicyAssignment:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Action:
              - "kinesis:*" # consider narrowing this to the actions the code needs, e.g. kinesis:PutRecord
            Resource: !GetAtt MongoCDCStream.Arn
            # ^- here, use !GetAtt <stream>.Arn: IAM policies expect the stream ARN (a plain !Ref returns only the stream name).
      PolicyName: python-code-permission
      Roles: [!Ref PythonCodeRole]
  MongoCDCStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: !Ref CFNStreamName
      RetentionPeriodHours: !Ref CFNRetensionHours
      ShardCount: !Ref CFNShardCount
      StreamEncryption:
        EncryptionType: KMS
        KeyId: alias/aws/kinesis

Outputs:
  MongoCDCStream:
    Value: !Ref MongoCDCStream
    Export:
      Name: !Sub ${AWS::StackName}-MongoCDCStream
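If the writers are IAM users in a group rather than a role (the question mentions a user group), the same AWS::IAM::Policy resource can be attached to a group instead. A minimal sketch, assuming a hypothetical PythonUsersGroup parameter holding the name of an existing IAM group:

Parameters:
  PythonUsersGroup:
    Type: String # name of the existing IAM group whose users write to the stream

Resources:
  PythonGroupKinesisPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: python-group-kinesis-write
      Groups: [!Ref PythonUsersGroup]
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Action:
              # limited to the write/describe actions a producer typically needs
              - "kinesis:PutRecord"
              - "kinesis:PutRecords"
              - "kinesis:DescribeStream"
            Resource: !GetAtt MongoCDCStream.Arn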