I have already set up a Glue crawler successfully in the AWS console.
Now I have a CloudFormation template to mimic the whole process, except that I cannot get the Exclusions: field working in the template. Background: in the AWS Glue API, the Exclusions field holds glob patterns that exclude files or folders matching a pattern within the data store (an S3 data store in my example).
Despite much effort, I cannot get the glob patterns to populate in the Glue crawler console. All the other values from the template populate correctly alongside the crawler configuration, i.e. the S3Target, crawler name, IAM role, and grouping behavior; only the Exclusions field (shown as "exclude patterns" in the Glue console) remains empty. My CFN template passes validation, and I have run the crawler hoping the exclude globs, albeit hidden, would still have an effect, but they do not. How can I populate the Exclusions field?
Here's the S3Target Exclusion AWS Glue API guide
Here's an AWS sample YAML CFN for a Glue Crawler
Here's a helpful YAML string array guide
YAML:

CFNCrawlerSecDeraNUM:
  Type: AWS::Glue::Crawler
  Properties:
    Name: !Ref CFNCrawlerName
    Role: !GetAtt CFNRoleSecDERA.Arn
    # Classifiers: none, use the default classifier
    Description: AWS Glue crawler to crawl SecDERA data
    # Schedule: none, use default run-on-demand
    DatabaseName: !Ref CFNDatabaseName
    Targets:
      S3Targets:
        - Exclusions:
            - "*/readme.htm"
            - "*/sub.txt"
            - "*/pre.txt"
            - "*/tag.txt"
        - Path: "s3://sec-input"
    TablePrefix: !Ref CFNTablePrefixName
    SchemaChangePolicy:
      UpdateBehavior: "UPDATE_IN_DATABASE"
      DeleteBehavior: "LOG"
    # Added single-schema grouping Glue API option
    Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
JSON:

"CFNCrawlerSecDeraNUM": {
  "Type": "AWS::Glue::Crawler",
  "Properties": {
    "Name": { "Ref": "CFNCrawlerName" },
    "Role": { "Fn::GetAtt": [ "CFNRoleSecDERA", "Arn" ] },
    "Description": "AWS Glue crawler to crawl SecDERA data",
    "DatabaseName": { "Ref": "CFNDatabaseName" },
    "Targets": {
      "S3Targets": [
        {
          "Exclusions": [
            "*/readme.htm",
            "*/sub.txt",
            "*/pre.txt",
            "*/tag.txt"
          ]
        },
        {
          "Path": "s3://sec-input"
        }
      ]
    },
    "TablePrefix": { "Ref": "CFNTablePrefixName" },
    "SchemaChangePolicy": {
      "UpdateBehavior": "UPDATE_IN_DATABASE",
      "DeleteBehavior": "LOG"
    },
    "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
  }
}
You are passing Exclusions as a separate S3Target object in the S3Targets list, so it is not attached to any Path.
Try changing this:
Targets:
  S3Targets:
    - Exclusions:
        - "*/readme.htm"
        - "*/sub.txt"
        - "*/pre.txt"
        - "*/tag.txt"
    - Path: "s3://sec-input"
To this:
Targets:
  S3Targets:
    - Path: "s3://sec-input"
      Exclusions:
        - "*/readme.htm"
        - "*/sub.txt"
        - "*/pre.txt"
        - "*/tag.txt"
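To sanity-check exclude patterns before deploying, the matching can be approximated locally. A sketch using Python's fnmatch: note that fnmatch's `*` also crosses `/`, whereas Glue's matcher needs `**` to span directories, so this is only a rough approximation, and the S3 keys below are made up.

```python
from fnmatch import fnmatch

# The exclude globs from the template above.
EXCLUSIONS = ["*/readme.htm", "*/sub.txt", "*/pre.txt", "*/tag.txt"]

def is_excluded(key: str) -> bool:
    # True if any exclude glob matches the S3 key.
    return any(fnmatch(key, pattern) for pattern in EXCLUSIONS)

# Hypothetical keys under s3://sec-input
keys = ["2020q4/readme.htm", "2020q4/sub.txt", "2020q4/num.txt"]
crawled = [k for k in keys if not is_excluded(k)]
print(crawled)  # only 2020q4/num.txt is left for the crawler
```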
My goal is to enable logging for a regional WebACL via AWS CDK. This seems to be possible with CloudFormation, and the appropriate constructs exist in CDK. But when using the following code to create a log group and link it in a LoggingConfiguration ...
const webAclLogGroup = new LogGroup(scope, "awsWafLogs", {
  logGroupName: `aws-waf-logs`
});

// Create logging configuration with log group as destination
new CfnLoggingConfiguration(scope, "webAclLoggingConfiguration", {
  logDestinationConfigs: webAclLogGroup.logGroupArn, // Arn of LogGroup
  resourceArn: aclArn // Arn of Acl
});
... I get an exception during cdk deploy, stating that the string in logDestinationConfigs is not a correct ARN (some parts of the ARN in the log messages have been redacted):
Resource handler returned message: "Error reason: The ARN isn't valid. A valid ARN begins with arn: and includes other information separated by colons or slashes., field: LOG_DESTINATION, parameter: arn:aws:logs:xxx:xxx:xxx-awswaflogsF99ED1BA-PAeH9Lt2Y3fi:* (Service: Wafv2, Status Code: 400, Request ID: xxx, Extended Request ID: null)"
I cannot see an error in the generated CloudFormation output after cdk synth:
"webAclLoggingConfiguration": {
  "id": "webAclLoggingConfiguration",
  "path": "xxx/xxx/webAclLoggingConfiguration",
  "attributes": {
    "aws:cdk:cloudformation:type": "AWS::WAFv2::LoggingConfiguration",
    "aws:cdk:cloudformation:props": {
      "logDestinationConfigs": [
        {
          "Fn::GetAtt": [
            {
              "Ref": "awsWafLogs58D3FD01"
            },
            "Arn"
          ]
        }
      ],
      "resourceArn": {
        "Fn::GetAtt": [
          "webACL",
          "Arn"
        ]
      }
    }
  },
  "constructInfo": {
    "fqn": "aws-cdk-lib.aws_wafv2.CfnLoggingConfiguration",
    "version": "2.37.1"
  }
},
I'm using CDK with TypeScript; the CDK version is currently set to 2.37.1, but it also did not work with 2.16.0.
WAF has particular requirements for the naming and format of logging destination configs, as described and shown in its docs.
Specifically, the ARN of the log group cannot end in :*, which unfortunately is what CloudFormation returns for a log group ARN.
A workaround is to construct the required ARN format manually, as below, which omits the :* suffix. Also note that logDestinationConfigs takes a list of strings, though with exactly one element in it.
const webAclLogGroup = new LogGroup(scope, "awsWafLogs", {
  logGroupName: `aws-waf-logs`
});

// Create logging configuration with log group as destination
new CfnLoggingConfiguration(scope, "webAclLoggingConfiguration", {
  logDestinationConfigs: [
    // Construct the different ARN format from the logGroupName
    Stack.of(this).formatArn({
      arnFormat: ArnFormat.COLON_RESOURCE_NAME,
      service: "logs",
      resource: "log-group",
      resourceName: webAclLogGroup.logGroupName,
    })
  ],
  resourceArn: aclArn // Arn of Acl
});
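The underlying transformation is just dropping the trailing :* from the log group ARN that Fn::GetAtt returns. A minimal sketch of that string fix (the account id and region in the example are made up):

```python
def waf_log_destination(log_group_arn: str) -> str:
    # WAFv2 rejects log-group ARNs ending in ":*" (the form Fn::GetAtt
    # returns for a LogGroup), so strip that suffix if present.
    if log_group_arn.endswith(":*"):
        return log_group_arn[:-2]
    return log_group_arn

arn = "arn:aws:logs:us-east-1:123456789012:log-group:aws-waf-logs:*"
print(waf_log_destination(arn))
# arn:aws:logs:us-east-1:123456789012:log-group:aws-waf-logs
```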
PS: I work for AWS on the CDK team.
I need to write a CloudFormation template for a pipeline with Jenkins integration for build/test. I found this documentation for setting up the ActionTypeId of the Jenkins stage, but the doc does not specify how to set the server URL of the Jenkins server, and it is also not clear to me where the Jenkins provider name goes: in the ActionTypeId or in the configuration properties?
I could not find any example of this use case on the internet either.
Please provide a proper example of setting up a Jenkins action provider for AWS CodePipeline using an AWS CloudFormation template.
Following is a section from the sample template I wrote from what I learned from the doc above.
"stages": [
  {
    "name": "Jenkins",
    "actions": [
      ...
      {
        "name": "Jenkins Build",
        "actionTypeId": {
          "category": "Build",
          "owner": "Custom",
          "provider": "Jenkins",
          "version": "1"
        },
        "runOrder": 2,
        "configuration": {
          ???
        },
        ...
      }
    ]
  },
  ...
]
The piece of information that was missing for me was that I needed to create a custom action to use Jenkins as the action provider for my pipeline.
First I added the custom action as below:
JenkinsCustomActionType:
  Type: AWS::CodePipeline::CustomActionType
  Properties:
    Category: Build
    Provider: !Ref JenkinsProviderName
    Version: 1
    ConfigurationProperties:
      - Description: "The name of the build project must be provided when this action is added to the pipeline."
        Key: true
        Name: ProjectName
        Queryable: false
        Required: true
        Secret: false
        Type: String
    InputArtifactDetails:
      MaximumCount: 5
      MinimumCount: 0
    OutputArtifactDetails:
      MaximumCount: 5
      MinimumCount: 0
    Settings:
      EntityUrlTemplate: !Join ['', [!Ref JenkinsServerURL, "/job/{Config:ProjectName}/"]]
      ExecutionUrlTemplate: !Join ['', [!Ref JenkinsServerURL, "/job/{Config:ProjectName}/{ExternalExecutionId}/"]]
    Tags:
      - Key: Name
        Value: custom-jenkins-action-type
The Jenkins server URL is given in the Settings of the custom action, and the Jenkins provider name is given as Provider. These were the two things I had missed initially.
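How the {Config:...} placeholders in those URL templates behave can be illustrated with a small substitution sketch. This is a local re-implementation for illustration only, not CodePipeline's code, and the Jenkins URL is hypothetical:

```python
def render_url_template(template: str, config: dict) -> str:
    # CodePipeline fills "{Config:<Key>}" placeholders in the custom
    # action's URL templates from the action's Configuration values.
    for key, value in config.items():
        template = template.replace("{Config:" + key + "}", value)
    return template

entity = "https://jenkins.example.com/job/{Config:ProjectName}/"
print(render_url_template(entity, {"ProjectName": "dev-bvt"}))
# https://jenkins.example.com/job/dev-bvt/
```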
Then configure the pipeline stage as follows:
DevPipeline:
  Type: AWS::CodePipeline::Pipeline
  DependsOn: JenkinsCustomActionType
  Properties:
    Name: Dev-CodePipeline
    RoleArn:
      Fn::GetAtt: [ CodePipelineRole, Arn ]
    Stages:
      ...
      - Name: DevBuildVerificationTest
        Actions:
          - Name: JenkinsDevBVT
            ActionTypeId:
              Category: Build
              Owner: Custom
              Version: 1
              Provider: !Ref JenkinsProviderName
            Configuration:
              # JenkinsDevBVTProjectName - Jenkins job name defined as a parameter in the template
              ProjectName: !Ref JenkinsDevBVTProjectName
            RunOrder: 4
The custom action has to be created before the pipeline, hence the DependsOn: JenkinsCustomActionType.
I've been surprised to find out that file deletion was not replicated in an S3 bucket cross-region replication (CRR) situation, after running this simple test:
- set up the simplest configuration of CRR
- upload a new file
- check it is replicated
- delete the file (not a version of the file)
So I checked the documentation and I found this statement:
If you delete an object from the source bucket, the following occurs:
If you make a DELETE request without specifying an object version ID, Amazon S3 adds a delete marker. Amazon S3 deals with the delete marker as follows:
If you are using the latest version of the replication configuration (that is, you specify the Filter element in a replication configuration rule), Amazon S3 does not replicate the delete marker.
If you don't specify the Filter element, Amazon S3 assumes the replication configuration is the prior version, V1. In the earlier version, Amazon S3 handled replication of delete markers differently. For more information, see Backward Compatibility.
The latter link on backward compatibility tells me:
When you delete an object from your source bucket without specifying an object version ID, Amazon S3 adds a delete marker. If you use V1 of the replication configuration XML, Amazon S3 replicates delete markers that resulted from user actions.[...]
In V2, Amazon S3 doesn't replicate delete markers and therefore you must set the DeleteMarkerReplication element to Disabled.
So if I sum this up:
- a CRR configuration is considered V1 if there is no Filter
- with CRR configuration V1, file deletion is replicated; with V2 it is not
Well, this is my configuration:
{
  "ReplicationConfiguration": {
    "Role": "arn:aws:iam::271226720751:role/service-role/s3crr_role_for_mybucket_to_myreplica",
    "Rules": [
      {
        "ID": "first replication rule",
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn:aws:s3:::myreplica"
        }
      }
    ]
  }
}
And deletion is not replicated. So it makes me think that my configuration is still considered V2 (even though I have no Filter).
Can someone confirm this presumption?
And could someone tell me what
In V2, Amazon S3 doesn't replicate delete markers and therefore you must set the DeleteMarkerReplication element to Disabled
really means?
There are two different configurations for replicating delete markers, V1 and V2.
Currently, when you enable S3 Replication (CRR or SRR) from the console, the V2 configuration is enabled by default. However, if your use case requires replicated objects to be deleted whenever they are deleted from the source bucket, you need the V1 configuration.
Here is the difference between V1 and V2:
V1 configuration
The delete marker is replicated (V1 configuration). A subsequent GET request to the deleted object in both the source and the destination bucket does not return the object.
V2 configuration
The delete marker is not replicated (V2 configuration). A subsequent GET request to the deleted object returns the object only in the destination bucket.
To enable the V1 configuration (to replicate delete markers), use the policy below. Keep in mind that certain replication features, such as tag-based filtering and Replication Time Control (RTC), are only available in V2 configurations.
{
  "Role": " IAM-role-ARN ",
  "Rules": [
    {
      "ID": "Replication V1 Rule",
      "Prefix": "",
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::<destination-bucket>"
      }
    }
  ]
}
Here is a blog post that describes this behavior in detail:
https://aws.amazon.com/blogs/storage/managing-delete-marker-replication-in-amazon-s3/
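The V1-vs-V2 distinction is inferred from the shape of each rule. A rough local sketch of that inference (this reflects my reading of the docs above, not an official S3 algorithm):

```python
def replication_schema_version(rule: dict) -> str:
    # A rule that uses the Filter element is schema V2; a rule with a
    # top-level Prefix (and no Filter) is treated as the older V1 schema.
    return "V2" if "Filter" in rule else "V1"

v1_rule = {"ID": "Replication V1 Rule", "Prefix": "", "Status": "Enabled"}
v2_rule = {"ID": "rule", "Filter": {}, "Status": "Enabled",
           "DeleteMarkerReplication": {"Status": "Disabled"}}
print(replication_schema_version(v1_rule))  # V1
print(replication_schema_version(v2_rule))  # V2
```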
I have seen exactly the same behaviour. I was unable to create a V1 situation to get delete marker replication to occur.
The issue comes from AWS documentation that is still not clear.
To use DeleteMarkerReplication, you need V1 of the configuration. To let AWS know that you want V1, you need to specify a Prefix element in your configuration and no DeleteMarkerReplication element, so your first try was almost correct.
{
  "ReplicationConfiguration": {
    "Role": "arn:aws:iam::271226720751:role/service-role/s3crr_role_for_mybucket_to_myreplica",
    "Rules": [
      {
        "ID": "first replication rule",
        "Prefix": "",
        "Status": "Enabled",
        "Destination": {
          "Bucket": "arn:aws:s3:::myreplica"
        }
      }
    ]
  }
}
And of course you need the s3:ReplicateDelete permission in your policy.
I believe I've figured this out. It looks like whether the delete markers are replicated or not depends on the permissions in the replication role.
If your replication role has the permission s3:ReplicateDelete on the destination, then delete markers will be replicated. If it does not have that permission, they are not.
Below is the CloudFormation YAML for my replication role, with the ReplicateDelete permission commented out as an example. With this setup it does not replicate delete markers; uncomment the permission and it will. Note that the permissions are based on what AWS actually creates if you set up the replication via the console (and they differ slightly from those in the documentation).
ReplicaRole:
  Type: AWS::IAM::Role
  Properties:
    #Path: "/service-role/"
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - s3.amazonaws.com
          Action:
            - sts:AssumeRole
    Policies:
      - PolicyName: "replication-policy"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Resource:
                - !Sub "arn:aws:s3:::${LiveBucketName}"
                - !Sub "arn:aws:s3:::${LiveBucketName}/*"
              Action:
                - s3:Get*
                - s3:ListBucket
            - Effect: Allow
              Resource: !Sub "arn:aws:s3:::${LiveBucketName}-replica/*"
              Action:
                - s3:ReplicateObject
                - s3:ReplicateTags
                - s3:GetObjectVersionTagging
                #- s3:ReplicateDelete
Adding a comment as an answer because I cannot comment on #john-eikenberry's answer. I have tested the answer suggested by John (the s3:ReplicateDelete action), but it is not working.
Edit: a failed attempt:
I have also tried to put a bucket replication configuration with delete marker replication enabled, but it failed. The error message is:
An error occurred (MalformedXML) when calling the PutBucketReplication operation: The XML you provided was not well-formed or did not validate against our published schema
Experiment details:
Existing replication configuration:
aws s3api get-bucket-replication --bucket my-source-bucket > my-source-bucket.json
{
  "Role": "arn:aws:iam::account-number:role/s3-cross-region-replication-role",
  "Rules": [
    {
      "ID": " s3-cross-region-replication-role",
      "Priority": 1,
      "Filter": {},
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket"
      },
      "DeleteMarkerReplication": {
        "Status": "Disabled"
      }
    }
  ]
}
aws s3api put-bucket-replication --bucket my-source-bucket --replication-configuration file://my-source-bucket-updated.json
{
  "Role": "arn:aws:iam::account-number:role/s3-cross-region-replication-role",
  "Rules": [
    {
      "ID": " s3-cross-region-replication-role",
      "Priority": 1,
      "Filter": {},
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket"
      },
      "DeleteMarkerReplication": {
        "Status": "Enabled"
      }
    }
  ]
}
Trying to create a CloudFormation template to configure WAF with a geo location condition. Couldn't find the right template yet. Any pointers would be appreciated.
http://docs.aws.amazon.com/waf/latest/developerguide/web-acl-geo-conditions.html
Unfortunately, the actual answer (as of this writing, July 2018) is that you cannot create geo match sets directly in CloudFormation. You can create them via the CLI or SDK, then reference them in the DataId field of a WAF rule's Predicates property.
Creating a GeoMatchSet with one constraint via the CLI:
aws waf-regional get-change-token
aws waf-regional create-geo-match-set --name my-geo-set --change-token <token>
aws waf-regional get-change-token
aws waf-regional update-geo-match-set --change-token <new_token> --geo-match-set-id <id> --updates '[ { "Action": "INSERT", "GeoMatchConstraint": { "Type": "Country", "Value": "US" } } ]'
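For more than one country, the --updates document can be generated rather than hand-written. A small sketch that builds the payload passed to update-geo-match-set (the country codes here are arbitrary examples):

```python
import json

def geo_insert_updates(country_codes):
    # One INSERT per ISO 3166 country code, in the shape the
    # update-geo-match-set --updates parameter expects.
    return [
        {"Action": "INSERT",
         "GeoMatchConstraint": {"Type": "Country", "Value": code}}
        for code in country_codes
    ]

print(json.dumps(geo_insert_updates(["US", "IE"])))
```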
Now reference that GeoMatchSet id in the CloudFormation:
"WebAclGeoRule": {
  "Type": "AWS::WAFRegional::Rule",
  "Properties": {
    ...
    "Predicates": [
      {
        "DataId": "00000000-1111-2222-3333-123412341234", // id from create-geo-match-set
        "Negated": false,
        "Type": "GeoMatch"
      }
    ]
  }
}
There is no documentation for it, but it is possible to create the geo match set in Serverless/CloudFormation.
I used the following in Serverless:
Resources:
  Geos:
    Type: "AWS::WAFRegional::GeoMatchSet"
    Properties:
      Name: geo
      GeoMatchConstraints:
        - Type: "Country"
          Value: "IE"
Which translates to the following in CloudFormation:
"Geos": {
  "Type": "AWS::WAFRegional::GeoMatchSet",
  "Properties": {
    "Name": "geo",
    "GeoMatchConstraints": [
      {
        "Type": "Country",
        "Value": "IE"
      }
    ]
  }
}
That can then be referenced when creating a rule:
(Serverless):
Resources:
  MyRule:
    Type: "AWS::WAFRegional::Rule"
    Properties:
      Name: waf
      Predicates:
        - DataId:
            Ref: "Geos"
          Negated: false
          Type: "GeoMatch"
(CloudFormation):
"MyRule": {
  "Type": "AWS::WAFRegional::Rule",
  "Properties": {
    "Name": "waf",
    "Predicates": [
      {
        "DataId": {
          "Ref": "Geos"
        },
        "Negated": false,
        "Type": "GeoMatch"
      }
    ]
  }
}
I'm afraid that your question is too vague to solicit a helpful response. The CloudFormation User Guide (PDF) defines many different WAF / CloudFront / Route 53 resources that perform various forms of geo matching / geo blocking. The link you provide covers a subset of web access control lists (web ACLs); see AWS::WAF::WebACL on page 2540 of the guide.
I suggest you have a look, and if you are still stuck, describe exactly what it is you are trying to achieve.
Note that the term you used, "geo location condition", doesn't directly relate to an AWS capability that I'm aware of.
Finally, if you are referring to https://aws.amazon.com/about-aws/whats-new/2017/10/aws-waf-now-supports-geographic-match/, the latest CloudFormation User Guide doesn't seem to have been updated to reflect it yet.
I'm trying to configure a dashboard with a basic widget to expose the CPUUtilization metric.
I cannot reference the previously created EC2 instance, since it seems that the !Ref function is not interpreted inside the JSON that describes the dashboard:
"metrics": [
  "AWS/EC2",
  "CPUUtilization",
  "InstanceId",
  "!Ref Ec2Instance"
]
Any idea how to reference it by logical name?
You can use Fn::Join to combine the output of intrinsic functions (like Ref) with literal strings. For example:
CloudWatchDashboardHOSTNAME:
  Type: "AWS::CloudWatch::Dashboard"
  DependsOn: Ec2InstanceHOSTNAME
  Properties:
    DashboardName: HOSTNAME
    DashboardBody: { "Fn::Join": [ "", ['{"widgets":[
      {
        "type":"metric",
        "properties":{
          "metrics":[
            ["AWS/EC2","CPUUtilization","InstanceId",
             "', { Ref: Ec2InstanceHOSTNAME }, '"]
          ],
          "title":"CPU Utilization",
          "period":60,
          "region":"us-east-1"
        }
      }]}' ] ] }
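To preview the JSON that such a Fn::Join assembles once CloudFormation has resolved the Ref, the same dashboard body can be built locally. A sketch with a made-up instance id:

```python
import json

def dashboard_body(instance_id: str, region: str = "us-east-1") -> str:
    # The widget document the DashboardBody string resolves to after
    # the instance id has been spliced in.
    body = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
                    ],
                    "title": "CPU Utilization",
                    "period": 60,
                    "region": region,
                },
            }
        ]
    }
    return json.dumps(body)

print(dashboard_body("i-0123456789abcdef0"))
```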
Documentation:
Fn::Join - AWS CloudFormation
Ref - AWS CloudFormation
AWS::CloudWatch::Dashboard - AWS CloudFormation
Dashboard Body Structure and Syntax - Amazon CloudWatch