aws glue create-crawler fails on Configuration settings

When running the following
aws glue create-crawler --debug --cli-input-json '{
"Name": "crawler",
"Role": "arn:...",
"DatabaseName": "db",
"Description": "table crawler",
"Targets": {
"CatalogTargets": [{
"DatabaseName": "db",
"Tables": ["tab"]
}]
},
"SchemaChangePolicy": {
"UpdateBehavior": "LOG",
"DeleteBehavior": "LOG"
},
"RecrawlPolicy": {
"RecrawlBehavior": "CRAWL_EVERYTHING"
},
"Configuration": {
"Version": 1.0,
"CrawlerOutput": {
"Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
},
"Grouping": { "TableGroupingPolicy": "CombineCompatibleSchemas" }
},
"Schedule": "Cron(1 * * * ? *)"
}'
It fails with:
Parameter validation failed:
Invalid type for parameter Configuration, value: {'Version': 1.0, 'CrawlerOutput': {'Partitions': {'AddOrUpdateBehavior': 'InheritFromTable'}}, 'Grouping': {'TableGroupingPolicy': 'CombineCompatibleSchemas'}}, type: <class 'dict'>, valid types: <class 'str'>
I'm getting the format from here, and if I remove Configuration it works. I've tried all kinds of quoting to turn this dictionary into a single string, but everything fails. Would love some help spotting the issue.

As the validation error says, the CreateCrawler API expects Configuration to be a JSON string rather than a nested object, so the fix is to move it out of --cli-input-json and pass it through the dedicated --configuration argument:
aws glue create-crawler --configuration '{
"Version": 1.0,
"CrawlerOutput": {
"Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
},
"Grouping": { "TableGroupingPolicy": "CombineCompatibleSchemas" }
}' --debug --cli-input-json '{
"Name": "crawler",
"Role": "arn:...",
"DatabaseName": "db",
"Description": "table crawler",
"Targets": {
"CatalogTargets": [{
"DatabaseName": "db",
"Tables": ["tab"]
}]
},
"SchemaChangePolicy": {
"UpdateBehavior": "LOG",
"DeleteBehavior": "LOG"
},
"RecrawlPolicy": {
"RecrawlBehavior": "CRAWL_EVERYTHING"
},
"Schedule": "Cron(1 * * * ? *)"
}'
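Alternatively, everything can stay inside --cli-input-json if Configuration is passed as a JSON-encoded string, since the parameter itself is a string. A minimal sketch of just that key as it would appear inside the --cli-input-json document (assuming the whole document stays single-quoted in a bash shell, so the inner quotes only need JSON escaping):
"Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"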

Related

Amplify Push erroring out when updating CustomResources.json

I'm building a geospatial search on properties using AWS Amplify and ElasticSearch.
I'm currently following this guide: https://gerard-sans.medium.com/finding-the-nearest-locations-around-you-using-aws-amplify-part-2-ce4603605be6
I set up my model as follows:
type Property @model @searchable @auth(rules: [{allow: public}]) {
id: ID!
...
Loc: Coord!
}
type Coord {
lon: Float!
lat: Float!
}
I also added a custom Query:
type Query {
nearbyProperties(
location: LocationInput!,
m: Int,
limit: Int,
nextToken: String
): ModelPropertyConnection
}
input LocationInput {
lat: Float!
lon: Float!
}
type ModelPropertyConnection {
items: [Property]
total: Int
nextToken: String
}
I added resolvers for request and response:
## Query.nearbyProperties.req.vtl
## Objects of type Property will be stored in the /property index
#set( $indexPath = "/property/doc/_search" )
#set( $distance = $util.defaultIfNull($ctx.args.m, 500) )
#set( $limit = $util.defaultIfNull($ctx.args.limit, 10) )
{
"version": "2017-02-28",
"operation": "GET",
"path": "$indexPath.toLowerCase()",
"params": {
"body": {
"from" : 0,
"size" : ${limit},
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "${distance}m",
"Loc" : $util.toJson($ctx.args.location)
}
}
}
},
"sort": [{
"_geo_distance": {
"Loc": $util.toJson($ctx.args.location),
"order": "asc",
"unit": "m",
"distance_type": "arc"
}
}]
}
}
}
and response:
## Query.nearbyProperties.res.vtl
#set( $items = [] )
#foreach( $entry in $context.result.hits.hits )
#if( !$foreach.hasNext )
#set( $nextToken = "$entry.sort.get(0)" )
#end
$util.qr($items.add($entry.get("_source")))
#end
$util.toJson({
"items": $items,
"total": $ctx.result.hits.total,
"nextToken": $nextToken
})
And now the CustomResources.json:
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "An auto-generated nested stack.",
"Metadata": {},
"Parameters": {
"AppSyncApiId": {
"Type": "String",
"Description": "The id of the AppSync API associated with this project."
},
"AppSyncApiName": {
"Type": "String",
"Description": "The name of the AppSync API",
"Default": "AppSyncSimpleTransform"
},
"env": {
"Type": "String",
"Description": "The environment name. e.g. Dev, Test, or Production",
"Default": "NONE"
},
"S3DeploymentBucket": {
"Type": "String",
"Description": "The S3 bucket containing all deployment assets for the project."
},
"S3DeploymentRootKey": {
"Type": "String",
"Description": "An S3 key relative to the S3DeploymentBucket that points to the root\nof the deployment directory."
}
},
"Resources": {
"QueryNearbyProperties": {
"Type": "AWS::AppSync::Resolver",
"Properties": {
"ApiId": { "Ref": "AppSyncApiId" },
"DataSourceName": "ElasticSearchDomain",
"TypeName": "Query",
"FieldName": "nearbyProperties",
"RequestMappingTemplateS3Location": {
"Fn::Sub": [
"s3://${S3DeploymentBucket}/${S3DeploymentRootKey}/resolvers/Query.nearbyProperties.req.vtl", {
"S3DeploymentBucket": { "Ref": "S3DeploymentBucket" },
"S3DeploymentRootKey": { "Ref": "S3DeploymentRootKey" }
}]
},
"ResponseMappingTemplateS3Location": {
"Fn::Sub": [ "s3://${S3DeploymentBucket}/${S3DeploymentRootKey}/resolvers/Query.nearbyProperties.res.vtl", {
"S3DeploymentBucket": { "Ref": "S3DeploymentBucket" },
"S3DeploymentRootKey": { "Ref": "S3DeploymentRootKey" }
}]
}
}
}
},
"Conditions": {
"HasEnvironmentParameter": {
"Fn::Not": [
{
"Fn::Equals": [
{
"Ref": "env"
},
"NONE"
]
}
]
},
"AlwaysFalse": {
"Fn::Equals": ["true", "false"]
}
},
"Outputs": {
"EmptyOutput": {
"Description": "An empty output. You may delete this if you have at least one resource above.",
"Value": ""
}
}
}
But when I try to amplify push, it does not work. It fails with something like: Resource is not in the state stackUpdateComplete
Any help?
You could take a look at the resource in CloudFormation; your stack is probably stuck in an update. Go to CloudFormation, select your stack (or uncheck "View nested" first) and open the Events tab. There you will probably find the reason why the stack can't update.
If it's stuck, cancel the update from the stack actions.
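If you prefer the CLI over the console, roughly the same information can be pulled with the commands below; the stack names are placeholders for your Amplify API stack and its root stack. The first command lists recent events (the failure reason is usually in the status reason of a *_FAILED event), and the second cancels a stuck update so the stack rolls back.
aws cloudformation describe-stack-events --stack-name <your-nested-api-stack> --max-items 20
aws cloudformation cancel-update-stack --stack-name <your-root-stack>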

Cannot run AWS Data Pipeline job due to ListObjectsV2 operation: Access Denied

I've written some CDK code to programmatically create a data pipeline that backs up a DynamoDB table into an S3 bucket on a daily basis.
But it keeps running into this error:
amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform. at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67) at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16) at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136) at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105) at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81) at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76) at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53) at java.lang.Thread.run(Thread.java:750) Caused by:
....
fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied at amazonaws.datapipeline.activity.mapreduce.HadoopJobRunner.executeCommand(HadoopJobRunner.java:175) at amazonaws.datapipeline.activity.mapreduce.HadoopJobRunner.ex
I'm using DataPipelineDefaultResourceRole and DataPipelineDefaultRole for this data pipeline, both of which have the s3:* permission, so I'm puzzled why this is happening.
On top of that, I'm not sure why logging is not enabled on the EMR cluster spun up by this data pipeline, even though I've specified the log location parameter myLogUri.
Any pointers, please?
import { CfnPipeline } from "monocdk/aws-datapipeline";
private createDataPipeline(props: InfrastructureStackProps) {
const dataPipelineName = "a-nice-datapipeline8.23";
const pipeline = new CfnPipeline(this, dataPipelineName, {
name: dataPipelineName,
parameterObjects: [
{
id: "myDDBTableName",
attributes: [
{
key: "Description",
stringValue: "Source table"
},
{
key: "Type",
stringValue: "String"
},
{
key: "Default",
stringValue: "Attributes"
}
]
},
{
id: "myOutputS3Location",
attributes: [
{
key: "Description",
stringValue: "Output S3 Location"
},
{
key: "Type",
stringValue: "String"
},
{
key: "Default",
stringValue: "s3://ddb-table-backup/"
}
]
},
{
id: "myDdbReadThroughputRatio",
attributes: [
{
key: "Description",
stringValue: "DynamoDB Read Throughput Ratio"
},
{
key: "Type",
stringValue: "Double"
},
{
key: "Default",
stringValue: "0.15"
}
]
},
{
id: 'myLogUri',
attributes: [
{
key: 'type',
stringValue: 'AWS::S3::ObjectKey',
},
{
key: 'description',
stringValue: 'DataPipeline Log Uri',
},
],
},
{
id: "myDDBRegion",
attributes: [
{
key: "Description",
stringValue: "Region of the DynamoDB Table"
},
{
key: "Type",
stringValue: "String"
},
{
key: "Default",
stringValue: props.region
}
]
}
],
parameterValues: [
{
id: "myDDBTableName",
stringValue: "Attributes"
},
{
id: "myOutputS3Location",
stringValue: "s3://ddb-table-backup/"
},
{
id: "myDdbReadThroughputRatio",
stringValue: "0.15"
},
{
id: 'myLogUri',
stringValue: `s3://data_pipeline_log/`,
},
{
id: "myDDBRegion",
stringValue: props.region
}
],
pipelineObjects: [
{
"id": "EmrClusterForBackup",
"name": "EmrClusterForBackup",
"fields": [
{
"key": "resourceRole",
"stringValue": "DataPipelineDefaultResourceRole"
},
{
"key": "role",
"stringValue": "DataPipelineDefaultRole"
},
{
"key": "coreInstanceCount",
"stringValue": "1"
},
{
"key": "coreInstanceType",
"stringValue": "m4.xlarge"
},
{
"key": "releaseLabel",
"stringValue": "emr-5.29.0"
},
{
"key": "masterInstanceType",
"stringValue": "m4.xlarge"
},
{
"key": "region",
"stringValue": props.region
},
{
"key": "type",
"stringValue": "EmrCluster"
},
{
"key": "terminateAfter",
"stringValue": "2 Hours"
}
]
},
{
"id": "S3BackupLocation",
"name": "S3BackupLocation",
"fields": [
{
"key": "directoryPath",
"stringValue": "s3://ddb-table-backup/"
},
{
"key": "type",
"stringValue": "S3DataNode"
}
]
},
{
"id": "DDBSourceTable",
"name": "DDBSourceTable",
"fields": [
{
"key": "readThroughputPercent",
"stringValue": "0.15"
},
{
"key": "type",
"stringValue": "DynamoDBDataNode"
},
{
"key": "tableName",
"stringValue": "Attributes"
}
]
},
{
"id": "Default",
"name": "Default",
"fields": [
{
"key": "failureAndRerunMode",
"stringValue": "CASCADE"
},
{
"key": "resourceRole",
"stringValue": "DataPipelineDefaultResourceRole"
},
{
"key": "role",
"stringValue": "DataPipelineDefaultRole"
},
{
"key": "scheduleType",
"stringValue": "cron"
},
{
key: 'schedule',
refValue: 'DailySchedule'
},
{
key: 'pipelineLogUri',
stringValue: 's3://data_pipeline_log/',
},
{
"key": "type",
"stringValue": "Default"
}
]
},
{
"name": "Every 1 day",
"id": "DailySchedule",
"fields": [
{
"key": 'type',
"stringValue": 'Schedule'
},
{
"key": 'period',
"stringValue": '1 Day'
},
{
"key": 'startDateTime',
"stringValue": "2021-12-20T00:00:00"
}
]
},
{
"id": "TableBackupActivity",
"name": "TableBackupActivity",
"fields": [
{
"key": "type",
"stringValue": "EmrActivity"
},
{
"key": "output",
"refValue": "S3BackupLocation"
},
{
"key": "input",
"refValue": "DDBSourceTable"
},
{
"key": "maximumRetries",
"stringValue": "2"
},
{
"key": "preStepCommand",
"stringValue": "(sudo yum -y update aws-cli) && (aws s3 rm #{output.directoryPath} --recursive)"
},
{
"key": "step",
"stringValue": "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
},
{
"key": "runsOn",
"refValue": "EmrClusterForBackup"
},
{
"key": "resizeClusterBeforeRunning",
"stringValue": "false"
}
]
}
],
activate: true
});
return pipeline;
}
I may be avoiding the direct issue at hand here, but I'm curious why you are using Data Pipeline for this. You would probably be better served by AWS Backup, which lets you take periodic backups in a managed fashion and adds features such as expiring backups or moving them to cold storage.
On the particular issue at hand, please check that your S3 bucket does not have a resource-based policy blocking you: https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html
Also check the EC2 role used for your Data Pipeline, commonly called AmazonEC2RoleforDataPipelineRole. More info on IAM roles for Data Pipeline here.
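Note that ListObjectsV2 maps to the s3:ListBucket action on the bucket ARN itself (not on the object ARNs), so that is the permission to look for on whichever role the EMR instances actually assume. A hedged sketch of a policy statement that would cover the backup and log locations, using the bucket names from the snippet in the question:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBackupAndLogBuckets",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::ddb-table-backup", "arn:aws:s3:::data_pipeline_log"]
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::ddb-table-backup/*", "arn:aws:s3:::data_pipeline_log/*"]
    }
  ]
}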

admin-create-user command doesn't work properly

I'm trying to run the admin-create-user CLI command as shown in the official doc, but it doesn't seem to run properly.
I don't get all the attributes created, even though they were in the command; I always get only the last attribute typed in the command.
Am I doing something wrong? Is there a solution?
aws cognito-idp admin-create-user --user-pool-id us-west-2_aaaaaaaaa --username diego#example.com --user-attributes=Name=email,Value=kermit2#somewhere.com,Name=phone_number,Value="+15555551212" --message-action SUPPRESS
and I'm getting
{
"User": {
"Username": "diego#example.com",
"Enabled": true,
"UserStatus": "FORCE_CHANGE_PASSWORD",
"UserCreateDate": 1566470568.864,
"UserLastModifiedDate": 1566470568.864,
"Attributes": [
{
"Name": "sub",
"Value": "5dac8ce5-2997-4185-b862-86cf15aede77"
},
{
"Name": "phone_number",
"Value": "+15555551212"
}
]
}
}
instead of
{
"User": {
"Username": "7325c1de-b05b-4f84-b321-9adc6e61f4a2",
"Enabled": true,
"UserStatus": "FORCE_CHANGE_PASSWORD",
"UserCreateDate": 1548099495.428,
"UserLastModifiedDate": 1548099495.428,
"Attributes": [
{
"Name": "sub",
"Value": "7325c1de-b05b-4f84-b321-9adc6e61f4a2"
},
{
"Name": "phone_number",
"Value": "+15555551212"
},
{
"Name": "email",
"Value": "diego#example.com"
}
]
}
}
The shorthand notation that you're using, as referenced in the docs here, does indeed produce the results you are seeing: the whole value is parsed as a single structure, so the repeated Name/Value keys collapse and only the last pair survives.
A quick way around this is to switch the user-attributes option to JSON format. With JSON, your command looks like this:
aws cognito-idp admin-create-user --user-pool-id us-west-2_aaaaaaaaa --username a567 --user-attributes '[{"Name": "email","Value": "kermit2#somewhere.com"},{"Name": "phone_number","Value": "+15555551212"}]' --message-action SUPPRESS
Which, when executed, produces this output:
{
"User": {
"Username": "a567",
"Enabled": true,
"UserStatus": "FORCE_CHANGE_PASSWORD",
"UserCreateDate": 1566489693.408,
"UserLastModifiedDate": 1566489693.408,
"Attributes": [
{
"Name": "sub",
"Value": "f6ff3e05-5f15-4a53-a45f-52e939b941fd"
},
{
"Name": "phone_number",
"Value": "+15555551212"
},
{
"Name": "email",
"Value": "kermit2#somewhere.com"
}
]
}
}
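For completeness, the shorthand form can also work if each attribute is passed as its own space-separated entry rather than one comma-joined string. An untested sketch using the same values from the question:
aws cognito-idp admin-create-user --user-pool-id us-west-2_aaaaaaaaa --username diego#example.com --user-attributes Name=email,Value=kermit2#somewhere.com Name=phone_number,Value="+15555551212" --message-action SUPPRESS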

AWS Lifecycle configuration Noncurrentversion transition: Parameter validation failed

I am using the AWS CLI to set up lifecycle management on an S3 bucket. I am using this JSON script:
{
"Rules": [
{
"Filter": {
},
"Status": "Enabled",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 365,
"StorageClass": "GLACIER"
}
],
"NoncurrentVersionTransitions": {
"NoncurrentDays": 30,
"StorageClass": "STANDARD_IA"
},
"Expiration": {
"Days": 3650
},
"ID": "Test"
}
]
}
and I am getting this error:
Parameter validation failed:
Invalid type for parameter
LifecycleConfiguration.Rules[0].NoncurrentVersionTransitions, value:
OrderedDict([(u'NoncurrentDays', 30), (u'StorageClass', u'STANDARD_IA')]),
type: <class 'collections.OrderedDict'>, valid types: <type 'list'>, <type
'tuple'>
The script works fine when I exclude the NoncurrentVersionTransitions part. I was wondering how to include NoncurrentVersionTransitions correctly.
Thanks in advance.
"NoncurrentVersionTransitions": {
"NoncurrentDays": 30,
"StorageClass": "STANDARD_IA"
},
Replace it with the version below, wrapping the object in [] so it is a list:
"NoncurrentVersionTransitions": [{
"NoncurrentDays": 30,
"StorageClass": "STANDARD_IA"
}],
This solves the issue: as the error says, NoncurrentVersionTransitions must be a list of transition objects, not a single object.
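For reference, the corrected JSON can then be applied with something like the following; the bucket name and file path are placeholders:
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json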

How to configure a Batch job queue as a target for AWS::Events::Rule in an AWS CloudFormation template

My AWS CloudFormation template has this:
"ScheduledRule": {
"Type": "AWS::Events::Rule",
"Properties": {
"Description": "ScheduledRule",
"ScheduleExpression": "cron(0/5 * * * ? *)",
"State": "ENABLED",
"Targets": [{
"Here I want to set batch job queue"
}]
}
}
I have created the necessary AWS Batch resources in the template.
"JobDefinition": {
"Type": "AWS::Batch::JobDefinition",
"Properties": {
"Type": "container",
"ContainerProperties": {
"Image": {
"Ref": "ImageUrl"
},
"Vcpus": 2,
"Memory": 2000,
"Command": ["node", "server.js"]
},
"RetryStrategy": {
"Attempts": 1
}
}
},
"JobQueue": {
"Type": "AWS::Batch::JobQueue",
"Properties": {
"Priority": 1,
"ComputeEnvironmentOrder": [
{
"order": 1,
"ComputeEnvironment": { "Ref": "ComputeEnvironment" }
}
]
}
},
"ComputeEnvironment": {
"Type": "AWS::Batch::ComputeEnvironment",
"Properties": {
"Type": "MANAGED",
"ComputeResourses": {
"Type": "EC2",
"MinvCpus": 2,
"DesiredvCpus": 4,
"MaxvCpus": 64,
"InstanceTypes": [
"optimal"
],
"Subnets" : [{ "Ref" : "Subnet" }],
"SecurityGroupIds" : [{ "Ref" : "SecurityGroup" }],
"InstanceRole" : { "Ref" : "IamInstanceProfile" }
},
"ServiceRole" : { "Ref" : "BatchServiceRole" }
}
}
I came to know that it is possible to submit a Batch job through a CloudWatch Events rule. AWS CloudWatch event target
I want to use a Batch job queue target to submit my job from the CloudFormation template. I have seen many examples where the job submission is done through a Lambda function, but I don't want to use a Lambda function, and I didn't find any CloudFormation template where a Batch job queue target is configured in "AWS::Events::Rule".
Hey, I was trying to find a sample myself and didn't find one, but with some testing I figured it out. Thought I'd share it here.
Targets:
  - Arn:
      Ref: BatchProcessingJobQueue
    Id: {your_id}
    RoleArn: {your_role_arn}
    BatchParameters:
      JobDefinition:
        Ref: BatchProcessingJobDefinition
      JobName: {your_job_name}
    Input:
      !Sub
      - '{"Parameters": {"param1": "--param.name1=${param1}", "param2": "--param.name2=${param2}"}}'
      - param1: {param1_value}
        param2: {param2_value}
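Since the template in the question is written in JSON, an equivalent JSON form of that target might look roughly like the sketch below. The Id, JobName, and the EventsInvokeBatchRole role (an assumed IAM role that must allow batch:SubmitJob) are placeholders; Ref on the job queue and job definition resolves to their ARNs.
"Targets": [{
  "Arn": { "Ref": "JobQueue" },
  "Id": "batch-job-queue-target",
  "RoleArn": { "Fn::GetAtt": ["EventsInvokeBatchRole", "Arn"] },
  "BatchParameters": {
    "JobDefinition": { "Ref": "JobDefinition" },
    "JobName": "scheduled-batch-job"
  }
}]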