Proper usage of aws create-data-set [aws cli]

I have set up a data source with the following:
aws quicksight create-data-source --cli-input-json file://connection.json
cat connection.json:
{
  "AwsAccountId": "44455...",
  "DataSourceId": "abcdefg13asdafsad",
  "Name": "randomname",
  "Type": "S3",
  "DataSourceParameters": {
    "S3Parameters": {
      "ManifestFileLocation": {
        "Bucket": "cmunetcoms20",
        "Key": "asn-manifest.json"
      }
    }
  }
}
asn-manifest.json contains (and is placed in the appropriate bucket):
{
  "fileLocations": [
    {
      "URIs": [
        "https://cmunetcoms20.s3.us-east-2.amazonaws.com/ASN_Scores.csv"
      ]
    },
    {
      "URIPrefixes": [
        "prefix1",
        "prefix2",
        "prefix3"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "textqualifier": "'",
    "containsHeader": "true"
  }
}
This successfully creates a data source, and then when I go to create a data set I use:
aws quicksight create-data-set --cli-input-json file://skeleton
skeleton contains:
{
  "AwsAccountId": "44455...",
  "DataSetId": "generatedDataSetName",
  "Name": "test-asn-demo",
  "PhysicalTableMap": {
    "ASNs": {
      "S3Source": {
        "DataSourceArn": "arn:aws:quicksight:us-east-2:444558491062:datasource/cmunetcoms20162031",
        "InputColumns": [
          {
            "Name": "ASN",
            "Type": "INTEGER"
          },
          {
            "Name": "Score",
            "Type": "DECIMAL"
          },
          {
            "Name": "Total_IPs",
            "Type": "INTEGER"
          },
          {
            "Name": "Badness",
            "Type": "DECIMAL"
          }
        ]
      }
    }
  },
  "ImportMode": "SPICE"
}
This throws the following error:
"An error occurred (InvalidParameterValueException) when calling the
CreateDataSet operation: Input column ASN in physical table ASNs has
invalid type. Allowed types for S3 physical table are [String]"
If I change each Type to "String", it throws the following error:
An error occurred (LimitExceededException) when calling the
CreateDataSet operation: Insufficient SPICE capacity
There is plenty of SPICE capacity on the account, something like 51 GB, with almost zero utilization. Additionally, I ran the numbers, and the total amount of SPICE this data set should use is approximately 0 GB (71k rows, 4 columns, each column treated as a string to pad my estimate).
Thanks

Got it. The solution for me was a regional configuration problem. My S3 bucket was in us-east-2 and my QuickSight account was in us-east-1. Trying to create a data set in a region that is not your account's primary QuickSight region (even on Enterprise edition) causes a SPICE error, since alternate regions are not given any SPICE capacity to start out.
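For anyone hitting the same error, here is a minimal sketch of that fix: run the QuickSight CLI calls against the account's home QuickSight region (us-east-1 in this setup) rather than the bucket's region. The region value is illustrative for this particular account.

# Create the data source and data set in the QuickSight home region,
# even though the S3 bucket itself lives in us-east-2.
aws quicksight create-data-source --cli-input-json file://connection.json --region us-east-1
aws quicksight create-data-set --cli-input-json file://skeleton --region us-east-1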

Related

What's the best practice for unmarshalling data returned from a dynamo operation in aws step functions?

I am running a state machine that runs a DynamoDB query (called using CallAwsService). The format returned looks like this:
{
  "Items": [
    {
      "string": {
        "B": blob,
        "BOOL": boolean,
        "BS": [ blob ],
        "L": [
          "AttributeValue"
        ],
        "M": {
          "string": "AttributeValue"
        },
        "N": "string",
        "NS": [ "string" ],
        "NULL": boolean,
        "S": "string",
        "SS": [ "string" ]
      }
    }
  ]
}
I would like to unmarshall this data efficiently and avoid using a Lambda call for this. The CDK code we're currently using for the query is below:
import { Duration, RemovalPolicy } from 'aws-cdk-lib'
import { Table } from 'aws-cdk-lib/aws-dynamodb'
import { LogGroup, RetentionDays } from 'aws-cdk-lib/aws-logs'
import { LogLevel, StateMachine, StateMachineType } from 'aws-cdk-lib/aws-stepfunctions'
import { CallAwsService } from 'aws-cdk-lib/aws-stepfunctions-tasks'
import { Construct } from 'constructs'

interface FindItemsStepFunctionProps {
  table: Table
  id: string
}

export const FindItemsStepFunction = (scope: Construct, props: FindItemsStepFunctionProps): StateMachine => {
  const { table, id } = props

  const definition = new CallAwsService(scope, 'Query', {
    service: 'dynamoDb',
    action: 'query',
    parameters: {
      TableName: table.tableName,
      IndexName: 'exampleIndexName',
      KeyConditionExpression: 'id = :id',
      ExpressionAttributeValues: {
        ':id': {
          'S.$': '$.path.id',
        },
      },
    },
    iamResources: ['*'],
  })

  return new StateMachine(scope, id, {
    logs: {
      destination: new LogGroup(scope, `${id}LogGroup`, {
        logGroupName: `${id}LogGroup`,
        removalPolicy: RemovalPolicy.DESTROY,
        retention: RetentionDays.ONE_WEEK,
      }),
      level: LogLevel.ALL,
    },
    definition,
    stateMachineType: StateMachineType.EXPRESS,
    stateMachineName: id,
    timeout: Duration.minutes(5),
  })
}
Can you unmarshall the data downstream? I'm not too well versed in Step Functions; do you have the ability to import utilities?
Unmarshalling DDB JSON is as simple as calling the unmarshall function from the DynamoDB utility:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_util_dynamodb.html
You may need to do so downstream, as Step Functions appears to use the low-level client.
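For example, downstream of the state machine (in whatever Node service consumes its output), a minimal sketch using that utility could look like this; the result variable and its shape are illustrative, not part of the original code:

import { unmarshall } from '@aws-sdk/util-dynamodb'
import type { AttributeValue } from '@aws-sdk/client-dynamodb'

// Convert the DynamoDB-typed Items array into plain JavaScript objects.
const toPlainObjects = (items: Record<string, AttributeValue>[]) =>
  items.map((item) => unmarshall(item))

// e.g. toPlainObjects(result.Items ?? []) -> [{ id: 'abc', score: 3 }, ...]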
Step Functions still doesn't make it easy enough to call DynamoDB directly from a step in a state machine without using a Lambda function. The main missing parts are handling the different cases of finding zero, one, or more records in a query, and unmarshalling the slightly complicated format of DynamoDB records. Sadly, the $utils library is still not supported in Step Functions.
You will need to implement these two in specific steps in the graph.
Here are the steps that we use as a DynamoDB query template:
The first step is used to provide parameters to the query. This step can be omitted if you define the parameters directly in the query step:
"Set Query Parameters": {
"Type": "Pass",
"Next": "DynamoDB Query ...",
"Result": {
"tableName": "<TABLE_NAME>",
"key_value": "<QUERY_KEY>",
"attribute_value": "<ATTRIBUTE_VALUE>"
}
}
The next step is the actual query to DynamoDB. You can also use GetItem instead of Query if you have the record keys.
"Type": "Task",
"Parameters": {
"TableName": "$.tableName",
"IndexName": "<INDEX_NAME_IF_NEEDED>",
"KeyConditionExpression": "#n1 = :v1",
"FilterExpression": "#n2.#n3 = :v2",
"ExpressionAttributeNames": {
"#n1": "<KEY_NAME>",
"#n2": "<ATTRIBUTE_NAME>",
"#n3": "<NESTED_ATTRIBUTE_NAME>"
},
"ExpressionAttributeValues": {
":v1": {
"S.$": "$.key_value"
},
":v2": {
"S.$": "$.attribute_value"
}
},
"ScanIndexForward": false
},
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"ResultPath": "$.ddb_record",
"ResultSelector": {
"result.$": "$.Items[0]"
},
"Next": "Check for DDB Object"
}
The above example seems a bit complicated, using both ExpressionAttributeNames and ExpressionAttributeValues. However, it makes it possible to query on nested attributes such as item.id.
In this example, we only take the first item response with $.Items[0]. However, you can take all the results if you need more than one.
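For example, keeping every returned item instead of just the first would be a one-line change to the ResultSelector (a small sketch, using the same mechanism as above):

"ResultSelector": {
  "result.$": "$.Items"
}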
The next step is to check if the query returned a record or not.
"Check for DDB Object": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.ddb_record.result",
"IsNull": false,
"Comment": "Found Context Object",
"Next": "Parse DDB Object"
}
],
"Default": "Do Nothing"
}
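The "Do Nothing" default branch is not shown in the original answer; a minimal placeholder for it could be a simple Succeed state:

"Do Nothing": {
  "Type": "Succeed"
}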
And lastly, to answer your original question, we can parse the query result, in case we have one:
"Parse DDB Object": {
"Type": "Pass",
"Parameters": {
"string_object.$": "$.ddb_record.result.string_object.S",
"bool_object.$": "$.ddb_record.result.bool_object.Bool",
"dict_object": {
"nested_dict_object.$": "$.ddb_record.result.item.M.name.S",
},
"dict_object_full.$": "States.StringToJson($.ddb_record.result.JSON_object.S)"
},
"ResultPath": "$.parsed_ddb_record",
"End": true
}
Please note that:
Simple strings are easily converted with "string_object.$": "$.ddb_record.result.string_object.S".
The same goes for numbers and booleans ("bool_object.$": "$.ddb_record.result.bool_object.BOOL", for example).
Nested objects are parsed through the map ("M") attribute ("nested_dict_object.$": "$.ddb_record.result.item.M.name.S", for example).
A JSON object stored as a string can be reconstructed using States.StringToJson.
The parsed object is added as a new entry on the flow using "ResultPath": "$.parsed_ddb_record".
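For illustration (attribute names taken from the Parse state above, values invented), a query result at $.ddb_record.result such as:

{
  "string_object": { "S": "hello" },
  "bool_object": { "BOOL": true },
  "item": { "M": { "name": { "S": "example" } } },
  "JSON_object": { "S": "{\"a\": 1}" }
}

would be parsed into $.parsed_ddb_record as:

{
  "string_object": "hello",
  "bool_object": true,
  "dict_object": { "nested_dict_object": "example" },
  "dict_object_full": { "a": 1 }
}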

AWS LexV2 CDK/CloudFormation error when using Image Response Cards

I am deploying a Lex V2 bot with AWS CDK and want my bot to have buttons for eliciting slots, but for some reason I get an error:
DevBot Resource handler returned message:
"Importing CDK-DevBot failed due to [There was an error importing the bot.
Make sure that the imported bot and contents of the zip file are correct, then try your request again.].
The import could not be completed."
(RequestToken: ebd3354f-6169-922a-d0f9-d14690671e25, HandlerErrorCode: InvalidRequest)
This error is not very informative. The relevant part of the CloudFormation template is the "Message" section:
"MessageGroupsList: [{
"Message": {
"ImageResponseCard": {
"Buttons": [
{
"Text": "1.0.3",
"Value": "1.0.3"
},
{
"Text": "1.0.5",
"Value": "1.0.5"
}
],
"Title": "Title"
},
"PlainTextMessage": {
"Value": "Please enter the issue number"
}
}
}]
If I remove "ImageResponseCard" then it deploys okay. Otherwise, I get the error.
Has anybody else had this problem and found a way to overcome it?
The MessageGroupList is an array of Message elements. Every element must have a different type of Message that could be ImageResponseCard or PlainTextMessage. So in your case the template has an incorrect structure, it should be something like that:
{
  "MessageGroupsList": [
    {
      "Message": {
        "ImageResponseCard": {
          "Buttons": [
            {
              "Text": "1.0.3",
              "Value": "1.0.3"
            },
            {
              "Text": "1.0.5",
              "Value": "1.0.5"
            }
          ],
          "Title": "Title"
        }
      }
    },
    {
      "Message": {
        "PlainTextMessage": {
          "Value": "Please enter the issue number"
        }
      }
    }
  ]
}
Assuming that the missing quotation mark on MessageGroupsList in your snippet is just a typo.
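Since the bot is deployed with CDK, the same structure can also be expressed in TypeScript. This is only a sketch assuming the L1 CfnBot property shapes from aws-cdk-lib/aws-lex (CfnBot.MessageGroupProperty and related types); it is not taken from the original answer:

import type { CfnBot } from 'aws-cdk-lib/aws-lex'

// Each message group carries exactly one message variant.
const messageGroupsList: CfnBot.MessageGroupProperty[] = [
  {
    message: {
      imageResponseCard: {
        title: 'Title',
        buttons: [
          { text: '1.0.3', value: '1.0.3' },
          { text: '1.0.5', value: '1.0.5' },
        ],
      },
    },
  },
  {
    message: {
      plainTextMessage: { value: 'Please enter the issue number' },
    },
  },
]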

How to get the number of rows inserted using BigQuery Streaming

I am reading data from a CSV file and inserting it into a BigQuery table using the insertAll() method of the streaming insert API, as shown below:
InsertAllResponse response = dfsf.insertAll(InsertAllRequest.newBuilder(tableId).setRows(rows).build());
rows here is an Iterable declared like this:
Iterable<InsertAllRequest.RowToInsert> rows
Now, I am actually batching the rows to insert in batches of 500, as suggested here - link to suggestion
After all the data has been inserted, how do I count the total number of rows that were inserted?
I want to find that out and log it to log4j.
This can be done in one of two ways:
The BigQuery Jobs API, via getQueryResults:
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults
Cloud Logging, where the output you want is in the tableDataChange field.
Here is a sample output:
{
  "protoPayload": {
    "#type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {},
    "authenticationInfo": {
      "principalEmail": "service_account"
    },
    "requestMetadata": {
      "callerIp": "2600:1900:2000:1b:400::27",
      "callerSuppliedUserAgent": "gl-python/3.7.1 grpc/1.22.0 gax/1.14.2 gapic/1.12.1 gccl/1.12.1,gzip(gfe)"
    },
    "serviceName": "bigquery.googleapis.com",
    "methodName": "google.cloud.bigquery.v2.JobService.InsertJob",
    "authorizationInfo": [
      {
        "resource": "projects/project_id/datasets/dataset/tables/table",
        "permission": "bigquery.tables.updateData",
        "granted": true
      }
    ],
    "resourceName": "projects/project_id/datasets/dataset/tables/table",
    "metadata": {
      "tableDataChange": {
        "deletedRowsCount": "2",
        "insertedRowsCount": "2",
        "reason": "QUERY",
        "jobName": "projects/PRJOECT_ID/jobs/85f19bdd-aff5-4abe-9283-9f0bc9ed3ce8"
      },
      "#type": "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
    }
  },
  "insertId": "7x7ye390qm",
  "resource": {
    "type": "bigquery_dataset",
    "labels": {
      "project_id": "PRJOECT_ID",
      "dataset_id": "dataset-id"
    }
  },
  "timestamp": "2020-10-26T07:00:22.960735Z",
  "severity": "INFO",
  "logName": "projects/PRJOECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access",
  "receiveTimestamp": "2020-10-26T07:00:23.763159336Z"
}
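As a side note, here is a hedged sketch of pulling entries like the one above with the Cloud SDK, assuming data-access audit logging is enabled for BigQuery; PROJECT_ID is a placeholder:

# List recent audit-log entries that report inserted row counts.
gcloud logging read \
  'resource.type="bigquery_dataset" AND protoPayload.metadata.tableDataChange.insertedRowsCount:*' \
  --project=PROJECT_ID \
  --limit=10 \
  --format=json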

AWS ECS cluster Capacity Provider

I'm using this CloudFormation template to create a capacity provider for an ECS cluster, with the Auto Scaling group specified in the ECS capacity provider:
"ECSCapacityProvider": {
"Type": "AWS::ECS::CapacityProvider",
"Properties": {
"AutoScalingGroupProvider": {
"AutoScalingGroupArn": { "Ref" : "AutoScalingGroup" }
}
},
"DependsOn" : "AutoScalingGroup"
},
"DRCluster": {
"Type": "AWS::ECS::Cluster",
"Properties": {
"ClusterName": { "Ref" : "WindowsECSCluster" },
"CapacityProviders" : "ECSCapacityProvider",
"Tags": [
{
"Key": "environment",
"Value": "dr"
}
]
},
"DependsOn" : "ECSCapacityProvider"
}
But while creating the stack it resulted in the following error:
Model validation failed (#/CapacityProviders: expected type: JSONArray, found: String)
I could not find proper documentation for capacity providers. I'm using it to attach the Auto Scaling group to the cluster, which I hope is the correct way to do so. I'm new to CloudFormation; any help is much appreciated.
CapacityProviders is a List of String, not a String like you have now:
"CapacityProviders" : "ECSCapacityProvider",
Therefore, in your DRCluster you can use the following instead:
"CapacityProviders" : [ {"Ref": "ECSCapacityProvider"} ],

AWS Route53 CLI list-resource-record-sets by Value

I need to locate a record in Route53 based on Value. My Route53 has 10,000+ records. Searching by Value for a Hosted Zone with more than 2000 records is not currently supported in the web interface. So, I must resort to using the AWS Route53 CLI's list-resource-record-sets command and the --query parameter. This parameter uses JMESPath to select or filter the result set.
So, let's look at the result set we are working with.
$ aws route53 list-resource-record-sets --hosted-zone-id Z3RB47PQXVL6N2 --max-items 5 --profile myprofile
{
  "NextToken": "eyJTdGFydFJlY29yZE5hbWUiOiBudWxsLCAiU3RhcnRSZWNvcmRJZGVudGlmaWVyIjogbnVsbCwgIlN0YXJ0UmVjb3JkVHlwZSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9",
  "ResourceRecordSets": [
    {
      "ResourceRecords": [
        {
          "Value": "ns-1264.awsdns-30.org."
        },
        {
          "Value": "ns-698.awsdns-23.net."
        },
        {
          "Value": "ns-1798.awsdns-32.co.uk."
        },
        {
          "Value": "ns-421.awsdns-52.com."
        }
      ],
      "Type": "NS",
      "Name": "mydomain.com.",
      "TTL": 300
    },
    {
      "ResourceRecords": [
        {
          "Value": "ns-1264.awsdns-30.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400"
        }
      ],
      "Type": "SOA",
      "Name": "mydomain.com.",
      "TTL": 300
    },
    {
      "ResourceRecords": [
        {
          "Value": "12.23.34.45"
        }
      ],
      "Type": "A",
      "Name": "abcdefg.mydomain.com.",
      "TTL": 300
    },
    {
      "ResourceRecords": [
        {
          "Value": "34.45.56.67"
        }
      ],
      "Type": "A",
      "Name": "zyxwvut.mydomain.com.",
      "TTL": 300
    },
    {
      "ResourceRecords": [
        {
          "Value": "45.56.67.78"
        }
      ],
      "Type": "A",
      "Name": "abcdxyz.mydomain.com.",
      "TTL": 300
    }
  ]
}
Ideally I need to find the ResourceRecordSets.Name, but I can definitely work with returning the entire ResourceRecordSet object for any record that has a ResourceRecords.Value == 45.56.67.78.
My failed attempts
// My first attempt was to use filters on two levels, but this always returns an empty array
ResourceRecordSets[?Type == 'A'].ResourceRecords[?Value == '45.56.67.78'][]
[]
// Second attempt came after doing more research on JMESPath. I could not find any good examples using filters on two levels, so I do not filter on ResourceRecordSets
ResourceRecordSets[*].ResourceRecords[?Value == '45.56.67.78']
[
  [],
  [],
  [
    {
      "Value": "45.56.67.78"
    }
  ],
  [],
  []
]
After beating my head on the desk for a while longer, I decided to consult the experts. Using the above example, how can I utilize JMESPath and the AWS Route53 CLI to return one of the following two results for records with a Value == 45.56.67.78?
[
"Name": "abcdxyz.mydomain.com."
]
OR
{
  "ResourceRecords": [
    {
      "Value": "45.56.67.78"
    }
  ],
  "Type": "A",
  "Name": "abcdxyz.mydomain.com.",
  "TTL": 300
}
This should do:
aws route53 list-resource-record-sets --hosted-zone-id Z3RB47PQXVL6N2 --query "ResourceRecordSets[?ResourceRecords[?Value == '45.56.67.78'] && Type == 'A'].Name"
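And if you want the entire matching ResourceRecordSet object rather than just its Name, dropping the trailing .Name from the JMESPath expression should return the full objects:

aws route53 list-resource-record-sets --hosted-zone-id Z3RB47PQXVL6N2 --query "ResourceRecordSets[?ResourceRecords[?Value == '45.56.67.78'] && Type == 'A']"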