I am trying to use the AWS EventBridge Input Transformer to get the tags of an EC2 instance, but I am not very familiar with it. The event JSON looks like this (trimming out irrelevant info):
"tags": [{
"key": "Name",
"value": "windows-server-1"
},
{
"key": "Patch Group",
"value": "Windows"
}],
I am able to access individual tags using a numeric index, like so:
"patchGroup":"$.detail.resource.instanceDetails.tags[1].value"
The issue is that the numeric index isn't consistent across our instances, and I will always need the Patch Group tag. If I were using JS or C# there would be logic I could implement to find this, but I am not seeing anything like that in the documentation. Is there an easy way to get a tag by its key that I am missing?
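For context, the kind of logic I mean is something like the following Java sketch (the Tag class here is just a hypothetical stand-in for the objects in the tags array above, not anything from EventBridge):

import java.util.List;
import java.util.Optional;

public class TagLookup {
    // Hypothetical representation of one entry in the "tags" array.
    record Tag(String key, String value) {}

    // Find the value of the "Patch Group" tag regardless of its position in the array.
    static Optional<String> patchGroup(List<Tag> tags) {
        return tags.stream()
                .filter(t -> "Patch Group".equals(t.key()))
                .map(Tag::value)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Tag> tags = List.of(
                new Tag("Name", "windows-server-1"),
                new Tag("Patch Group", "Windows"));
        System.out.println(patchGroup(tags).orElse("not found")); // prints "Windows"
    }
}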
Question
What is the S3 extended destination configuration, and where in the AWS documentation is it clearly explained what it is for?
As the name suggests, it must be about the S3 destination. However, the S3 destination part of the AWS documentation makes no mention of it.
Choose Amazon S3 for Your Destination
If there are articles or blogs with a clear explanation, please provide pointers.
I have been looking for clues in the documentation listed below but, as is often the case with the AWS documentation, it is not clear. It looks partly related to input record format conversion or record processing.
Amazon Kinesis Data Firehose API Reference - ExtendedS3DestinationConfiguration
Describes the configuration of a destination in Amazon S3.
Amazon Kinesis Data Firehose Developer Guide PDF - Converting Input Record Format (API)
If you want Kinesis Data Firehose to convert the format of your input data from JSON
to Parquet or ORC, specify the optional DataFormatConversionConfiguration element in
ExtendedS3DestinationConfiguration ...
AWS CloudFormation - AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration
The ExtendedS3DestinationConfiguration property type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream.
Extended S3 Destination
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
name = "terraform-kinesis-firehose-extended-s3-test-stream"
destination = "extended_s3"
extended_s3_configuration {
role_arn = "${aws_iam_role.firehose_role.arn}"
bucket_arn = "${aws_s3_bucket.bucket.arn}"
processing_configuration {
enabled = "true"
processors {
type = "Lambda"
parameters {
parameter_name = "LambdaArn"
parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
}
}
}
}
}
The Terraform documentation is the best at showing the difference between S3 and Extended S3 destinations: https://www.terraform.io/docs/providers/aws/r/kinesis_firehose_delivery_stream.html
Extended S3 inherits the S3 destination configuration parameters and adds extra ones such as data_format_conversion_configuration or error_output_prefix.
I am afraid the Kinesis Firehose documentation is so poorly written that I wonder how people can figure out how to use Firehose from the documentation alone.
It looks like, originally, Firehose simply relayed data to the S3 bucket with no built-in transformation mechanism, so the S3 destination configuration has no processing configuration, as in AWS::KinesisFirehose::DeliveryStream S3DestinationConfiguration.
Then, as in Amazon Kinesis Firehose Data Transformation with AWS Lambda, a mechanism to transform records was introduced (seemingly around early 2017), and AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration was added for it.
Apparently people struggle to figure out how to configure it:
Does Amazon Kinesis Firehose support Data Transformations programmatically?
Well so I figured it out after much effort and documentation scrounging.
Who can figure that out just by reading the AWS documentation?
Amazon Kinesis Data Firehose Data Transformation
Firehose extended S3 configurations for lambda transformation
I could not figure it out from the AWS documentation, but after looking into actual implementations on the Internet, the required configuration appears to be as below.
Update
As per the suggestion by Kevin Eid.
Resource: aws_kinesis_firehose_delivery_stream
s3_configuration - (Optional) Required for non-S3 destinations. For S3 destination, use extended_s3_configuration instead.
The extended_s3_configuration object supports the same fields from s3_configuration as well as the following:
data_format_conversion_configuration - (Optional) Nested argument for the serializer, deserializer, and schema for converting data from the JSON format to the Parquet or ORC format before writing it to Amazon S3. More details given below.
error_output_prefix - (Optional) Prefix added to failed records before writing them to S3. This prefix appears immediately following the bucket name.
processing_configuration - (Optional) The data processing configuration. More details are given below.
s3_backup_mode - (Optional) The Amazon S3 backup mode. Valid values are Disabled and Enabled. Default value is Disabled.
s3_backup_configuration - (Optional) The configuration for backup in Amazon S3. Required if s3_backup_mode is Enabled. Supports the same fields as s3_configuration object.
I believe s3_configuration is still there only for compatibility or legacy reasons, so we only need to use extended_s3_configuration, but the AWS documentation does not explain this properly. It is a pity that the AWS documentation does not serve as the source of truth.
First off, the ExtendedS3DestinationConfiguration property type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream.
See:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-extendeds3destinationconfiguration.html
Thanks.
This little screenshot shows new components in ExtendedS3DestinationConfiguration as compared to S3DestinationConfiguration:
Also, what the extended S3 configuration is and how it is defined are shown in the API documentation:
{
   "RoleARN": "string",
   "BucketARN": "string",
   "Prefix": "string",
   "ErrorOutputPrefix": "string",
   "BufferingHints": {
      "SizeInMBs": integer,
      "IntervalInSeconds": integer
   },
   "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
   "EncryptionConfiguration": {
      "NoEncryptionConfig": "NoEncryption",
      "KMSEncryptionConfig": {
         "AWSKMSKeyARN": "string"
      }
   },
   "CloudWatchLoggingOptions": {
      "Enabled": true|false,
      "LogGroupName": "string",
      "LogStreamName": "string"
   },
   "ProcessingConfiguration": {
      "Enabled": true|false,
      "Processors": [
         {
            "Type": "Lambda",
            "Parameters": [
               {
                  "ParameterName": "LambdaArn"|"NumberOfRetries"|"RoleArn"|"BufferSizeInMBs"|"BufferIntervalInSeconds",
                  "ParameterValue": "string"
               }
               ...
            ]
         }
         ...
      ]
   },
   "S3BackupMode": "Disabled"|"Enabled",
   "S3BackupConfiguration": {
      "RoleARN": "string",
      "BucketARN": "string",
      "Prefix": "string",
      "ErrorOutputPrefix": "string",
      "BufferingHints": {
         "SizeInMBs": integer,
         "IntervalInSeconds": integer
      },
      "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
      "EncryptionConfiguration": {
         "NoEncryptionConfig": "NoEncryption",
         "KMSEncryptionConfig": {
            "AWSKMSKeyARN": "string"
         }
      },
      "CloudWatchLoggingOptions": {
         "Enabled": true|false,
         "LogGroupName": "string",
         "LogStreamName": "string"
      }
   },
   "DataFormatConversionConfiguration": {
      "SchemaConfiguration": {
         "RoleARN": "string",
         "CatalogId": "string",
         "DatabaseName": "string",
         "TableName": "string",
         "Region": "string",
         "VersionId": "string"
      },
      "InputFormatConfiguration": {
         "Deserializer": {
            "OpenXJsonSerDe": {
               "ConvertDotsInJsonKeysToUnderscores": true|false,
               "CaseInsensitive": true|false,
               "ColumnToJsonKeyMappings": {"string": "string" ...}
            },
            "HiveJsonSerDe": {
               "TimestampFormats": ["string", ...]
            }
         }
      },
      "OutputFormatConfiguration": {
         "Serializer": {
            "ParquetSerDe": {
               "BlockSizeBytes": integer,
               "PageSizeBytes": integer,
               "Compression": "UNCOMPRESSED"|"GZIP"|"SNAPPY",
               "EnableDictionaryCompression": true|false,
               "MaxPaddingBytes": integer,
               "WriterVersion": "V1"|"V2"
            },
            "OrcSerDe": {
               "StripeSizeBytes": integer,
               "BlockSizeBytes": integer,
               "RowIndexStride": integer,
               "EnablePadding": true|false,
               "PaddingTolerance": double,
               "Compression": "NONE"|"ZLIB"|"SNAPPY",
               "BloomFilterColumns": ["string", ...],
               "BloomFilterFalsePositiveProbability": double,
               "DictionaryKeyThreshold": double,
               "FormatVersion": "V0_11"|"V0_12"
            }
         }
      },
      "Enabled": true|false
   }
}
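To make that concrete, here is a rough sketch of what creating a delivery stream with a minimal extended S3 destination could look like through the AWS SDK for Java v2. The builder and enum names are assumed from the v2 model, and the ARNs, names, and buffer sizes are placeholders, so treat this as an illustration rather than a verified recipe:

import software.amazon.awssdk.services.firehose.FirehoseClient;
import software.amazon.awssdk.services.firehose.model.BufferingHints;
import software.amazon.awssdk.services.firehose.model.CreateDeliveryStreamRequest;
import software.amazon.awssdk.services.firehose.model.DeliveryStreamType;
import software.amazon.awssdk.services.firehose.model.ExtendedS3DestinationConfiguration;
import software.amazon.awssdk.services.firehose.model.ProcessingConfiguration;
import software.amazon.awssdk.services.firehose.model.Processor;
import software.amazon.awssdk.services.firehose.model.ProcessorParameter;
import software.amazon.awssdk.services.firehose.model.ProcessorParameterName;
import software.amazon.awssdk.services.firehose.model.ProcessorType;

public class CreateExtendedS3Stream {
    public static void main(String[] args) {
        try (FirehoseClient firehose = FirehoseClient.create()) {
            // The "extended" S3 destination: the plain S3 settings plus the extras
            // (error output prefix, processing configuration, backup, format conversion).
            ExtendedS3DestinationConfiguration destination =
                    ExtendedS3DestinationConfiguration.builder()
                            .roleARN("arn:aws:iam::123456789012:role/firehose-role")  // placeholder
                            .bucketARN("arn:aws:s3:::my-firehose-bucket")              // placeholder
                            .errorOutputPrefix("errors/")
                            .bufferingHints(BufferingHints.builder()
                                    .sizeInMBs(64)
                                    .intervalInSeconds(60)
                                    .build())
                            .processingConfiguration(ProcessingConfiguration.builder()
                                    .enabled(true)
                                    .processors(Processor.builder()
                                            .type(ProcessorType.LAMBDA)
                                            .parameters(ProcessorParameter.builder()
                                                    .parameterName(ProcessorParameterName.LAMBDA_ARN)
                                                    .parameterValue("arn:aws:lambda:us-east-1:123456789012:function:transformer:$LATEST")
                                                    .build())
                                            .build())
                                    .build())
                            .build();

            firehose.createDeliveryStream(CreateDeliveryStreamRequest.builder()
                    .deliveryStreamName("extended-s3-test-stream")
                    .deliveryStreamType(DeliveryStreamType.DIRECT_PUT)
                    .extendedS3DestinationConfiguration(destination)
                    .build());
        }
    }
}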
I need to fetch object metadata based on the last modified time from the Java SDK.
I can fetch the data based on last modified time from the AWS CLI, but I need the same thing from the Java SDK.
I used the following command to get the data:
aws s3api list-objects --bucket aws-codestar-us-east-2-148844964152052 --query "Contents[?LastModified>='2020-02-28T09:34:50+00:00'].{Key: Key, Size: Size,LastModified:LastModified}"
The response is:
[
  {
    "Key": "abc.png",
    "Size": 361211,
    "LastModified": "2020-03-04T12:11:14+00:00"
  },
  {
    "Key": "btest.png",
    "Size": 513624,
    "LastModified": "2020-02-28T09:34:50+00:00"
  }
]
I am expecting the same response from the AWS S3 Java SDK.
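In other words, I assume the SDK equivalent would be something along these lines (a rough sketch with the SDK for Java v2 client; the bucket name and cutoff are taken from the CLI example above), but I am not sure this is the idiomatic way:

import java.time.Instant;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

public class ListRecentObjects {
    public static void main(String[] args) {
        Instant cutoff = Instant.parse("2020-02-28T09:34:50Z");

        try (S3Client s3 = S3Client.create()) {
            ListObjectsV2Request request = ListObjectsV2Request.builder()
                    .bucket("aws-codestar-us-east-2-148844964152052")
                    .build();

            // The paginator follows continuation tokens for large buckets; the
            // LastModified filter is applied client-side, just like the CLI --query.
            s3.listObjectsV2Paginator(request).contents().stream()
                    .filter(o -> !o.lastModified().isBefore(cutoff))
                    .forEach(o -> System.out.printf("%s %d %s%n",
                            o.key(), o.size(), o.lastModified()));
        }
    }
}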
Thanks for your time.
The command I use:
aws s3api put-bucket-notification-configuration --bucket bucket-name --notification-configuration file:///Users/chris/event_config.json
It works fine if I take out the "Filter" key. As soon as I add it in, I get:
Parameter validation failed:
Unknown parameter in NotificationConfiguration.LambdaFunctionConfigurations[0]: "Filter", must be one of: Id, LambdaFunctionArn, Events
Here's my JSON file:
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:000000000:function:name",
      "Events": [
        "s3:ObjectCreated:*"
      ],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "images/"
            }
          ]
        }
      }
    }
  ]
}
When I look at the command's docs (http://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification-configuration.html), I don't see any mistake. I've tried copy/pasting, carefully looking over, etc... Any help would be greatly appreciated!
You need to be running at least version 1.7.46 of aws-cli, released 2015-08-20.
This release adds Amazon S3 support for event notification filters and fixes some issues.
https://aws.amazon.com/releasenotes/CLI/3585202016507998
The aws-cli utility contains a lot of built-in intelligence and validation logic. New features often require the code in aws-cli to be updated, and Filter on S3 event notifications is a relatively recent feature.
See also: https://aws.amazon.com/blogs/aws/amazon-s3-update-delete-notifications-better-filters-bucket-metrics/
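If upgrading the CLI is not an option right away, the same notification configuration (including the Filter) can also be applied through an SDK, since the underlying API supports filters regardless of your CLI version. A rough sketch with the AWS SDK for Java v2, mirroring the JSON file above (the bucket name and Lambda ARN are the placeholders from the question):

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Event;
import software.amazon.awssdk.services.s3.model.FilterRule;
import software.amazon.awssdk.services.s3.model.FilterRuleName;
import software.amazon.awssdk.services.s3.model.LambdaFunctionConfiguration;
import software.amazon.awssdk.services.s3.model.NotificationConfiguration;
import software.amazon.awssdk.services.s3.model.NotificationConfigurationFilter;
import software.amazon.awssdk.services.s3.model.S3KeyFilter;

public class PutNotification {
    public static void main(String[] args) {
        try (S3Client s3 = S3Client.create()) {
            // Same structure as the JSON file: one Lambda configuration triggered on
            // ObjectCreated events, filtered to keys under the "images/" prefix.
            NotificationConfiguration config = NotificationConfiguration.builder()
                    .lambdaFunctionConfigurations(LambdaFunctionConfiguration.builder()
                            .lambdaFunctionArn("arn:aws:lambda:us-east-1:000000000:function:name")
                            .events(Event.S3_OBJECT_CREATED)
                            .filter(NotificationConfigurationFilter.builder()
                                    .key(S3KeyFilter.builder()
                                            .filterRules(FilterRule.builder()
                                                    .name(FilterRuleName.PREFIX)
                                                    .value("images/")
                                                    .build())
                                            .build())
                                    .build())
                            .build())
                    .build();

            s3.putBucketNotificationConfiguration(b -> b
                    .bucket("bucket-name")
                    .notificationConfiguration(config));
        }
    }
}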
I successfully managed to get a data pipeline to transfer data from a set of tables in Amazon RDS (Aurora) to a set of .csv files in S3 with a "copyActivity" connecting the two DataNodes.
However, I'd like the .csv file to have the name of the table (or view) that it came from. I can't quite figure out how to do this. I think the best approach is to use an expression in the filePath parameter of the S3 DataNode.
But, I've tried #{table}, #{node.table}, #{parent.table}, and a variety of combinations of node.id and parent.name without success.
Here are a couple of JSON snippets from my pipeline:
"database": {
"ref": "DatabaseId_abc123"
},
"name": "Foo",
"id": "DataNodeId_xyz321",
"type": "MySqlDataNode",
"table": "table_foo",
"selectQuery": "select * from #{table}"
},
{
"schedule": {
"ref": "DefaultSchedule"
},
"filePath": "#{myOutputS3Loc}/#{parent.node.table.help.me.here}.csv",
"name": "S3_BAR_Bucket",
"id": "DataNodeId_w7x8y9",
"type": "S3DataNode"
}
Any advice you can provide would be appreciated.
I see that you have #{table} (did you mean #{myTable}?). If you are using a parameter to pass the name of the DB table, you can use that in the S3 filepath as well like this:
"filePath": "#{myOutputS3Loc}/#{myTable}.csv",