DMS replication to Kinesis omit certain fields - amazon-web-services

We have a use case where we have enabled an AWS DMS replication task that streams changes from our Aurora Postgres cluster to a Kinesis Data Stream. The replication task is working as expected, but the data it sends to the Kinesis Data Stream as JSON contains fields like metadata that we don't care about and would ideally like to omit. Is there a way to do this without triggering a Lambda on KDS to remove the unwanted fields from the JSON?
I was looking at using the table mappings config of the DMS task when KDS is the target; documentation here - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html. The docs don't mention anything of this sort. Maybe I am missing something.
The current table mapping for my use case is as follows:
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "public",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "DefaultMapToKinesis",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "public",
                "table-name": "testing"
            }
        }
    ]
}
The table testing has only two columns, namely id and value, of type varchar and decimal respectively.
The result I am getting in KDS is as follows:
{
    "data": {
        "id": "5",
        "value": 1111.22
    },
    "metadata": {
        "timestamp": "2022-08-23T09:32:34.222745Z",
        "record-type": "data",
        "operation": "insert",
        "partition-key-type": "schema-table",
        "schema-name": "public",
        "table-name": "testing",
        "transaction-id": 145524
    }
}
As seen above, we are only interested in the data key of the JSON.
Is there any way in the DMS config or KDS to filter on the data portion of the JSON sent by DMS, without involving any new infra like a Lambda?

Related

AWS DMS for MySQL Aurora

I am trying to replicate a MySQL Aurora database to another MySQL Aurora database. It always creates the target database with the same name as the source. Is there any way to specify the target DB name? I mean, I want to replicate table "x" of database A to table "x" of database B.
A.x => B.x
You can specify a table mapping rule for your DMS replication task as follows:
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "public",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "rename",
            "rule-target": "table",
            "object-locator": {
                "schema-name": "public",
                "table-name": "old-table"
            },
            "value": "new-table"
        }
    ]
}
This will copy all tables from the public schema and rename just the one you specify.
Detailed documentation is here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html
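Since the question is about the target database name rather than a single table, a schema-level rename may be closer to what you need. A minimal sketch, assuming the source schema is A and you want the objects created under B on the target (the rule id and name are placeholders):
{
    "rule-type": "transformation",
    "rule-id": "3",
    "rule-name": "3",
    "rule-action": "rename",
    "rule-target": "schema",
    "object-locator": {
        "schema-name": "A"
    },
    "value": "B"
}
Add this rule alongside the selection rule above, and DMS should create and load the tables under schema B on the target.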

AWS DMS CDC - Only capture changed values not entire record? (Source RDS MySQL)

I have a DMS CDC (change data capture) task set up from a MySQL database, streaming to a Kinesis stream which a Lambda is connected to.
I was hoping to ultimately receive only the value that has changed, not an entire dump of the row; that way I know which column is being changed (at the moment it's impossible to decipher this without setting up another system to track changes myself).
Example, with the following mapping rule:
{
    "rule-type": "selection",
    "rule-id": "1",
    "rule-name": "1",
    "object-locator": {
        "schema-name": "my-schema",
        "table-name": "product"
    },
    "rule-action": "include",
    "filters": []
},
and if I changed the name property of a record on the product table, I would hope to receive a record like this:
{
    "data": {
        "name": "newValue"
    },
    "metadata": {
        "timestamp": "2021-07-26T06:47:15.762584Z",
        "record-type": "data",
        "operation": "update",
        "partition-key-type": "schema-table",
        "schema-name": "my-schema",
        "table-name": "product",
        "transaction-id": 8633730840
    }
}
However, what I actually receive is something like this:
{
    "data": {
        "name": "newValue",
        "id": "unchangedId",
        "quantity": "unchangedQuantity",
        "otherProperty": "unchangedValue"
    },
    "metadata": {
        "timestamp": "2021-07-26T06:47:15.762584Z",
        "record-type": "data",
        "operation": "update",
        "partition-key-type": "schema-table",
        "schema-name": "my-schema",
        "table-name": "product",
        "transaction-id": 8633730840
    }
}
As you can see, when receiving this it's impossible to decipher which property has changed without setting up additional systems to track it.
I've found another Stack Overflow thread where someone is posting an issue because their CDC is doing what I want mine to do. Can anyone point me in the right direction to achieve this?
I found the answer after digging into AWS documentation some more.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html#CHAP_Target.Kinesis.BeforeImage
Different source database engines provide different amounts of information for a before image:
Oracle provides updates to columns only if they change.
PostgreSQL provides only data for columns that are part of the primary key (changed or not).
MySQL generally provides data for all columns (changed or not).
I used BeforeImageSettings in the task settings to include the original data with the payloads.
"BeforeImageSettings": {
"EnableBeforeImage": true,
"FieldName": "before-image",
"ColumnFilter": "all"
}
While this still gives me the whole record, it gives me enough data to work out what's changed without additional systems.
{
    "data": {
        "name": "newValue",
        "id": "unchangedId",
        "quantity": "unchangedQuantity",
        "otherProperty": "unchangedValue"
    },
    "before-image": {
        "name": "oldValue",
        "id": "unchangedId",
        "quantity": "unchangedQuantity",
        "otherProperty": "unchangedValue"
    },
    "metadata": {
        "timestamp": "2021-07-26T06:47:15.762584Z",
        "record-type": "data",
        "operation": "update",
        "partition-key-type": "schema-table",
        "schema-name": "my-schema",
        "table-name": "product",
        "transaction-id": 8633730840
    }
}

Do load-order and parallel-load work in combination in AWS DMS?

I have two tables, let's call them tbl_parent and tbl_child, in Microsoft SQL Server, and I am trying to transfer the data of these tables to AWS Aurora PostgreSQL.
Here tbl_child has a foreign key reference to tbl_parent, so to avoid FK issues during the full load I have set a higher load-order for tbl_parent than for tbl_child. The same is suggested in the AWS documentation.
So with that load-order setting, I was expecting that data would only be inserted into tbl_child after the complete insertion is done in tbl_parent.
I also want to do a parallel load into the child table, so I have specified a table-settings rule to perform a parallel load, as suggested in the AWS documentation.
For some reason I see data in tbl_child being inserted even before the tbl_parent load is complete, and I'm getting foreign key errors like the one below:
2021-07-14T18:53:43 [TARGET_LOAD ]W: Load command output: psql: /usr/lib64/libcom_err.so.2: no version information available (required by /rdsdbbin/awsdms/lib/libgssapi_krb5.so.2)
psql: /usr/lib64/libcom_err.so.2: no version information available (required by /rdsdbbin/awsdms/lib/libkrb5.so.3)
ERROR: insert or update on table "tbl_child" violates foreign key constraint "tbl_child_tbl_parent_fkey"
DETAIL: Key (parent_id)=(1468137) is not present in table "tbl_parent". (csv_target.c:1018)
In case it helps, please find the mapping rules JSON below:
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "101",
            "rule-name": "101",
            "object-locator": {
                "schema-name": "dbo",
                "table-name": "tbl_parent"
            },
            "rule-action": "include",
            "filters": [],
            "load-order": 2
        },
        {
            "rule-type": "selection",
            "rule-id": "102",
            "rule-name": "102",
            "object-locator": {
                "schema-name": "dbo",
                "table-name": "tbl_child"
            },
            "rule-action": "include",
            "filters": [],
            "load-order": 1
        },
        {
            "rule-type": "table-settings",
            "rule-id": "131",
            "rule-name": "Parallel_Range_Child",
            "object-locator": {
                "schema-name": "dbo",
                "table-name": "tbl_child"
            },
            "parallel-load": {
                "type": "ranges",
                "columns": [
                    "child_id"
                ],
                "boundaries": [
                    ["100"],
                    ["200"],
                    ["300"]
                ]
            }
        }
    ]
}
If you want to strictly control load order, for example finishing one table before starting another, you must set MaxFullLoadSubTasks=1.
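For reference, MaxFullLoadSubTasks lives under FullLoadSettings in the task settings JSON; a minimal sketch with all other settings omitted:
{
    "FullLoadSettings": {
        "MaxFullLoadSubTasks": 1
    }
}
The trade-off is that tables are then loaded strictly one at a time, so the full load takes longer.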
Actually, one way to avoid such issues is to disable FK/RI checking in the target DB; such options are available in SQL Server/MySQL/Oracle, but do not seem to be available in PostgreSQL. Anyway, your steps for full load + CDC using DMS could be:
Create the basic table schema including primary key in target db
DMS full load
Create FK/RIs/index and other db objects
DMS CDC

How do you insert values into DynamoDB through CloudFormation?

I'm creating a table in CloudFormation:
"MyStuffTable": {
"Type": "AWS::DynamoDB::Table",
"Properties": {
"TableName": "MyStuff"
"AttributeDefinitions": [{
"AttributeName": "identifier",
"AttributeType": "S"
]},
"KeySchema": [{
"AttributeName": "identifier",
"KeyType": "HASH",
}],
"ProvisionedThroughput": {
"ReadCapacityUnits": "5",
"WriteCapacityUnits": "1"
}
}
}
Then later on in the CloudFormation template, I want to insert records into that table, something like this:
identifier: Stuff1
data: {My list of stuff here}
And insert that into Values in the code below. I had seen an example somewhere that used Custom::Install, but I can't find it now, or any documentation on it.
So this is what I have:
"MyStuff": {
    "Type": "Custom::Install",
    "DependsOn": [
        "MyStuffTable"
    ],
    "Properties": {
        "ServiceToken": {
            "Fn::GetAtt": ["MyStuffTable", "Arn"]
        },
        "Action": "fields",
        "Values": [{<insert records into this array}]
    }
}
When I run that, I'm getting this error: Invalid service token.
So I'm not doing something right in trying to reference the table to insert the records into. I can't seem to find any documentation on Custom::Install, so I don't know for sure that it's the right way to go about inserting records through CloudFormation. I also can't seem to find documentation on inserting records through CloudFormation. I know it can be done. I'm probably missing something very simple. Any ideas?
Custom::Install is a Custom Resource in CloudFormation.
This is a special type of resource which you have to develop yourself. This is mostly done by means of a Lambda function (it can also be backed by an SNS topic).
So, to answer your question: to add data to your table, you would have to write your own custom resource backed by a Lambda function. The Lambda would put the records into the table.
Action and fields are custom parameters which CloudFormation passes to the Lambda in the Custom::Install example. The parameters can be anything you want, as you are designing the custom resource tailored to your requirements.
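The "Invalid service token" error comes from pointing ServiceToken at the DynamoDB table: ServiceToken must be the ARN of the Lambda function (or SNS topic) that implements the custom resource. A minimal sketch, assuming a hypothetical seeder function MyStuffSeederFunction defined elsewhere in the template that performs the PutItem calls:
"MyStuff": {
    "Type": "Custom::Install",
    "DependsOn": ["MyStuffTable"],
    "Properties": {
        "ServiceToken": { "Fn::GetAtt": ["MyStuffSeederFunction", "Arn"] },
        "TableName": { "Ref": "MyStuffTable" },
        "Values": [
            { "identifier": "Stuff1", "data": "My list of stuff here" }
        ]
    }
}
The Lambda receives TableName and Values in the event's ResourceProperties and can write the records with DynamoDB PutItem; remember that a custom resource Lambda must signal success or failure back to CloudFormation (for example via the cfnresponse module), otherwise the stack will hang until it times out.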

What does 'schema' refer to in AWS Database Migration Service (DMS) if the source database is S3?

I'm trying to transfer a test CSV file from S3 to a DynamoDB table using AWS Database Migration Service. I'm new to AWS, so forgive me if I'm doing something completely wrong.
I've created and tested source & target endpoints with no problem. However, I've run into some task definition errors (I'm not sure why, but my logs don't appear in CloudWatch).
For simplicity's sake, my test source S3 file has only one column: eventId. The path is as follows: s3://myBucket/testFolder/events/events.csv
This is the JSON external table definition file:
{
    "TableCount": "1",
    "Tables": [
        {
            "TableName": "events",
            "TablePath": "testFolder/events/",
            "TableOwner": "testFolder",
            "TableColumns": [
                {
                    "ColumnName": "eventId",
                    "ColumnType": "STRING",
                    "ColumnNullable": "false",
                    "ColumnIsPk": "true",
                    "ColumnLength": "10"
                }
            ],
            "TableColumnsTotal": "1"
        }
    ]
}
Here's my task definition:
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "1",
"object-locator": {
"schema-name": "testFolder",
"table-name": "events"
},
"rule-action": "include"
},
{
"rule-type": "object-mapping",
"rule-id": "2",
"rule-name": "2",
"rule-action": "map-record-to-record",
"object-locator": {
"schema-name": "testFolder",
"table-name": "tableName"
},
"target-table-name": "myTestDynamoDBTable",
"mapping-parameters": {
"partition-key-name": "eventId",
"attribute-mappings": [
{
"target-attribute-name": "eventId",
"attribute-type": "scalar",
"attribute-sub-type": "string",
"value": "${eventId}"
}
]
}
}
]
}
Every time, my task errors out. I'm particularly confused about the schema, as my source file is in S3, so I thought a schema was not needed there. I found this line in the AWS docs:
s3://mybucket/hr/employee. At load time, AWS DMS assumes that the source schema name is hr... -> So should I include some sort of schema file in the hr folder?
Apologies if this is wrong, I'd appreciate any advice. Thanks.