AWS DMS for MySQL Aurora

I am trying to replicate a MySQL Aurora database to another MySQL Aurora database. It always creates the target database with the same name as the source. Is there any way to specify the target DB name? I mean, I want to replicate table "x" of database A to table "x" of database B.
A.x => B.x

You can specify a table mapping rule for your DMS replication task as follows:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "public",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "rename",
      "rule-target": "table",
      "object-locator": {
        "schema-name": "public",
        "table-name": "old-table"
      },
      "value": "new-table"
    }
  ]
}
This will copy all tables from the public schema and rename just the one you specify.
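Since a MySQL "database" is a schema from DMS's point of view, you can instead rename the schema itself so the tables land in database B. A minimal sketch of an additional rule for the "rules" array above, assuming the source database is A and the desired target database is B:
{
  "rule-type": "transformation",
  "rule-id": "3",
  "rule-name": "3",
  "rule-action": "rename",
  "rule-target": "schema",
  "object-locator": {
    "schema-name": "A"
  },
  "value": "B"
}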
Detailed documentation is here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html

Related

DMS replication to Kinesis omit certain fields

We have a use case where we have enabled an AWS DMS replication task that streams changes from our Aurora Postgres cluster to a Kinesis Data Stream. The replication task is working as expected, but the JSON it sends to the Kinesis Data Stream contains fields like metadata that we don't care about and would ideally like to omit. Is there a way to do this without triggering a Lambda on KDS to remove the unwanted fields from the JSON?
I was looking at using the table-mapping config of the DMS task when KDS is the target, documentation here - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html. The docs don't mention anything of this sort. Maybe I am missing something.
The current table mapping for my use case is as follows -
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "rule-action": "include",
      "object-locator": {
        "schema-name": "public",
        "table-name": "%"
      }
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "DefaultMapToKinesis",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "public",
        "table-name": "testing"
      }
    }
  ]
}
The table testing only has two columns, namely id and value, of types varchar and decimal respectively.
The result I am getting in KDS is as follows -
{
  "data": {
    "id": "5",
    "value": 1111.22
  },
  "metadata": {
    "timestamp": "2022-08-23T09:32:34.222745Z",
    "record-type": "data",
    "operation": "insert",
    "partition-key-type": "schema-table",
    "schema-name": "public",
    "table-name": "testing",
    "transaction-id": 145524
  }
}
As seen above, we are only interested in the data key of the JSON.
Is there any way in the DMS config or KDS to filter on the data portion of the JSON sent by DMS without involving any new infra like Lambda?

How can we use the same Tags in two AWS::DynamoDB::Table resources within a CloudFormation template

I'm trying to create Amazon DynamoDB tables using a CloudFormation template. My question: can I use the same Tags in multiple tables using Ref?
"AWSTemplateFormatVersion": "2010-09-09",
"Resources": {
"Status": {
"Type": "AWS::DynamoDB::Table",
"Properties": {
"AttributeDefinitions": [
{
"AttributeName": "SId",
"AttributeType": "S"
}
],
"KeySchema": [
{
"AttributeName": "SId",
"KeyType": "HASH"
}
],
"ProvisionedThroughput": {
"ReadCapacityUnits": "1",
"WriteCapacityUnits": "1"
},
"TableName": "Statuscf",
"Tags": [
{
"Key": "Application",
"Value": "BFMS"
},
{
"Key": "Name",
"Value": "EventSourcingDataStore"
}
]
}
},
"BMSHSData": {
"Type": "AWS::DynamoDB::Table",
"Properties": {
"TableName": "Billing.FmsDatacf",
"Tags": [{"Ref":"/Status/Tags"}]
}
}
}
Please suggest how I can use the same tags in another table. I am currently trying "Tags": [{"Ref":"/Status/Tags"}].
The only way to do this using plain CloudFormation is copy-and-paste, so you have to replicate your tags for all tables "manually". The only automated solutions would be to develop a CloudFormation macro or a custom resource. Yet another option is to use nested stacks.
To resolve this problem, you just need to pass the tag values in the Parameters section of the CloudFormation template, then reference those parameters in each table's Tags, as sketched below.
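A minimal sketch of that approach, assuming parameter names ApplicationTag and NameTag (both made up for illustration); the same Tags block is then repeated in every table that should share the tags:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "ApplicationTag": {
      "Type": "String",
      "Default": "BFMS"
    },
    "NameTag": {
      "Type": "String",
      "Default": "EventSourcingDataStore"
    }
  },
  "Resources": {
    "Status": {
      "Type": "AWS::DynamoDB::Table",
      "Properties": {
        "AttributeDefinitions": [
          {"AttributeName": "SId", "AttributeType": "S"}
        ],
        "KeySchema": [
          {"AttributeName": "SId", "KeyType": "HASH"}
        ],
        "BillingMode": "PAY_PER_REQUEST",
        "TableName": "Statuscf",
        "Tags": [
          {"Key": "Application", "Value": {"Ref": "ApplicationTag"}},
          {"Key": "Name", "Value": {"Ref": "NameTag"}}
        ]
      }
    }
  }
}
This still repeats the Tags block per table, but the values themselves are defined once and can be overridden per stack.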

AWS DMS CDC - Only capture changed values not entire record? (Source RDS MySQL)

I have a DMS CDC task set (change data capture) from a MySQL database to stream to a Kinesis stream which a Lambda is connected to.
I was hoping to ultimately receive only the values that have changed and not an entire dump of the row; this way I know which column is being changed (at the moment it's impossible to decipher this without setting up another system to track changes myself).
Example, with the following mapping rule:
{
  "rule-type": "selection",
  "rule-id": "1",
  "rule-name": "1",
  "object-locator": {
    "schema-name": "my-schema",
    "table-name": "product"
  },
  "rule-action": "include",
  "filters": []
},
and if I changed the name property of a record in the product table, I would hope to receive a record like this:
{
  "data": {
    "name": "newValue"
  },
  "metadata": {
    "timestamp": "2021-07-26T06:47:15.762584Z",
    "record-type": "data",
    "operation": "update",
    "partition-key-type": "schema-table",
    "schema-name": "my-schema",
    "table-name": "product",
    "transaction-id": 8633730840
  }
}
However, what I actually receive is something like this:
{
  "data": {
    "name": "newValue",
    "id": "unchangedId",
    "quantity": "unchangedQuantity",
    "otherProperty": "unchangedValue"
  },
  "metadata": {
    "timestamp": "2021-07-26T06:47:15.762584Z",
    "record-type": "data",
    "operation": "update",
    "partition-key-type": "schema-table",
    "schema-name": "my-schema",
    "table-name": "product",
    "transaction-id": 8633730840
  }
}
As you can see, when receiving this it's impossible to decipher which property has changed without setting up additional systems to track it.
I've found another Stack Overflow thread where someone posted an issue because their CDC is doing what I want mine to do. Can anyone point me in the right direction to achieve this?
I found the answer after digging into AWS documentation some more.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html#CHAP_Target.Kinesis.BeforeImage
Different source database engines provide different amounts of information for a before image:
Oracle provides updates to columns only if they change.
PostgreSQL provides only data for columns that are part of the primary key (changed or not).
MySQL generally provides data for all columns (changed or not).
I used BeforeImageSettings in the task settings to include the original data with the payloads.
"BeforeImageSettings": {
"EnableBeforeImage": true,
"FieldName": "before-image",
"ColumnFilter": "all"
}
While this still gives me the whole record, it gives me enough data to work out what's changed without additional systems.
{
  "data": {
    "name": "newValue",
    "id": "unchangedId",
    "quantity": "unchangedQuantity",
    "otherProperty": "unchangedValue"
  },
  "before-image": {
    "name": "oldValue",
    "id": "unchangedId",
    "quantity": "unchangedQuantity",
    "otherProperty": "unchangedValue"
  },
  "metadata": {
    "timestamp": "2021-07-26T06:47:15.762584Z",
    "record-type": "data",
    "operation": "update",
    "partition-key-type": "schema-table",
    "schema-name": "my-schema",
    "table-name": "product",
    "transaction-id": 8633730840
  }
}
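The consumer on the Kinesis side can then derive the changed columns by comparing the two images. A minimal sketch in Python, assuming the payload shape shown above:
# Sketch: compare the after image ("data") against the before image
# ("before-image") to find which columns actually changed.
def changed_columns(record):
    """Return the columns whose values differ between before and after images."""
    data = record.get("data", {})
    before = record.get("before-image", {})
    return {col: val for col, val in data.items() if before.get(col) != val}

# With the example record above, this returns {"name": "newValue"}.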

Do load-order and parallel-load work in combination in AWS DMS?

I have two tables in Microsoft SQL Server, let's call them tbl_parent and tbl_child, and I am trying to transfer their data to AWS Aurora PostgreSQL.
tbl_child has a foreign key reference to tbl_parent, so to avoid FK issues during the full load I have specified a higher load-order for tbl_parent than for tbl_child, as suggested in the AWS documentation.
With that load-order setting, I was expecting that data would only be inserted into tbl_child after the insertion into tbl_parent was complete.
I also want to do a parallel load into the child table, so I have specified a table-settings rule to perform a parallel load, as suggested in the AWS documentation.
For some reason the data in tbl_child is being inserted even before the tbl_parent load is complete, and I'm getting foreign key errors like the one below:
2021-07-14T18:53:43 [TARGET_LOAD ]W: Load command output: psql: /usr/lib64/libcom_err.so.2: no version information available (required by /rdsdbbin/awsdms/lib/libgssapi_krb5.so.2)
psql: /usr/lib64/libcom_err.so.2: no version information available (required by /rdsdbbin/awsdms/lib/libkrb5.so.3)
ERROR: insert or update on table "tbl_child" violates foreign key constraint "tbl_child_tbl_parent_fkey"
DETAIL: Key (parent_id)=(1468137) is not present in table "tbl_parent". (csv_target.c:1018)
In case it helps, the mapping rules JSON is below:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "101",
      "rule-name": "101",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "tbl_parent"
      },
      "rule-action": "include",
      "filters": [],
      "load-order": 2
    },
    {
      "rule-type": "selection",
      "rule-id": "102",
      "rule-name": "102",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "tbl_child"
      },
      "rule-action": "include",
      "filters": [],
      "load-order": 1
    },
    {
      "rule-type": "table-settings",
      "rule-id": "131",
      "rule-name": "Parallel_Range_Child",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "tbl_child"
      },
      "parallel-load": {
        "type": "ranges",
        "columns": [
          "child_id"
        ],
        "boundaries": [
          ["100"],
          ["200"],
          ["300"]
        ]
      }
    }
  ]
}
If you want to strictly control the load order, for example to finish one table before starting another, you must set MaxFullLoadSubTasks=1, as sketched below.
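A minimal sketch of where that setting lives in the task settings JSON (fragment only; note that with a single subtask the full load is serialized across tables, and it likely defeats the parallel-load ranges as well, since each range segment consumes a subtask):
"FullLoadSettings": {
  "MaxFullLoadSubTasks": 1
}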
Actually, one way to avoid such issues is to disable FK/referential-integrity checking in the target database; such options are available in SQL Server/MySQL/Oracle, but seemingly not in PostgreSQL. In any case, your steps for full load + CDC using DMS could be:
1. Create the basic table schema, including primary keys, in the target database.
2. Run the DMS full load.
3. Create FKs, referential integrity constraints, indexes, and other database objects.
4. Start DMS CDC.

What does 'schema' refer to in AWS Database Migration Service (DMS) if source database is S3?

I'm trying to transfer a test CSV file from S3 to a DynamoDB table using AWS Database Migration Service. I'm new to AWS, so forgive me if I'm doing something completely wrong.
I've created and tested the source and target endpoints with no problem. However, I've run into some task definition errors (I'm not sure why, but my logs don't appear in CloudWatch).
For simplicity's sake, my test source S3 file has only one column: eventId. The path is as follows: s3://myBucket/testFolder/events/events.csv
This is the JSON external table definition file:
{
  "TableCount": "1",
  "Tables": [
    {
      "TableName": "events",
      "TablePath": "testFolder/events/",
      "TableOwner": "testFolder",
      "TableColumns": [
        {
          "ColumnName": "eventId",
          "ColumnType": "STRING",
          "ColumnNullable": "false",
          "ColumnIsPk": "true",
          "ColumnLength": "10"
        }
      ],
      "TableColumnsTotal": "1"
    }
  ]
}
Here's my task definition:
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "1",
"object-locator": {
"schema-name": "testFolder",
"table-name": "events"
},
"rule-action": "include"
},
{
"rule-type": "object-mapping",
"rule-id": "2",
"rule-name": "2",
"rule-action": "map-record-to-record",
"object-locator": {
"schema-name": "testFolder",
"table-name": "tableName"
},
"target-table-name": "myTestDynamoDBTable",
"mapping-parameters": {
"partition-key-name": "eventId",
"attribute-mappings": [
{
"target-attribute-name": "eventId",
"attribute-type": "scalar",
"attribute-sub-type": "string",
"value": "${eventId}"
}
]
}
}
]
}
Every time, my task errors out. I'm particularly confused about the schema: since my source file is in S3, I thought a schema was not needed? I found this line in the AWS docs:
s3://mybucket/hr/employee. At load time, AWS DMS assumes that the source schema name is hr...
So should I include some sort of schema file in the hr folder?
Apologies if this is wrong, I'd appreciate any advice. Thanks.