Boto3 DMS 'modify_replication_task' error on replication task settings JSON

I'm using boto3 to create DMS replication tasks, with the following replication_task_settings.json:
{
"TargetMetadata": {
"TargetSchema": "",
"SupportLobs": true,
"FullLobMode": false,
"LobChunkSize": 0,
"LimitedSizeLobMode": true,
"LobMaxSize": 256,
"InlineLobMaxSize": 0,
"LoadMaxFileSize": 0,
"ParallelLoadThreads": 0,
"ParallelLoadBufferSize": 0,
"BatchApplyEnabled": false,
"TaskRecoveryTableEnabled": false
},
"FullLoadSettings": {
"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
"CreatePkAfterFullLoad": false,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"MaxFullLoadSubTasks": 8,
"TransactionConsistencyTimeout": 1000,
"CommitRate": 10000
},
"Logging": {
"EnableLogging": true,
"LogComponents": [
{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DEFAULT"
}
],
},
"ControlTablesSettings": {
"ControlSchema": "control",
"HistoryTimeslotInMinutes": 5,
"HistoryTableEnabled": true,
"SuspendedTablesTableEnabled": true,
"StatusTableEnabled": true
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"StreamBufferSizeInMB": 8,
"CtrlStreamBufferSizeInMB": 5
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": false,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": false
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"DataTruncationErrorPolicy": "LOG_ERROR",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 0,
"TableErrorPolicy": "SUSPEND_TABLE",
"TableErrorEscalationPolicy": "STOP_TASK",
"TableErrorEscalationCount": 0,
"RecoverableErrorCount": -1,
"RecoverableErrorInterval": 5,
"RecoverableErrorThrottling": true,
"RecoverableErrorThrottlingMax": 1800,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"ApplyErrorEscalationCount": 0,
"ApplyErrorFailOnTruncationDdl": false,
"FullLoadIgnoreConflicts": true,
"FailOnTransactionConsistencyBreached": false,
"FailOnNoTablesCaptured": false
},
"ChangeProcessingTuning": {
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchApplyTimeoutMax": 30,
"BatchApplyMemoryLimit": 500,
"BatchSplitSize": 0,
"MinTransactionSize": 1000,
"CommitTimeout": 1,
"MemoryLimitTotal": 1024,
"MemoryKeepTime": 60,
"StatementCacheSize": 50
},
"ValidationSettings": {
"EnableValidation": true,
"ValidationMode": "ROW_LEVEL",
"ThreadCount": 5,
"PartitionSize": 10000,
"FailureMaxCount": 10000,
"RecordFailureDelayInMinutes": 5,
"RecordSuspendDelayInMinutes": 30,
"MaxKeyColumnSize": 8096,
"TableFailureMaxCount": 1000,
"ValidationOnly": false,
"HandleCollationDiff": false,
"RecordFailureDelayLimitInMinutes": 0
}
}
The JSON above works fine when calling dms_client.create_replication_task. However, it doesn't work when modifying the replication tasks.
When calling dms_client.modify_replication_task with the replication_task_settings.json mentioned above I get the following error:
botocore.exceptions.ClientError: An error occurred (InvalidParameterValueException) when calling the ModifyReplicationTask operation: Invalid task settings JSON
I'm not sure why this is happening, and any help would be greatly appreciated!
I tried removing some settings that are already defaulted, and I looked for malformed JSON, but found nothing obvious.
I would expect replication_task_settings.json to work for both creating and modifying DMS replication tasks.

I had a similar issue while trying to modify a replication task using the CLI:
aws --profile non-prod dms modify-replication-task --replication-task-arn arn:aws:dms:ap-southeast-3:567384657322:task:ABC --replication-task-settings file:/json_task --region ap-southeast-3
This is the error:
An error occurred (InvalidParameterValueException) when calling the
ModifyReplicationTask operation: Invalid task settings JSON
The command below worked:
aws --profile non-prod dms modify-replication-task --replication-task-arn arn:aws:dms:ap-southeast-3:567384657322:task:ABC --replication-task-settings file://json_task --region ap-southeast-3
The change is to prefix the JSON file name with "file://" (two slashes) when modifying the task.

The boto3 documentation for modify_replication_task() says this about TableMappings: "When using the CLI or boto3, provide the path of the JSON file that contains the table mappings. Precede the path with file://. For example, --table-mappings file://mappingfile.json. When working with the DMS API, provide the JSON as the parameter value."
Yes, you are absolutely correct: providing a JSON file path as the parameter works in the CLI but not with boto3. In boto3 it throws an 'Invalid TableMappings / Invalid JSON' error, so you have to pass the JSON itself as a string.
Try the below to make it work:
import json
import boto3

dms_client = boto3.client('dms')

# Placeholders: substitute your own values
idx = 1
schema_name = 'my_schema'
table_name = 'my_table'
repl_task_arn = 'arn:aws:dms:...'

table_mapping = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": str(idx),
            "rule-name": str(idx),
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table_name
            },
            "rule-action": "include"
        }
    ]
}

# Pass the mappings as a JSON string, not a file path
dms_client.modify_replication_task(
    ReplicationTaskArn=repl_task_arn,
    TableMappings=json.dumps(table_mapping)
)
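The same approach applies to the ReplicationTaskSettings parameter from the original question: read the settings file yourself and pass its contents as a JSON string rather than a file path. A minimal sketch (the task ARN is a placeholder):
import json
import boto3

dms_client = boto3.client('dms')

# Load replication_task_settings.json and re-serialize it to a clean JSON string
with open('replication_task_settings.json') as f:
    task_settings = json.load(f)

dms_client.modify_replication_task(
    ReplicationTaskArn='arn:aws:dms:...',  # placeholder: your task ARN
    ReplicationTaskSettings=json.dumps(task_settings)
)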

I faced a similar issue yesterday.
My JSON is in S3, and the final solution was simply applying indent=4:
import json
import boto3

s3 = boto3.resource('s3')
client = boto3.client('dms')
bucket, key = 'my-bucket', 'table_mappings.json'  # placeholders: your S3 location
taskarn = 'arn:aws:dms:...'                       # placeholder: your task ARN

obj = s3.Object(bucket, key)
new_json = json.loads(obj.get()['Body'].read().decode('utf-8'))
client.modify_replication_task(
    ReplicationTaskArn=taskarn,
    TableMappings=json.dumps(new_json, indent=4)
)

Related

AWS DMS Task - Reading from source is paused. Total storage used by swap files exceeded the limit

I am using AWS DMS to run a full load + CDC migration task, migrating a like-for-like RDS MySQL database to another RDS MySQL database.
It's been at 97% for a while now, and I can see the following message a few times in the CloudWatch logs:
2023-02-01T19:52:57 [SORTER ]I: Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes (sorter_transaction.c:110)
This suggests to me that either the source, target, or replication instance is using 1 GB of swap storage. However, when I check CloudWatch that does not seem to be the case.
What's happening here?
(CloudWatch metric screenshots for the replication instance, target instance, and source instance omitted.)
Task settings:
{
"Logging": {
"EnableLogging": true,
"EnableLogContext": false,
"LogComponents": [
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "TRANSFORMATION"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "SOURCE_UNLOAD"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "IO"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "TARGET_LOAD"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "PERFORMANCE"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "SOURCE_CAPTURE"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "SORTER"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "REST_SERVER"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "VALIDATOR_EXT"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "TARGET_APPLY"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "TASK_MANAGER"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "TABLES_MANAGER"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "METADATA_MANAGER"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "FILE_FACTORY"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "COMMON"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "ADDONS"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "DATA_STRUCTURE"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "COMMUNICATION"
},
{
"Severity": "LOGGER_SEVERITY_DEFAULT",
"Id": "FILE_TRANSFER"
}
],
"CloudWatchLogGroup": "dms-tasks-geeiq-prod-master-replication-instance",
"CloudWatchLogStream": "dms-task-PLBNPFYKIAWHZDEPAPTDASAD4P6GCIPMRW3ZRXA"
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"CtrlStreamBufferSizeInMB": 5,
"StreamBufferSizeInMB": 8
},
"ErrorBehavior": {
"FailOnNoTablesCaptured": true,
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"FailOnTransactionConsistencyBreached": false,
"RecoverableErrorThrottlingMax": 1800,
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"ApplyErrorEscalationCount": 0,
"RecoverableErrorStopRetryAfterThrottlingMax": true,
"RecoverableErrorThrottling": true,
"ApplyErrorFailOnTruncationDdl": false,
"DataTruncationErrorPolicy": "LOG_ERROR",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"EventErrorPolicy": "IGNORE",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"RecoverableErrorCount": -1,
"DataErrorEscalationCount": 0,
"TableErrorEscalationPolicy": "STOP_TASK",
"RecoverableErrorInterval": 5,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"TableErrorEscalationCount": 0,
"FullLoadIgnoreConflicts": true,
"DataErrorPolicy": "LOG_ERROR",
"TableErrorPolicy": "SUSPEND_TABLE"
},
"TTSettings": {
"TTS3Settings": null,
"TTRecordSettings": null,
"EnableTT": false
},
"FullLoadSettings": {
"CommitRate": 10000,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"MaxFullLoadSubTasks": 8,
"TransactionConsistencyTimeout": 600,
"CreatePkAfterFullLoad": false,
"TargetTablePrepMode": "DROP_AND_CREATE"
},
"TargetMetadata": {
"ParallelApplyBufferSize": 0,
"ParallelApplyQueuesPerThread": 0,
"ParallelApplyThreads": 0,
"TargetSchema": "",
"InlineLobMaxSize": 0,
"ParallelLoadQueuesPerThread": 0,
"SupportLobs": true,
"LobChunkSize": 64,
"TaskRecoveryTableEnabled": false,
"ParallelLoadThreads": 0,
"LobMaxSize": 0,
"BatchApplyEnabled": false,
"FullLobMode": true,
"LimitedSizeLobMode": false,
"LoadMaxFileSize": 0,
"ParallelLoadBufferSize": 0
},
"BeforeImageSettings": null,
"ControlTablesSettings": {
"historyTimeslotInMinutes": 5,
"HistoryTimeslotInMinutes": 5,
"StatusTableEnabled": false,
"SuspendedTablesTableEnabled": false,
"HistoryTableEnabled": false,
"ControlSchema": "",
"FullLoadExceptionTableEnabled": false
},
"LoopbackPreventionSettings": null,
"CharacterSetSettings": null,
"FailTaskWhenCleanTaskResourceFailed": false,
"ChangeProcessingTuning": {
"StatementCacheSize": 50,
"CommitTimeout": 1,
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchSplitSize": 0,
"BatchApplyTimeoutMax": 30,
"MinTransactionSize": 1000,
"MemoryKeepTime": 60,
"BatchApplyMemoryLimit": 500,
"MemoryLimitTotal": 1024
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": true,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": true
},
"PostProcessingRules": null
}
The error message "Reading from source is paused. Total storage used by swap files exceeded the limit ..." is related to contention on the target database when applying changes: DMS pauses reading from the source to avoid accumulating even more changes and filling up the replication instance's storage. Missing primary keys or indexes can cause full table scans during UPDATE or DELETE operations, causing performance issues.
You could analyze your target apply latency; review the links below to analyze your issue.
How can I troubleshoot high target latency on an AWS DMS task?
What are SWAP files and why are they consuming space on my AWS DMS instance?
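If you want to check this outside the console, DMS publishes CDCLatencySource and CDCLatencyTarget metrics to CloudWatch. A minimal boto3 sketch (the instance and task identifier values are placeholders, and it assumes the standard AWS/DMS metric dimensions):
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client('cloudwatch')

# Placeholders: substitute your replication instance and task identifiers
dimensions = [
    {'Name': 'ReplicationInstanceIdentifier', 'Value': 'my-replication-instance'},
    {'Name': 'ReplicationTaskIdentifier', 'Value': 'my-task-id'},
]

for metric in ('CDCLatencySource', 'CDCLatencyTarget'):
    stats = cloudwatch.get_metric_statistics(
        Namespace='AWS/DMS',
        MetricName=metric,
        Dimensions=dimensions,
        StartTime=datetime.utcnow() - timedelta(hours=6),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=['Average'],
    )
    # A target latency that keeps climbing while source latency stays low points at the target side
    print(metric, [round(p['Average'], 1) for p in stats['Datapoints']])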

AWS DMS swap files consumes all the space

I'm migrating many databases, but I have seen that databases bigger than 50 GB fail during CDC after some time due to a lack of storage.
I'm using a dms.r5.large replication instance class and everything runs smoothly until the full load is completed.
When the CDC starts I get log messages like these:
D: There are 188 swap files of total size 93156 Mb. Left to process 188 of size 93156 Mb
But the swap files are never dropped; the instance keeps accumulating swap files and eventually runs out of storage.
One thing to notice is that my swap usage in the monitoring metrics is near zero.
I have already tried with a dms.r5.xlarge and the issue was the same, which makes me think memory is not a problem.
Do you know what could be the cause of this behavior?
Is there a way to debug this?
Thank you!
More useful data:
Replication instance class: dms.r5.large; I have also tried dms.r5.xlarge.
40 GB of storage; I have tried 300 GB, but eventually the CDC phase consumes all of it.
The database to migrate is about 80GB.
Task settings:
{
"TargetMetadata": {
"TargetSchema": "",
"SupportLobs": true,
"FullLobMode": false,
"LobChunkSize": 0,
"LimitedSizeLobMode": true,
"LobMaxSize": 32,
"InlineLobMaxSize": 0,
"LoadMaxFileSize": 0,
"ParallelLoadThreads": 0,
"ParallelLoadBufferSize": 0,
"BatchApplyEnabled": false,
"TaskRecoveryTableEnabled": false,
"ParallelLoadQueuesPerThread": 0,
"ParallelApplyThreads": 0,
"ParallelApplyBufferSize": 0,
"ParallelApplyQueuesPerThread": 0
},
"FullLoadSettings": {
"TargetTablePrepMode": "DROP_AND_CREATE",
"CreatePkAfterFullLoad": false,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"MaxFullLoadSubTasks": 8,
"TransactionConsistencyTimeout": 600,
"CommitRate": 10000
},
"Logging": {
"EnableLogging": true,
"LogComponents": [{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_INFO"
},{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DEBUG"
}]
},
"ControlTablesSettings": {
"historyTimeslotInMinutes": 5,
"ControlSchema": "",
"HistoryTimeslotInMinutes": 5,
"HistoryTableEnabled": false,
"SuspendedTablesTableEnabled": false,
"StatusTableEnabled": false
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"StreamBufferSizeInMB": 8,
"CtrlStreamBufferSizeInMB": 5
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": true,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": true
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"DataTruncationErrorPolicy": "LOG_ERROR",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 0,
"TableErrorPolicy": "SUSPEND_TABLE",
"TableErrorEscalationPolicy": "STOP_TASK",
"TableErrorEscalationCount": 0,
"RecoverableErrorCount": -1,
"RecoverableErrorInterval": 5,
"RecoverableErrorThrottling": true,
"RecoverableErrorThrottlingMax": 1800,
"RecoverableErrorStopRetryAfterThrottlingMax": false,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"ApplyErrorEscalationCount": 0,
"ApplyErrorFailOnTruncationDdl": false,
"FullLoadIgnoreConflicts": true,
"FailOnTransactionConsistencyBreached": false,
"FailOnNoTablesCaptured": false
},
"ChangeProcessingTuning": {
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchApplyTimeoutMax": 30,
"BatchApplyMemoryLimit": 500,
"BatchSplitSize": 0,
"MinTransactionSize": 1000,
"CommitTimeout": 1,
"MemoryLimitTotal": 1024,
"MemoryKeepTime": 60,
"StatementCacheSize": 50
},
"ValidationSettings": {
"EnableValidation": true,
"ValidationMode": "ROW_LEVEL",
"ThreadCount": 5,
"PartitionSize": 10000,
"FailureMaxCount": 10000,
"RecordFailureDelayInMinutes": 5,
"RecordSuspendDelayInMinutes": 30,
"MaxKeyColumnSize": 8096,
"TableFailureMaxCount": 1000,
"ValidationOnly": false,
"HandleCollationDiff": false,
"RecordFailureDelayLimitInMinutes": 0,
"SkipLobColumns": false,
"ValidationPartialLobSize": 0,
"ValidationQueryCdcDelaySeconds": 0
},
"PostProcessingRules": null,
"CharacterSetSettings": null,
"LoopbackPreventionSettings": null,
"BeforeImageSettings": null
}
The issues were due to high target latency; the root cause was the structure of the database tables.
Tables with a significant number of records lacked primary keys or unique identifiers, which caused full-table scans; changes could not be applied and were instead saved in the replication instance's storage.
Eventually the instance runs out of storage.
To fix this you should run a premigration assessment to check whether your database is suitable for a DMS migration.
Another way to fix it is to add an extra column during the migration to create a unique key, and remove it after the migration.
You should also increase some values in the task settings (a sketch of how to do this follows below), such as:
MemoryLimitTotal
BatchApplyMemoryLimit
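A rough sketch of doing that through boto3 (the task ARN and the new values are placeholders, not recommendations; size them to your replication instance):
import json
import boto3

dms = boto3.client('dms')
task_arn = 'arn:aws:dms:...'  # placeholder: your task ARN

# Read the task's current settings so only the tuning values change
task = dms.describe_replication_tasks(
    Filters=[{'Name': 'replication-task-arn', 'Values': [task_arn]}]
)['ReplicationTasks'][0]
settings = json.loads(task['ReplicationTaskSettings'])

# Example values only
settings['ChangeProcessingTuning']['MemoryLimitTotal'] = 2048
settings['ChangeProcessingTuning']['BatchApplyMemoryLimit'] = 1000

dms.modify_replication_task(
    ReplicationTaskArn=task_arn,
    ReplicationTaskSettings=json.dumps(settings)
)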

'CharacterSetSettings' in AWS

I'm using AWS Database Migration Service (DMS) to create replication tasks. I noticed that in the TaskSettings JSON there's an attribute called CharacterSetSettings. However, there seems to be no documentation for this attribute.
I tried searching the documentation and Google for this attribute, but nothing comes up.
Task Settings JSON:
{
"TargetMetadata": {
"TargetSchema": "",
"SupportLobs": true,
"FullLobMode": false,
"LobChunkSize": 0,
"LimitedSizeLobMode": true,
"LobMaxSize": 256,
"InlineLobMaxSize": 0,
"LoadMaxFileSize": 0,
"ParallelLoadThreads": 0,
"ParallelLoadBufferSize": 0,
"BatchApplyEnabled": false,
"TaskRecoveryTableEnabled": false
},
"FullLoadSettings": {
"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
"CreatePkAfterFullLoad": false,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"MaxFullLoadSubTasks": 8,
"TransactionConsistencyTimeout": 1000,
"CommitRate": 10000
},
"Logging": {
"EnableLogging": true,
"LogComponents": [
{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DEFAULT"
}
],
"CloudWatchLogGroup": "dms-tasks-altitude-staging-to-prod-instance",
"CloudWatchLogStream": "dms-task-RU2FBO6SGUASAODLXHJ63OYDAM"
},
"ControlTablesSettings": {
"historyTimeslotInMinutes": 5,
"ControlSchema": "control",
"HistoryTimeslotInMinutes": 5,
"HistoryTableEnabled": true,
"SuspendedTablesTableEnabled": true,
"StatusTableEnabled": true
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"StreamBufferSizeInMB": 8,
"CtrlStreamBufferSizeInMB": 5
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": true,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": true
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"DataTruncationErrorPolicy": "LOG_ERROR",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 0,
"TableErrorPolicy": "SUSPEND_TABLE",
"TableErrorEscalationPolicy": "STOP_TASK",
"TableErrorEscalationCount": 0,
"RecoverableErrorCount": -1,
"RecoverableErrorInterval": 5,
"RecoverableErrorThrottling": true,
"RecoverableErrorThrottlingMax": 1800,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"ApplyErrorEscalationCount": 0,
"ApplyErrorFailOnTruncationDdl": false,
"FullLoadIgnoreConflicts": true,
"FailOnTransactionConsistencyBreached": false,
"FailOnNoTablesCaptured": false
},
"ChangeProcessingTuning": {
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchApplyTimeoutMax": 30,
"BatchApplyMemoryLimit": 500,
"BatchSplitSize": 0,
"MinTransactionSize": 1000,
"CommitTimeout": 1,
"MemoryLimitTotal": 1024,
"MemoryKeepTime": 60,
"StatementCacheSize": 50
},
"ValidationSettings": {
"EnableValidation": true,
"ValidationMode": "ROW_LEVEL",
"ThreadCount": 5,
"PartitionSize": 10000,
"FailureMaxCount": 10000,
"RecordFailureDelayInMinutes": 5,
"RecordSuspendDelayInMinutes": 30,
"MaxKeyColumnSize": 8096,
"TableFailureMaxCount": 1000,
"ValidationOnly": false,
"HandleCollationDiff": false,
"RecordFailureDelayLimitInMinutes": 0
},
"PostProcessingRules": null,
"CharacterSetSettings": null
}
There should be documentation on CharacterSetSettings. I'd expect that documentation to be listed on Specifying Task Settings for AWS Database Migration Service Tasks - AWS Database Migration Service.
You are searching for Character Substitution Task Settings. It seems to be fairly new.
You can specify that your replication task perform character
substitutions on the target database for all source database columns
with the AWS DMS STRING or WSTRING data type. You can configure
character substitution for any task with endpoints from the following
source and target databases:
Source databases:
Oracle
Microsoft SQL Server
MySQL
PostgreSQL
SAP Adaptive Server Enterprise (ASE)
IBM Db2 LUW
Target databases:
Oracle
Microsoft SQL Server
MySQL
PostgreSQL
SAP Adaptive Server Enterprise (ASE)
Amazon Redshift
The replication task completes all of the specified character substitutions before starting any global or table-level transformations that you specify through table mapping.
The object looks like this:
"CharacterSetSettings": {
"CharacterReplacements": [ {
"SourceCharacterCodePoint": int,
"TargetCharacterCodePoint": int
},
[...]
],
"CharacterSetSupport": {
"CharacterSet": str,
"ReplaceWithCharacterCodePoint": int
}
}
DMS processes character substitution in two phases:
In the first phase, CharacterReplacements is a list of objects that each describe a single replacement of one Unicode code point with another. For instance, if you wanted to replace b with a, set SourceCharacterCodePoint to 62 and TargetCharacterCodePoint to 61.
In the second phase, setting CharacterSetSupport makes DMS validate that all characters are valid for the CharacterSet encoding (e.g. UTF-8, UTF-16, ISO-8859-1; find the full list in the documentation). If it finds invalid characters, they are replaced by ReplaceWithCharacterCodePoint.
For both phases, the special code point 0 can be used to delete characters instead.
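Putting that together, a filled-in example of the object, using only the values mentioned above (replace b with a, validate against UTF-8, and delete any invalid characters via code point 0), could look like this:
"CharacterSetSettings": {
    "CharacterReplacements": [
        {
            "SourceCharacterCodePoint": 62,
            "TargetCharacterCodePoint": 61
        }
    ],
    "CharacterSetSupport": {
        "CharacterSet": "UTF-8",
        "ReplaceWithCharacterCodePoint": 0
    }
}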

AWS DMS - InvalidParameterValueException) when calling the CreateReplicationTask operation: Replication Task Settings document error: Invalid json

I am trying to create a DMS replication task using the AWS DMS CLI, and I am passing the task settings using a JSON file like this:
aws dms create-replication-task --replication-task-identifier dms-cli-test-replication-task-1 --source-endpoint-arn arn --target-endpoint-arn arn --replication-instance-arn arn --migration-type full-load-and-cdc --table-mappings ./table_mappings.json --replication-task-settings ./task_settings.json --region us-east-1
When I run this command, the following error is thrown:
An error occurred (InvalidParameterValueException) when calling the CreateReplicationTask operation: Replication Task Settings document error: Invalid json
Below is the content of my task_settings.json file:
{
"TargetMetadata": {
"TargetSchema": "",
"SupportLobs": true,
"FullLobMode": true,
"LobChunkSize": 64,
"LimitedSizeLobMode": false,
"LobMaxSize": 0,
"InlineLobMaxSize": 0,
"LoadMaxFileSize": 0,
"ParallelLoadThreads": 0,
"ParallelLoadBufferSize": 0,
"BatchApplyEnabled": false,
"TaskRecoveryTableEnabled": false
},
"FullLoadSettings": {
"TargetTablePrepMode": "DO_NOTHING",
"CreatePkAfterFullLoad": false,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"MaxFullLoadSubTasks": 8,
"TransactionConsistencyTimeout": 600,
"CommitRate": 10000
},
"Logging": {
"EnableLogging": true,
"LogComponents": [
{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DEFAULT"
}
]
},
"ControlTablesSettings": {
"historyTimeslotInMinutes": 5,
"ControlSchema": "",
"HistoryTimeslotInMinutes": 5,
"HistoryTableEnabled": true,
"SuspendedTablesTableEnabled": true,
"StatusTableEnabled": true
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"StreamBufferSizeInMB": 8,
"CtrlStreamBufferSizeInMB": 5
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": true,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": true
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"DataTruncationErrorPolicy": "LOG_ERROR",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 0,
"TableErrorPolicy": "SUSPEND_TABLE",
"TableErrorEscalationPolicy": "STOP_TASK",
"TableErrorEscalationCount": 0,
"RecoverableErrorCount": -1,
"RecoverableErrorInterval": 5,
"RecoverableErrorThrottling": true,
"RecoverableErrorThrottlingMax": 1800,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"ApplyErrorEscalationCount": 0,
"ApplyErrorFailOnTruncationDdl": false,
"FullLoadIgnoreConflicts": true,
"FailOnTransactionConsistencyBreached": false,
"FailOnNoTablesCaptured": false
},
"ChangeProcessingTuning": {
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchApplyTimeoutMax": 30,
"BatchApplyMemoryLimit": 500,
"BatchSplitSize": 0,
"MinTransactionSize": 1000,
"CommitTimeout": 1,
"MemoryLimitTotal": 1024,
"MemoryKeepTime": 60,
"StatementCacheSize": 50
},
"ValidationSettings": {
"EnableValidation": true,
"ValidationMode": "ROW_LEVEL",
"ThreadCount": 5,
"PartitionSize": 10000,
"FailureMaxCount": 10000,
"RecordFailureDelayInMinutes": 5,
"RecordSuspendDelayInMinutes": 30,
"MaxKeyColumnSize": 8096,
"TableFailureMaxCount": 1000,
"ValidationOnly": false,
"HandleCollationDiff": false,
"RecordFailureDelayLimitInMinutes": 0
},
"PostProcessingRules": null,
"CharacterSetSettings": null
}
I don't see any issues with the formatting of my JSON, and I don't understand why it says the JSON is invalid. Any advice is appreciated. Thank you.
This is not how you pass a settings document (or any other document) to the AWS CLI. You need to precede the path with "file://" when passing a JSON file to any CLI parameter.
Try the command below:
aws dms create-replication-task --replication-task-identifier dms-cli-test-replication-task-1 --source-endpoint-arn arn --target-endpoint-arn arn --replication-instance-arn arn --migration-type full-load-and-cdc --table-mappings file://table_mappings.json --replication-task-settings file://task_settings.json --region us-east-1
However, the documentation doesn't explicitly show this parameter accepting a JSON settings file, but I think you should try it anyway. Check the snippet from the doc below:
--replication-task-identifier (string)
The replication task identifier.
Constraints:
Must contain from 1 to 255 alphanumeric characters or hyphens.
First character must be a letter.
Cannot end with a hyphen or contain two consecutive hyphens.

AWS DMS not giving 100% migration

Hi all, we are migrating our database from on-premises to Amazon Aurora. Our database is around 136 GB, and a few tables have over a million records each. However, after the full load completes, only approximately 200,000 to 300,000 of those rows get migrated. We don't know where we are going wrong since we are new to DMS. Does anyone know how we can migrate the exact count of rows?
Migration type: full load
Here are our AWS DMS task settings:
{
"TargetMetadata": {
"TargetSchema": "",
"SupportLobs": true,
"FullLobMode": true,
"LobChunkSize": 64,
"LimitedSizeLobMode": false,
"LobMaxSize": 0,
"LoadMaxFileSize": 0,
"ParallelLoadThreads": 0,
"BatchApplyEnabled": false
},
"FullLoadSettings": {
"FullLoadEnabled": true,
"ApplyChangesEnabled": false,
"TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
"CreatePkAfterFullLoad": false,
"StopTaskCachedChangesApplied": false,
"StopTaskCachedChangesNotApplied": false,
"ResumeEnabled": false,
"ResumeMinTableSize": 100000,
"ResumeOnlyClusteredPKTables": true,
"MaxFullLoadSubTasks": 15,
"TransactionConsistencyTimeout": 600,
"CommitRate": 10000
},
"Logging": {
"EnableLogging": true,
"LogComponents": [
{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_DEFAULT"
},
{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DEFAULT"
}
],
"CloudWatchLogGroup": "dms-tasks-krishna-smartdata",
"CloudWatchLogStream": "dms-task-UERQWLR6AYHYIEKMR3HN2VL7T4"
},
"ControlTablesSettings": {
"historyTimeslotInMinutes": 5,
"ControlSchema": "",
"HistoryTimeslotInMinutes": 5,
"HistoryTableEnabled": true,
"SuspendedTablesTableEnabled": true,
"StatusTableEnabled": true
},
"StreamBufferSettings": {
"StreamBufferCount": 3,
"StreamBufferSizeInMB": 8,
"CtrlStreamBufferSizeInMB": 5
},
"ChangeProcessingDdlHandlingPolicy": {
"HandleSourceTableDropped": true,
"HandleSourceTableTruncated": true,
"HandleSourceTableAltered": true
},
"ErrorBehavior": {
"DataErrorPolicy": "LOG_ERROR",
"DataTruncationErrorPolicy": "LOG_ERROR",
"DataErrorEscalationPolicy": "SUSPEND_TABLE",
"DataErrorEscalationCount": 0,
"TableErrorPolicy": "SUSPEND_TABLE",
"TableErrorEscalationPolicy": "STOP_TASK",
"TableErrorEscalationCount": 0,
"RecoverableErrorCount": -1,
"RecoverableErrorInterval": 5,
"RecoverableErrorThrottling": true,
"RecoverableErrorThrottlingMax": 1800,
"ApplyErrorDeletePolicy": "IGNORE_RECORD",
"ApplyErrorInsertPolicy": "LOG_ERROR",
"ApplyErrorUpdatePolicy": "LOG_ERROR",
"ApplyErrorEscalationPolicy": "LOG_ERROR",
"ApplyErrorEscalationCount": 0,
"FullLoadIgnoreConflicts": true
},
"ChangeProcessingTuning": {
"BatchApplyPreserveTransaction": true,
"BatchApplyTimeoutMin": 1,
"BatchApplyTimeoutMax": 30,
"BatchApplyMemoryLimit": 500,
"BatchSplitSize": 0,
"MinTransactionSize": 1000,
"CommitTimeout": 1,
"MemoryLimitTotal": 1024,
"MemoryKeepTime": 60,
"StatementCacheSize": 50
}
}
Mapping Method:
{
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "1",
"object-locator": {
"schema-name": "dbo",
"table-name": "%"
},
"rule-action": "include"
},
{
"rule-type": "transformation",
"rule-id": "2",
"rule-name": "2",
"rule-target": "schema",
"object-locator": {
"schema-name": "dbo"
},
"rule-action": "rename",
"value": "smartdata_int"
}
]
}
You should have the option of setting up CloudWatch logs for each DMS task. Have you inspected the logs for this task? Do you have varchar/text columns larger than 32 KB? These will be truncated when migrating data into a target like Redshift, so be aware that this will count towards your error count.
The first thing to do is to increase the log level:
"Logging": {
"EnableLogging": true,
"LogComponents": [{
"Id": "SOURCE_UNLOAD",
"Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
},{
"Id": "SOURCE_CAPTURE",
"Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
},{
"Id": "TARGET_LOAD",
"Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
},{
"Id": "TARGET_APPLY",
"Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
},{
"Id": "TASK_MANAGER",
"Severity": "LOGGER_SEVERITY_DETAILED_DEBUG"
}]
},
Then you will be able to get details about the errors occurring.
Turn on validation:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Validating.html
This will slow the migration down, so you could also look at splitting this out into multiple tasks and running them on multiple replication instances: expand rule 1 into multiple rules and, rather than '%', add a condition that matches a subset of the tables (see the sketch below).
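For example, the single '%' selection rule from the mapping above could be split across tasks into narrower rules; the table-name patterns here are only illustrative placeholders:
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "dbo",
                "table-name": "orders%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "2",
            "object-locator": {
                "schema-name": "dbo",
                "table-name": "customers%"
            },
            "rule-action": "include"
        }
    ]
}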
You might also try a different replication engine version; 3.1.1 has just been released, although at the time of writing there are no release notes for it.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html