False positives with AWS DMS validations

I am using DMS to move data (5,200,000 records) between an RDS SQL Server Express database and an RDS MySQL database. The number of records transferred is perfect: every record count per table matches exactly.
However, when running the data validation step with the AWS tool, I get a lot of mismatched records, which I believe are caused by an issue in the DMS validation process at the driver level (probably charset configuration or character encoding).
Example:
RECORD_DIFF [{'Title': 'سلام ژوند څنګه دی (1) 😀🥳'}, {'Title': 'سلام ژوند څنګه دی (1) ??'},]
If I run the select query directly against the MySQL database with MySQL Workbench, I get exactly the same emojis as the DMS validation tool gets from SQL Server.
Besides this, I am also getting a lot of RECORD_DIFF [{'Content': '<VALUE_PRINT_NOT_SUPPORTED>'}, {'Content': '<VALUE_PRINT_NOT_SUPPORTED>'},] entries, which I am not sure how to interpret; querying both databases, I see the values perfectly well for those rows.
Should I consider using another validation tool, or is there a known way to address the problems above?
Any help will be greatly appreciated.
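One way to confirm whether these really are false positives is to spot-check a flagged row independently of DMS and compare the raw code points returned by each engine. Below is a minimal sketch, assuming a hypothetical Articles table with Id and Title columns and placeholder connection details (pyodbc for SQL Server, PyMySQL with utf8mb4 for MySQL):

```python
# Sanity check: compare one flagged row byte-for-byte in both databases.
# Connection details, table and column names are hypothetical placeholders.
import pyodbc
import pymysql

sqlserver = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-sqlserver.rds.amazonaws.com;DATABASE=mydb;UID=user;PWD=secret"
)
mysql = pymysql.connect(
    host="my-mysql.rds.amazonaws.com", user="user", password="secret",
    database="mydb", charset="utf8mb4",  # utf8mb4 is required for 4-byte emoji
)

row_id = 42  # hypothetical primary key of one of the flagged rows

src_cur = sqlserver.cursor()
src_cur.execute("SELECT Title FROM Articles WHERE Id = ?", (row_id,))
source_title = src_cur.fetchone()[0]

tgt_cur = mysql.cursor()
tgt_cur.execute("SELECT Title FROM Articles WHERE Id = %s", (row_id,))
target_title = tgt_cur.fetchone()[0]

# Compare the raw bytes rather than the rendered strings, to separate a
# display/driver issue from data actually lost in migration.
print(source_title == target_title)
print(source_title.encode("utf-8"))
print(target_title.encode("utf-8"))
```

If the byte sequences match, the mismatch is most likely a charset issue on the validation side (the '??' pattern is typical of emoji being pushed through a connection that is not utf8mb4) rather than bad data in the target.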

Related

Power BI Embedded Approach for 100s of SQL Targets

I'm trying to find the best approach to delivering a BI solution to 400+ customers, each of which has its own database.
I've got Power BI Embedded working using service principal licensing, and I have the Power BI service connected to my data through the On-premises Data Gateway.
I've built my first report pointing to one of the customer databases, which works nicely.
What I want to do next, when embedding the report, is to tell Power BI, for this session, to get the data from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
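As an illustration, the hosting app could switch the parameter per session through the Update Parameters In Group REST API. A rough sketch in Python, where the workspace and dataset IDs, the parameter name "CustomerDatabase", and the token acquisition are placeholders:

```python
# Sketch: point the embedded report's dataset at a different customer database.
# Workspace/dataset IDs, the parameter name, and the access token are placeholders.
import requests

access_token = "<AAD token for the service principal>"
group_id = "<workspace id>"
dataset_id = "<dataset id>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
    f"/datasets/{dataset_id}/Default.UpdateParameters"
)
body = {"updateDetails": [{"name": "CustomerDatabase", "newValue": "Customer042_DB"}]}

resp = requests.post(url, json=body,
                     headers={"Authorization": f"Bearer {access_token}"})
resp.raise_for_status()

# For an import-mode dataset, a refresh is typically required before the
# new parameter value takes effect.
requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
).raise_for_status()
```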
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row-level security to restrict access for each customer. The advantage of this design is that you take the burden off the underlying SQL instance and push it into a Power BI dataset that is built to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls
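If you go the consolidated-database route with RLS, the embed token is where each customer's identity gets applied. A rough sketch, where the role name "CustomerFilter", the IDs, and the token are placeholder assumptions:

```python
# Sketch: generate an embed token that applies an RLS role for one customer.
# Report/dataset IDs, the role name, and the username value are placeholders.
import requests

access_token = "<AAD token for the service principal>"

body = {
    "reports": [{"id": "<report id>"}],
    "datasets": [{"id": "<dataset id>"}],
    "identities": [
        {
            "username": "customer042",   # value seen by USERNAME()/USERPRINCIPALNAME() in the RLS rule
            "roles": ["CustomerFilter"],
            "datasets": ["<dataset id>"],
        }
    ],
}
resp = requests.post(
    "https://api.powerbi.com/v1.0/myorg/GenerateToken",
    json=body,
    headers={"Authorization": f"Bearer {access_token}"},
)
resp.raise_for_status()
embed_token = resp.json()["token"]
```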

Row level changes captured via AWS DMS

I am trying to migrate a database using AWS DMS. The source is Azure SQL Server and the destination is Redshift. Is there any way to know which rows were updated or inserted? We don't have any audit columns in the source database.
Redshift doesn't track row-level changes, and you would need audit columns to do this at the user level. You may be able to deduce the changes from Redshift query history and saved data input files, but that will be solution-dependent.
Query history can be obtained in a couple of ways, and both require some action. The first is to review the query logs, but these are only retained for a few days; if you need to look back further than that, you need a process that saves these system tables so the information isn't lost. The other is to turn on Redshift audit logging to S3, but this has to be enabled before you run queries on Redshift.
There may be some logging from DMS that could be helpful, but the bottom line is that row-level change tracking is not something that is on in Redshift by default.
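If a rough after-the-fact audit trail is enough, the system tables can be queried directly. A sketch, with placeholder connection details and table name, keeping in mind that STL tables only keep a few days of history:

```python
# Sketch: pull recent Redshift query history to see which statements touched a
# given table. Connection details and the table name pattern are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="secret",
)
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT query, starttime, endtime, TRIM(querytxt) AS sql_text
        FROM stl_query
        WHERE querytxt ILIKE %s
        ORDER BY starttime DESC
        LIMIT 50;
        """,
        ("%my_table%",),
    )
    for query_id, start, end, sql_text in cur.fetchall():
        print(query_id, start, end, sql_text[:120])
```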

AWS DMS validation fails

I migrated data from SQL Server database to Aurora Postgres, using AWS DMS. Everything works and data is migrated correctly, but then validation fails. There are two types of validation errors:
GUIDs in the source database are all uppercase, while in the target they are lowercase:
{'record_id': 'DA7D98E2-06EA-4C3E-A148-3215E1C23384'}
{'record_id': 'da7d98e2-06ea-4c3e-a148-3215e1c23384'}
For some reason, validation also fails between a timestamp(4) column in Postgres and a datetime2(4) column in SQL Server. It seems like the time in Postgres has two extra 0's at the end, but when selecting data from the tables normally, the values are exactly the same:
{'created_datetime_utc': '2018-08-24 19:58:28.4900'}
{'created_datetime_utc': '2018-08-24 19:58:28.490000'}
Any ideas on how to fix this? I tried to create transformation rules for the columns, but they do not work.
Thank you.
Thanks to this article https://www.sentiatechblog.com/aws-database-migration-service-dms-tips-and-tricks, new mapping rules fixed all of the validation issues. These rules cannot be added using the AWS Console, only through the task's table-mapping JSON via the CLI or SDK.
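For reference, rules like these live in the task's table-mapping JSON and are pushed with the CLI or an SDK rather than the console. Below is a rough boto3 sketch; the task ARN and object names are placeholders, and the actual validation-override rules should be copied from the linked article:

```python
# Sketch: apply table-mapping rules (including validation overrides from the
# linked article) to an existing DMS task via the API, since the console does
# not expose them. The task ARN and schema/table names are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dbo",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},
            "rule-action": "include",
        },
        # ... append the validation-override rules from the article here, e.g.
        # rules that normalize GUID case and timestamp precision before the
        # source and target values are compared.
    ]
}

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",
    TableMappings=json.dumps(table_mappings),
)
```

Note that the task generally needs to be stopped before modify_replication_task will accept new table mappings, and then resumed.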

AWS DMS - Migrate - only schema

We have noticed that if a table is empty in SQL Server, the empty table does not come across via DMS. Only after inserting a record does it start to show up.
Just checking, is there a way to get the schema only from DMS?
Thanks
You can use the AWS Schema Conversion Tool (SCT) for moving DB objects and schema. It's a free tool from AWS and can be installed on an on-premises server or on EC2. It gives a good report before you actually migrate the DB schema and other DB objects, showing how many tables, stored procedures, functions, etc. can be migrated directly, along with possible solutions for the rest.

AWS DMS Binary Reader + Oracle REDO logs vs Binary Reader + Archived Logs

I am planning a migration from an on-premises Oracle 18c (1.5TB of data, 400TPS) to AWS-hosted databases using AWS Database Migration Service.
According to the official DMS documentation, DMS Binary Reader seems to be the only choice because our database is a PDB instance, and it can handle the REDO logs or the archived logs as the source for Change Data Capture.
I am assuming the archived logs would be a better choice in terms of CDC performance because they are smaller than the online REDO logs, but I'm not really sure of any other benefits of choosing the archived logs as the CDC source over the REDO logs. Does anyone know?
Oracle mining will read the online redo logs until it falls behind, and then it will mine the archived logs. You have two options for CDC: Oracle LogMiner or Oracle Binary Reader (see the endpoint configuration sketch after the list below).
In general, use Oracle LogMiner for migrating your Oracle database unless you have one of the following situations:
You need to run several migration tasks on the source Oracle database.
The volume of changes or the redo log volume on the source Oracle database is high. When using Oracle LogMiner as the source, the 32 KB buffer limit within LogMiner impacts the performance of change data capture on databases with a high volume of change. For example, a change rate of 10 GB per hour on a LogMiner source can exceed DMS change data capture capabilities.
Your workload includes UPDATE statements that update only LOB columns. In this case, use Binary Reader. These UPDATE statements aren't supported by Oracle LogMiner.
Your source is Oracle version 11 and you perform UPDATE statements on XMLTYPE and LOB columns. In this case, you must use Binary Reader. These statements aren't supported by Oracle LogMiner.
You are migrating LOB columns from Oracle 12c. For Oracle 12c, LogMiner doesn't support LOB columns, so in this case use Binary Reader.
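If Binary Reader is the right fit, it is selected on the DMS source endpoint through extra connection attributes rather than a task setting. A rough boto3 sketch with a placeholder endpoint ARN (a PDB source may need additional attributes; check the DMS Oracle source documentation for the exact set):

```python
# Sketch: switch an existing DMS Oracle source endpoint from LogMiner to
# Binary Reader via extra connection attributes. The endpoint ARN is a placeholder.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

dms.modify_endpoint(
    EndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE",
    ExtraConnectionAttributes="useLogMinerReader=N;useBfile=Y",
)
```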