I need some help with my DMS migration. I have a source and a target database, and I need to add a new column to the target table whose value is an arithmetic computation on a source column. On AWS I can only find examples of concatenating strings, not numeric calculations. Could someone please share their experience with doing arithmetic on numeric data?
Example of the string concat that I saw:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html
My table schema and mapping.json file snippets:
# Source table example
CREATE TABLE USER_INFO (
  ID INT PRIMARY KEY,
  FIRST_NAME VARCHAR(50),
  LAST_NAME VARCHAR(50),
  EMAIL VARCHAR(50),
  GENDER VARCHAR(50),
  IP_ADDRESS VARCHAR(20)
);
# JSON rule for DMS transformation
{
  "rule-type": "transformation",
  "rule-id": "5",
  "rule-name": "5",
  "rule-action": "add-column",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "source_database_name_goes_here",
    "table-name": "USER_INFO"
  },
  "value": "new_column_name_for_target_table",
  "expression": "$ID*1000+2",   // Does this work? $ID is the source table's ID field
  "data-type": {
    "type": "integer",
    "length": 10
  }
}
Your rule expression will work. I have used the transformation rule below and it works fine:
{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "644091346",
      "rule-name": "644091346",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "USER_ID_NEW",
      "expression": "$USER_ID*10",
      "data-type": {
        "type": "int8"
      }
    },
    {
      "rule-type": "selection",
      "rule-id": "643832693",
      "rule-name": "643832693",
      "object-locator": {
        "schema-name": "ADMIN",
        "table-name": "TB"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}
You can review the link below to dive a little deeper into transformation rules:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html
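If you want to script this, below is a minimal sketch of applying a table-mapping JSON like the one above to an existing replication task with boto3. The rule IDs, schema/table names, and task ARN are placeholders, and the task must be stopped before it can be modified; treat this as an illustration rather than a drop-in script.

import json
import boto3

# Table mapping with an arithmetic add-column expression, mirroring the rules above.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": "%", "table-name": "USER_INFO"},
            "rule-action": "include",
            "filters": []
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-target": "column",
            "object-locator": {"schema-name": "%", "table-name": "USER_INFO"},
            "rule-action": "add-column",
            "value": "ID_NEW",
            "expression": "$ID*1000+2",
            "data-type": {"type": "int8"}
        }
    ]
}

dms = boto3.client("dms")

# Apply the mapping to a stopped replication task (placeholder ARN).
dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE",
    TableMappings=json.dumps(table_mappings),
)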
I have a quick question regarding Amazon DMS transformation capabilities. I have a source database (MySQL) that I need to migrate to a destination database (Aurora). During this migration, the primary key called id needs to be transferred as source_id in Aurora, and there is another ID field in Aurora whose value is a calculation applied to the source id. Basically, as shown below:
Source DB (id) -----> Target DB (source_id)
Source DB (id) -----> Some Calculations (Example: id+50)-----> Target DB (ID)
Is this feasible via DMS?
Regarding your specific requirement, below are the JSON rules to apply to the replication task.
{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "6440913467",
      "rule-name": "644091347",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "ADMIN",
        "table-name": "TB"
      },
      "rule-action": "add-column",
      "value": "source_id",
      "expression": "$id",
      "data-type": {
        "type": "int8"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "644091346",
      "rule-name": "644091346",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "ADMIN",
        "table-name": "TB"
      },
      "rule-action": "add-column",
      "value": "ID",
      "expression": "$id+50",
      "data-type": {
        "type": "int8"
      }
    },
    {
      "rule-type": "selection",
      "rule-id": "643832693",
      "rule-name": "643832693",
      "object-locator": {
        "schema-name": "ADMIN",
        "table-name": "TB"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}
You can review the link below to dive a little deeper into transformation rules:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html
Here is another set of custom rules, more specific to your requirement:
Transform the schema name from SOURCE_DB to TARGET_DB
Transform the table name from source_tb to target_tb
Add a column source_id with the value of the id column
Add a column ID computed from id ($id+50)
Here is the code:
---### SOURCE SCHEMA : SOURCE_DB
create table "source_tb" ("id" int, "name" varchar(32));
insert into "source_tb" values (1,'one');
insert into "source_tb" values (2,'two');
commit;
--## AWS DMS TASK RULE
{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "894635485",
      "rule-name": "894635485",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "SOURCE_DB",
        "table-name": "source_tb"
      },
      "rule-action": "add-column",
      "value": "source_id",
      "expression": "$id",
      "data-type": {
        "type": "int8"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "894635486",
      "rule-name": "894635486",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "SOURCE_DB",
        "table-name": "source_tb"
      },
      "rule-action": "add-column",
      "value": "ID",
      "expression": "$id+50",
      "data-type": {
        "type": "int8"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "893830603",
      "rule-name": "893830603",
      "rule-target": "table",
      "object-locator": {
        "schema-name": "SOURCE_DB",
        "table-name": "source_tb"
      },
      "rule-action": "rename",
      "value": "target_tb",
      "old-value": null
    },
    {
      "rule-type": "transformation",
      "rule-id": "893722068",
      "rule-name": "893491548",
      "rule-target": "schema",
      "object-locator": {
        "schema-name": "SOURCE_DB"
      },
      "rule-action": "rename",
      "value": "TARGET_DB",
      "old-value": null
    },
    {
      "rule-type": "selection",
      "rule-id": "893475728",
      "rule-name": "893475728",
      "object-locator": {
        "schema-name": "SOURCE_DB",
        "table-name": "%"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}
You can review the link below for a deeper dive:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html
I have data with multiple dimensions stored in a Druid cluster, for example, data about movies and the revenue they earned from each country where they were screened.
I'm trying to build a query whose result is a table of all the movies, the total revenue of each, and the revenue per country.
I managed to do it in Turnilo; it generated the following Druid query:
[
  [
    {
      "queryType": "timeseries",
      "dataSource": "movies_source",
      "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
      "granularity": "all",
      "aggregations": [
        {
          "name": "__VALUE__",
          "type": "doubleSum",
          "fieldName": "revenue"
        }
      ]
    },
    {
      "queryType": "topN",
      "dataSource": "movies_source",
      "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
      "granularity": "all",
      "dimension": {
        "type": "default",
        "dimension": "movie_id",
        "outputName": "movie_id"
      },
      "aggregations": [
        {
          "name": "revenue",
          "type": "doubleSum",
          "fieldName": "revenue"
        }
      ],
      "metric": "revenue",
      "threshold": 50
    }
  ],
  [
    {
      "queryType": "topN",
      "dataSource": "movies_source",
      "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
      "granularity": "all",
      "filter": {
        "type": "selector",
        "dimension": "movie_id",
        "value": "some_movie_id"
      },
      "dimension": {
        "type": "default",
        "dimension": "country",
        "outputName": "country"
      },
      "aggregations": [
        {
          "name": "revenue",
          "type": "doubleSum",
          "fieldName": "revenue"
        }
      ],
      "metric": "revenue",
      "threshold": 5
    }
  ]
]
But it doesn't work when I try to use it as the body of a Postman request; I get:
{
  "error": "Unknown exception",
  "errorMessage": "Unexpected token (START_ARRAY), expected VALUE_STRING: need JSON String that contains type id (for subtype of org.apache.druid.query.Query)\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 2, column: 3]",
  "errorClass": "com.fasterxml.jackson.databind.exc.MismatchedInputException",
  "host": null
}
How should I build the corresponding query so that it works with Postman?
I am not familiar with Turnilo, but have you tried using the Druid console to write SQL and convert it to a native query with the "Explain SQL query" option under the "Run/..." menu?
Your native queries seem to be doing a Top N instead of listing all movies, so I think the SQL might be something like:
SELECT movie_id, country_id, SUM(revenue) total_revenue
FROM movies_source
WHERE __time BETWEEN '2021-11-18 00:01:00' AND '2021-11-21 00:01:00'
GROUP BY movie_id, country_id
ORDER BY total_revenue DESC
LIMIT 50
I don't have your data source to test with, but I tested against the sample wikipedia data with a similar query structure:
SELECT namespace, cityName, sum(sum_added) total
FROM "wikipedia" r
WHERE cityName IS NOT NULL
AND __time BETWEEN '2015-09-12 00:00:00' AND '2015-09-15 00:00:00'
GROUP BY namespace, cityName
ORDER BY total DESC
limit 50
which results in the following Native query:
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "wikipedia"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "2015-09-12T00:00:00.000Z/2015-09-15T00:00:00.001Z"
    ]
  },
  "virtualColumns": [],
  "filter": {
    "type": "not",
    "field": {
      "type": "selector",
      "dimension": "cityName",
      "value": null,
      "extractionFn": null
    }
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "namespace",
      "outputName": "d0",
      "outputType": "STRING"
    },
    {
      "type": "default",
      "dimension": "cityName",
      "outputName": "d1",
      "outputType": "STRING"
    }
  ],
  "aggregations": [
    {
      "type": "longSum",
      "name": "a0",
      "fieldName": "sum_added",
      "expression": null
    }
  ],
  "postAggregations": [],
  "having": null,
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "a0",
        "direction": "descending",
        "dimensionOrder": {
          "type": "numeric"
        }
      }
    ],
    "limit": 50
  },
  "context": {
    "populateCache": false,
    "sqlOuterLimit": 101,
    "sqlQueryId": "cd5aabed-5e08-49b7-af63-fe82c125d3ee",
    "useApproximateCountDistinct": false,
    "useApproximateTopN": false,
    "useCache": false
  },
  "descending": false
}
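On the Postman error itself: judging by the message ("Unexpected token (START_ARRAY) ... for subtype of org.apache.druid.query.Query"), the native query endpoint expects a single query object as the request body, while Turnilo exported a nested array of queries. A minimal sketch of posting one of the native queries directly (assuming a Druid router or broker reachable at localhost:8888; adjust the host, port, and data source to your cluster):

import json
import requests

# Assumed endpoint for native queries; SQL goes to /druid/v2/sql/ instead.
DRUID_NATIVE_URL = "http://localhost:8888/druid/v2/"

# One native query object, not an array of queries.
query = {
    "queryType": "topN",
    "dataSource": "movies_source",
    "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
    "granularity": "all",
    "dimension": {"type": "default", "dimension": "movie_id", "outputName": "movie_id"},
    "aggregations": [{"name": "revenue", "type": "doubleSum", "fieldName": "revenue"}],
    "metric": "revenue",
    "threshold": 50,
}

response = requests.post(
    DRUID_NATIVE_URL,
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json())

The SQL suggested above can be sent the same way by POSTing a body like {"query": "SELECT ..."} to /druid/v2/sql/, which is usually simpler than maintaining the native form. In Postman, the same applies: the raw JSON body must be a single query object.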
Thanks for your help,
I'm trying to transform the array called 'Tags', which holds a list of key-value pairs, into a set of columns, CustomerId, CustomerDisplayName, and CustomerPath, where each column holds the respective value for each charge.
{
  "periodFrom": "2020-11-09T00:00:00",
  "periodTo": "2020-12-08T00:00:00",
  "charges": [
    {
      "listPrice": 5.05,
      "netPrice": 5.05,
      "netPriceProrated": 5.05,
      "subTotal": 5.05,
      "currency": "CAD",
      "isBilled": true,
      "isProratable": true,
      "deductions": [],
      "fees": [],
      "invoice": {
        "number": "2822835",
        "date": "2020-11-16T00:00:00",
        "periodFrom": "2020-10-09T00:00:00",
        "periodTo": "2020-11-08T00:00:00"
      },
      "taxes": [
        {
          "name": "GST",
          "appliedRate": 5.0
        },
        {
          "name": "QST",
          "appliedRate": 9.975
        }
      ],
      "tags": [
        {
          "name": "CustomerId",
          "value": "42c8edf4-365a-4068-bde6-33675832afbb"
        },
        {
          "name": "CustomerDisplayName",
          "value": "Blue Sky Group"
        },
        {
          "name": "CustomerPath",
          "value": "Devmesh Co-Branded/Blue Sky Group"
        }
      ]
    }
  ]
}
Here is the snippet of code I'm currently using:
let
...
Response2 = Table.FromRecords( { Response } ),
#"Expand1" = Table.ExpandListColumn(Response2, "charges"),
#"Expand2" = Table.ExpandRecordColumn(#"Expand1", "charges", {"productId", "productName", "sku", "chargeId", "chargeName", "chargeType", "periodFrom", "periodTo", "quantity", "listPrice", "netPrice", "netPriceProrated", "subTotal", "currency", "isBilled", "isProratable", "deductions", "fees", "invoice", "taxes", "tags"}, {"charges.productId", "charges.productName", "charges.sku", "charges.chargeId", "charges.chargeName", "charges.chargeType", "charges.periodFrom", "charges.periodTo", "charges.quantity", "charges.listPrice", "charges.netPrice", "charges.netPriceProrated", "charges.subTotal", "charges.currency", "charges.isBilled", "charges.isProratable", "charges.deductions", "charges.fees", "charges.invoice", "charges.taxes", "charges.tags"})
in
#"Expand2"
I want to create an endpoint for an S3 bucket in AWS DMS for migrating data from S3 to Redshift. When defining the table structure in JSON format, I get an error that the character limit is 1000 characters. Is there a workaround for this, or am I doing something wrong?
The JSON template shared on the AWS DMS website also has more than 1000 characters. I am wondering how to work around this when the table structure has more than 20 columns.
Also, has someone created a DMS task going from SQL Server to S3 to Redshift? I want to understand how the update files that get created in the S3 bucket when you enable replication are loaded into Redshift as updates rather than as a new table or new rows.
Thank you in advance.
I tried removing spaces and EOL characters:
{
  "TableCount": "1",
  "Tables": [
    {
      "TableName": "employee",
      "TablePath": "hr/employee/",
      "TableOwner": "hr",
      "TableColumns": [
        {
          "ColumnName": "Id",
          "ColumnType": "INT8",
          "ColumnNullable": "false",
          "ColumnIsPk": "true"
        },
        {
          "ColumnName": "LastName",
          "ColumnType": "STRING",
          "ColumnLength": "20"
        },
        {
          "ColumnName": "FirstName",
          "ColumnType": "STRING",
          "ColumnLength": "30"
        },
        {
          "ColumnName": "HireDate",
          "ColumnType": "DATETIME"
        },
        {
          "ColumnName": "OfficeLocation",
          "ColumnType": "STRING",
          "ColumnLength": "20"
        }
      ],
      "TableColumnsTotal": "5"
    }
  ]
}
Error: Must be no longer than 1000 characters
On a Unix system (EC2, ...), use tr -s " ". If your data has tabs, first run it through expand -1.
echo '{
  "TableCount": "1",
  "Tables": [
    {
      "TableName": "employee",
      "TablePath": "hr/employee/",
      "TableOwner": "hr",
      "TableColumns": [
        {
          "ColumnName": "Id",
          "ColumnType": "INT8",
          "ColumnNullable": "false",
          "ColumnIsPk": "true"
        },
        {
          "ColumnName": "LastName",
          "ColumnType": "STRING",
          "ColumnLength": "20"
        },
        {
          "ColumnName": "FirstName",
          "ColumnType": "STRING",
          "ColumnLength": "30"
        },
        {
          "ColumnName": "HireDate",
          "ColumnType": "DATETIME"
        },
        {
          "ColumnName": "OfficeLocation",
          "ColumnType": "STRING",
          "ColumnLength": "20"
        }
      ],
      "TableColumnsTotal": "5"
    }
  ]
}' | tr -s " "
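Note that tr -s " " only squeezes runs of repeated spaces; leading indentation and newlines still count toward the limit. If the definition is still over 1000 characters, fully minifying the JSON strips all inter-token whitespace. A minimal sketch, assuming the definition is saved locally as external-table-definition.json (a placeholder file name):

import json

# Re-serialize the external table definition without any whitespace.
with open("external-table-definition.json") as f:
    table_definition = json.load(f)

compact = json.dumps(table_definition, separators=(",", ":"))
print(len(compact), "characters")
print(compact)

The compact string is what goes into the S3 endpoint's external table definition (the ExternalTableDefinition setting). If the table has so many columns that even the minified JSON exceeds the limit, it may be worth checking whether the CLI or API accepts a longer definition than the console form.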