I want to create an audit, which can be re-used across multiple mappings to capture Source record count and target record count in when source database is oracle and target database is sql server
We are using it from source to staging mappings
It's all there, in Metadata tables. There's no need to add anything that will make your loads longer and more complex.
You can review this Framework for some ideas.
Related
I'm trying to find the best approach to delivering a BI solution to 400+ customers which each have their own database.
I've got PowerBI Embedded working using service principal licensing and I have the PowerBI service connected to my data through the On Premise Data Gateway.
I've build my first report pointing to 1 of the customer databases. Which works lovely.
What I want to do next, when embedding the report, is to tell PowerBI, for this session, to get the database from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row level security to restrict access for each customer. The advantage to this design is that you take the burden off of the underlying SQL instance and push it into a PBI dataset that is made to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls
I have never used Informatica PowerCenter before and just don't know where to begin. To summarize my goal, I need to run a simple count query against a Teradata database using Informatica PowerCenter. This query needs to be ran on a specific day, but doesn't require me to store or manipulate the data returned. Looking at Informatica PowerCenter Designer is a bit daunting to me as I'm not sure what to be looking for.
Any help is greatly appreciated in understanding how to setup (if needed):
Sources
Targets
Transformations
Mappings
Is a transformation the only way to query data using PowerCenter? I've looked at a lot of tutorials, but most seem to be oriented to familiar users.
You can run a query against a database using informatica, only if you create a mapping, session and workflow to run that. But you cannot see the result unless you store it somewhere, either in a flatfile or a table.
Here are the steps to create it anyway.
Import your source table in source analyzer from Teradata.
Create a flat file target or import a relational target in target analyzer
Create a mapping m_xyz, drag and drop your source into the mapping.
You will see your source and source qualifier in the mapping. Write your custom query in source qualifier, say select count(*) as cnt from table
Remove all the ports from SQ except one numeric port from source to SQ and name it as cnt, count from your select will be assigned to this port.
Now drag and drop this port to an expression transformation.
Drag and drop your target into the mapping
Propagate the column from expression to this flat file/relational target.
Create a workflow and a session for this mapping.
In workflow you can schedule it to run on specific date.
When you execute this, count will be loaded into the column of that flat file or table.
I am new in informatica cloud. I have list of queries ready in my table. Like below.
Now I want to take one by one query from this table which work as a source query and whatever results return which I need to load into target. All tables were already created in source and target.
I just need to copy the data based on dynamic queries which kept in my one of sql tables.
If anyone have any idea then please share your toughs with me. It great helps to me.
The source connection will be the connector to your source database and the Source Type will be query. From there it depends how you are managing your variables. See thread on Informatica Network for links to multiple examples.
Read the table like normally you would do in the cloud. Then pass each of the record into the sql transformation for execution. configure where the sql transformation has to execute and it will run the queries in the database you want.
you can use a SQL task to run dynamic SQL queries.
link to using SQL task approach: https://www.datastackpros.com/2019/12/informatica-cloud-incremental-load_14.html
In AWS DMS how does the migration happening internally? Is it like exporting entire data from source table and importing to destination table? Or is it like migrating table records one by one to destination table? I am new to aws dms and don't have much idea on how things work there.
AWS publish how DMS works in their documentation and blog posts. This is the list I wish I had when I started with DMS:
For a high level understanding see: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.html
A task can consist of three major phases:
The full load of existing data
The application of cached changes
Ongoing replication
During a full load migration, where existing data from the source is moved to the target, AWS DMS loads data from tables on the source data store to tables on the target data store. While the full load is in progress, any changes made to the tables being loaded are cached on the replication server; these are the cached changes.
...
When the full load for a given table is complete, AWS DMS immediately begins to apply the cached changes for that table. When all tables have been loaded, AWS DMS begins to collect changes as transactions for the ongoing replication phase. After AWS DMS applies all cached changes, tables are transactionally consistent. At this point, AWS DMS moves to the ongoing replication phase, applying changes as transactions.
From: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Components.html
Look at the headings:
Replication Tasks
Ongoing replication, or change data capture (CDC)
To gain a detailed understanding of how DMS works internally, read through the following blogs from AWS:
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 1)
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2)
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong? (Part 3)
Finally, work through the blogs particular to your source and target databases at https://aws.amazon.com/blogs/database/category/migration/aws-database-migration-service-migration/
When I first used DMS I had same question. So simply I enabled Cloudwatch logs and created one migration task from Oracle to Aurora Postgresql.
First DMS task runs on Replication Instance and it connects to Source and Target databases.
RI then connect to Source database and based on selection rule it identifies tables and column details since it has lot of special access on Source and Target DB.
After that it start reading source table(s) in parallel and create Select col1, col2, col3.. from kind of query to fetch data from Source.
Then it write files in a temp location on RI based on tables, 1 file per table and approx 10000 rows in one commit.
While all this is happening another process is creating connection to Target DB and checking if Tables already exist if yes then it check which option we selected Do Nothing or Truncate Table etc.. Based on that it takes action.
Till now we have data from Source table in files on RI and connection and tables created on Target DB. Now RI just reads file records from RI temp location and create insert query.
Once last commit is successful it deletes the temp file from RI.
Once Source table and target table count is matched it closes connections in case of One time load.
In case of On going changes it keeps connection alive and read redo logs or other logs in Source db. Then follow same process mentioned above for CDC.
Here's a doc that provides some more information on how DMS Ongoing Replication works internally: https://aws.amazon.com/blogs/database/introducing-ongoing-replication-from-amazon-rds-for-sql-server-using-aws-database-migration-service/
The short of it is:
(following some initial steps) AWS DMS does not use any replication artifacts. When all the required information is available in the transaction log or transaction log backup, AWS DMS uses the fn_dblog() and fn_dump_dblog() functions to read changes directly from the transaction logs or transaction log backups using the log sequence number (LSN).
In addition to above answers, DMS uses Attunity underneath. There are public documents on how the later works in detail.
When there is any change in DDL of any table, we have to again import source and target definition and change mapping. Is there a way to dynamically fetch the DDL of the table and do the data copy using Informatica mapping.
The ETL uses an abstractive layer, separated from any physical database. It uses Source and Target definition that indicate what should be expected to find in DB to which the job will be connecting. Keep in mind that the same data mapping can be applied to many different source and / or target systems. It's not bound to any of them, it just defines what data to fetch and what to do with them.
In Informatica this is reflected by separating Mappings, that define data flow, and Sessions, which indicate where the logic should be applied.
Imagine you're transferring data from multiple servers. A change applied on one of them should not break the whole data integration. If the changes would be dynamically reflected, then a column added on one server would make it impossible to read data from the others.
Of course if perfectly fine to have such requirement as you've mentioned. It's just not something Informatica supports with their approach.
The only way workaround is to create your own application that would fetch table definitions, generate the Workflows and import them into Informatica prior to execution.