Whenever the DDL of a table changes, we have to re-import the source and target definitions and change the mapping. Is there a way to dynamically fetch the DDL of the table and do the data copy using an Informatica mapping?
The ETL tool works on an abstraction layer, separated from any physical database. It uses Source and Target definitions that describe what the job should expect to find in the databases it connects to. Keep in mind that the same mapping can be applied to many different source and/or target systems. It's not bound to any of them; it just defines what data to fetch and what to do with it.
In Informatica this is reflected by the separation of Mappings, which define the data flow, and Sessions, which indicate where that logic should be applied.
Imagine you're transferring data from multiple servers. A change applied on one of them should not break the whole data integration. If changes were reflected dynamically, a column added on one server would make it impossible to read data from the others.
Of course it's perfectly fine to have a requirement like the one you've mentioned. It's just not something Informatica supports with its approach.
The only workaround is to create your own application that fetches the table definitions, generates the Workflows, and imports them into Informatica prior to execution.
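As a rough illustration, such a generator could read the current column layout straight from the database catalog, build the Source/Target definitions and workflow XML from it, and import the result (e.g. with pmrep's ObjectImport command) before execution. A minimal sketch using the standard INFORMATION_SCHEMA views (Teradata exposes comparable metadata through its DBC views; MY_SCHEMA and MY_TABLE are placeholders):

    -- Fetch the current column layout of a table so an external tool can
    -- regenerate the Source/Target definitions and the mapping from it.
    -- MY_SCHEMA and MY_TABLE are placeholders for your own objects.
    SELECT column_name,
           data_type,
           character_maximum_length,
           numeric_precision,
           numeric_scale,
           is_nullable,
           ordinal_position
    FROM   information_schema.columns
    WHERE  table_schema = 'MY_SCHEMA'
      AND  table_name   = 'MY_TABLE'
    ORDER  BY ordinal_position;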
We are considering using AWS Neptune as a graph DB solution.
I come from the Django world, so I'm used to relying on DB migrations a lot.
I could not find any info about how AWS Neptune handles change management on the DB.
i.e. what happens if I want to reload a backup from a month ago and there have been schema changes since then? How do we track these changes?
Should we write custom scripts?
Unlike an RDBMS and some other data stores, Amazon Neptune, and many other graph DBs for that matter, is "schemaless", meaning there is no need to explicitly define or maintain a schema. The schema is implicitly defined by the data stored in the database. In the case you mentioned, restoring a backup, there is no need for a migration/change script to be run. When you restore the backup, the schema will be defined by the restored data.
This "schemaless" nature of the database allows applications to begin adding new entity types and data properties without any sort of ETL process. However, this also means that the application does need to manage some sort of schema internally to maintain sanity over the data being stored (e.g. first_name and firstName could be used and would be separate properties.).
We would like to back up a SQL Server cluster at the DC site to another standalone SQL Server at the DR site. We would like to use SymmetricDS and we want all DB objects from the source to be mirrored to the DR (including new tables, triggers and stored procedures). Some tables do not have primary keys.
We would like to know the type of architecture best suited to our needs.
The configuration for SymmetricDS would be two nodes that sync with each other. You could use one node group and link it to itself, like "primary pushes to primary". By making it bi-directional, you can use your mirror database when needed, and it will capture changes to get the other node back in sync when it becomes available.
SymmetricDS will replicate tables and data, but it does not replicate triggers and stored procedures. Also, the table replication works for most common cases, but misses details like computed columns and defaults that call functions.
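A minimal sketch of that single-node-group, bi-directional setup through SymmetricDS's configuration tables; the group, channel and router names below are just examples, so adapt them to your own registration setup:

    -- One node group whose members push to each other (bi-directional).
    INSERT INTO sym_node_group (node_group_id) VALUES ('primary');
    INSERT INTO sym_node_group_link (source_node_group_id, target_node_group_id, data_event_action)
    VALUES ('primary', 'primary', 'P');  -- 'P' = push

    -- Capture changes on every table with a wildcard trigger and route them
    -- to the other member of the group.
    INSERT INTO sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
    VALUES ('all_tables', '*', 'default', current_timestamp, current_timestamp);
    INSERT INTO sym_router (router_id, source_node_group_id, target_node_group_id, router_type, create_time, last_update_time)
    VALUES ('dc_to_dr', 'primary', 'primary', 'default', current_timestamp, current_timestamp);
    INSERT INTO sym_trigger_router (trigger_id, router_id, initial_load_order, create_time, last_update_time)
    VALUES ('all_tables', 'dc_to_dr', 1, current_timestamp, current_timestamp);

Remember that this only covers tables and data; triggers, stored procedures, computed columns and function-based defaults still have to be deployed to the DR server by some other means.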
I have never used Informatica PowerCenter before and just don't know where to begin. To summarize my goal, I need to run a simple count query against a Teradata database using Informatica PowerCenter. This query needs to be run on a specific day, but doesn't require me to store or manipulate the data returned. Informatica PowerCenter Designer is a bit daunting to me, as I'm not sure what to be looking for.
Any help is greatly appreciated in understanding how to set up (if needed):
Sources
Targets
Transformations
Mappings
Is a transformation the only way to query data using PowerCenter? I've looked at a lot of tutorials, but most seem to be oriented toward users who are already familiar with the tool.
You can run a query against a database using Informatica only if you create a mapping, a session and a workflow to run it. But you cannot see the result unless you store it somewhere, either in a flat file or a table.
Here are the steps to create it anyway.
Import your source table in source analyzer from Teradata.
Create a flat file target or import a relational target in target analyzer
Create a mapping m_xyz, drag and drop your source into the mapping.
You will see your source and Source Qualifier in the mapping. Write your custom query in the Source Qualifier, say select count(*) as cnt from table (see the example override after these steps).
Remove all the ports from the Source Qualifier except one numeric port from the source, and name it cnt; the count from your select will be assigned to this port.
Now drag and drop this port to an expression transformation.
Drag and drop your target into the mapping
Propagate the column from expression to this flat file/relational target.
Create a workflow and a session for this mapping.
In the workflow you can schedule it to run on a specific date.
When you execute this, the count will be loaded into the column of that flat file or table.
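The Source Qualifier override mentioned above would look something like the sketch below; MY_TABLE is a placeholder for your Teradata table, and the alias has to match the single output port (cnt) that you wire through the mapping:

    -- Source Qualifier SQL override: the alias must match the cnt port.
    SELECT COUNT(*) AS cnt
    FROM   MY_TABLE;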
I'm using a Kettle transformation to store CSV file data in a database. My client's requirement is to store the same CSV files in different databases (e.g. Oracle and Postgres) dynamically. How can I achieve this? I have tried a Kettle job with the Set Variables method, but it didn't work for me. How do I pass the database connection properties to the transformation dynamically as parameters, so it can connect to different databases? Please help me out with this issue.
To connect to different databases of the SAME type, you can just set the relevant properties (host, port, database name, schema, username, password and whatever the connection requires).
However, if your database types change, you need to set up a Generic Database connection, where you need to provide the class of the JDBC driver, the full connection URL (including parameters), the username and password.
By changing those variables you can switch your target database.
Bear in mind that a variable cannot be set and used in the same transformation. Due to the parallel nature of transformation steps, you need to set the variable values in Transformation A, then use them in Transformation B, and enclose both transformations inside a parent job. The best variable scope is "valid within root job".
Actually, for different databases, using multiple shared.xml files in multiple KETTLE_HOME locations may work just fine. I didn't have time to test it thoroughly, but I do use KETTLE_HOME and shared.xml for one-off runs; the databases are the same in my case, though, at least as far as connection.type goes.
I am using Microsoft Sync Framework 4.0 for syncing SQL Server database tables with a SQLite database on the iPad side.
Before making any database schema changes in the SQL Server database, we have to deprovision the database tables. Also, after making the schema changes, we reprovision the tables.
In this process, the tracking tables (i.e. the syncing information) get deleted.
I want the tracking table information to be restored after reprovisioning.
How can this be done? Is it possible to make DB changes without deprovisioning?
E.g., the application is at version 2.0 and syncing is working fine. In the next version, 3.0, I want to make some DB changes. In the process of deprovisioning and reprovisioning, the tracking info gets deleted, so all the tracking information from the previous version is lost. I do not want to lose the tracking info. How can I restore this tracking information from the previous version?
I believe we will have to write custom code or a trigger to store the tracking information before deprovisioning. Could anyone suggest a suitable method or provide some useful links regarding this issue?
The provisioning process should automatically populate the tracking tables for you; you don't have to copy and reload them yourself.
Now, if you think the tracking table is where the framework stores what was previously synced, the answer is no.
The tracking table simply stores what was inserted/updated/deleted. It's used for change enumeration. The information on what was previously synced is stored in the scope_info table.
When you deprovision, you wipe out this sync metadata. When you sync again, it's as if the two replicas had never synced before, so you will encounter conflicts as the framework tries to apply rows that already exist on the destination.
You can find information in the posts below on how to "hack" the Sync Fx created objects to effect some types of schema changes.
Modifying Sync Framework Scope Definition – Part 1 – Introduction
Modifying Sync Framework Scope Definition – Part 2 – Workarounds
Modifying Sync Framework Scope Definition – Part 3 – Workarounds – Adding/Removing Columns
Modifying Sync Framework Scope Definition – Part 4 – Workarounds – Adding a Table to an existing scope
Let's say I have one table "User" that I want to sync.
A tracking table "User_tracking" will be created, and some sync information will be present in it after syncing.
When I make any DB changes, this tracking table "User_tracking" will be deleted and the tracking info will be lost during the deprovisioning/provisioning process.
My workaround:
Before deprovisioning, I will write a script to copy all the "User_tracking" data into another temporary table, "User_tracking_1", so all the existing tracking info will be stored in "User_tracking_1". When I reprovision the table, a new tracking table "User_tracking" will be created.
After reprovisioning, I will copy the data from "User_tracking_1" back into "User_tracking" and then delete the contents of "User_tracking_1".
The "User_tracking" info will be restored.
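A minimal T-SQL sketch of that backup/restore step, assuming the tracking table is named User_tracking as above and that the reprovisioned table keeps the same column layout:

    -- Before deprovisioning: keep a copy of the tracking data.
    SELECT * INTO User_tracking_1 FROM User_tracking;

    -- ... deprovision, apply the schema changes, reprovision ...

    -- After reprovisioning: put the saved tracking data back and clean up.
    -- (Assumes the column layout has not changed between versions.)
    INSERT INTO User_tracking SELECT * FROM User_tracking_1;
    DELETE FROM User_tracking_1;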
Is this the right approach...