How to pass database connection properties dynamically to connect to different databases in Pentaho Kettle

I'm using a Kettle transformation to store CSV file data in a database. My client's requirement is to store the same CSV files into different databases (e.g. Oracle and PostgreSQL) dynamically. How can I achieve this? I have tried the Kettle job and Set Variables method, but it didn't work for me. How can I pass the database connection properties dynamically to the transformation as parameters, so it can connect to different databases? Please help me out with this issue.

To connect to different databases of the SAME type, you can just define the relevant connection properties as variables (host, port, database name, schema, username, password and whatever else the connection requires).
However, if your database types change, you need to set up a Generic Database connection, where you need to provide the class of the JDBC driver, the full connection URL (including parameters), the username and password.
By changing those variables you can switch your target database.
Bear in mind that a variable cannot be set and used in the same transformation. Due to the parallel nature of transformation steps, you need to set the variable values in transformation A, use them in transformation B, and enclose both transformations inside a parent job. The best variable scope for this is "valid in the root job".
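For illustration, the Generic connection can be driven entirely by variables defined in kettle.properties (or passed in as parameters). A minimal sketch, with purely illustrative variable names and connection details:

# Example kettle.properties entries; names and values are placeholders.
# Switching the target database is just a matter of changing these values.
DB_DRIVER_CLASS=org.postgresql.Driver
DB_URL=jdbc:postgresql://dbhost:5432/mydb
DB_USER=etl_user
DB_PASSWORD=secret

In the Generic database connection dialog you would then put ${DB_DRIVER_CLASS}, ${DB_URL}, ${DB_USER} and ${DB_PASSWORD} into the corresponding fields. Pointing the same transformation at Oracle would only require changing the driver class and URL (e.g. oracle.jdbc.OracleDriver and a jdbc:oracle:thin:@... URL).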

Actually, for different databases, using multiple shared.xml files in multiple KETTLE_HOME locations may work just fine. I haven't had time to test it thoroughly, but I do use KETTLE_HOME and shared.xml for one-off runs; in my case the databases are the same, though, at least as far as connection.type goes.

Related

AWS Neptune Change Management

We are considering using AWS Neptune as our graph DB solution.
I come from the Django world, so I'm used to relying on DB migrations a lot.
I could not find any info about how AWS Neptune handles change management on the DB.
ie. what happens if I want to reload a backup from a month ago and there has been schema changes since then? How do we track these changes?
Should we write custom scripts?
Unlike an RDBMS and some other data stores, Amazon Neptune (and many other graph DBs, for that matter) is "schemaless", meaning there is no need to explicitly define or maintain a schema. The schema is implicitly defined by the data stored in the database. In the case you mentioned, restoring a backup, there is no need for a migration/change script to be run. When you restore the backup, the schema will be defined by the restored data.
This "schemaless" nature of the database allows applications to begin adding new entity types and data properties without any sort of ETL process. However, it also means that the application does need to manage some sort of schema internally to maintain sanity over the data being stored (e.g. first_name and firstName could both be used and would be stored as separate properties).
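To make that last point concrete, here is a minimal gremlin-python sketch (the endpoint is a placeholder) showing that nothing on the database side stops the same logical attribute from being written under two different property keys:

# Minimal sketch using gremlin-python; the endpoint is a placeholder.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Both spellings are accepted and stored as distinct, unrelated properties.
g.addV('person').property('first_name', 'Ada').next()
g.addV('person').property('firstName', 'Grace').next()

# A query that only knows about one spelling silently misses the other vertex.
print(g.V().hasLabel('person').has('first_name').count().next())  # 1, not 2

conn.close()

Keeping property names consistent is therefore the application's job, not the database's.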

How to capture session level details in a table

I have a requirement to capture session-level details like session start time, end time, source success rows, failed rows etc. in an audit table. As all these details are available in pre-built session variables, I need to store them in a table. What I am doing now is taking an Assignment task in a workflow, assigning the values of these pre-built session variables for a particular session to workflow variables, and passing those workflow variables to mapping variables in another non-reusable session (the mapping which loads the table) using the pre-session variable assignment option. It works fine for a workflow that has one session. But if I have to implement this for a workflow with a larger number of sessions, the process becomes tedious, as I have to create an Assignment task for each of those sessions and a non-reusable session that calls a mapping to load the audit table.
So I am wondering: is there any alternative way to get this job done? I am thinking of a solution where we capture the audit details of all sessions in a file and pass this file as input to a mapping that loads the data into the table in one go. Is this possible? Any solution?
Check this out: ETL Operational framework
It covers an end-to-end solution that should fit your needs and be quite easy to extend if you have multiple sessions - all you'd need to do is apply similar post-session commands before running the final session that loads the stats into the database.

Use parameter as web service input for Informatica mapping

I have a WCF web service that takes a start and end date as input, and returns a record set. What I'd like to do is set up an Informatica mapping that creates variables for the date from one week ago and today's date. These would be used as input for the web service consumer or a web service source (whichever will work), but I'm not sure how to go about this. I can't create an Expression with no inputs, and I don't see how to set a mapped parameter as input.
The only two ways I can think about doing this would be to either build an app that creates a flat file with both dates, or to build a database object that supplies the dates as a source. I'd rather not have a separate outside source to provide these values, but I can't think of another way.
If you need those variables set before the mapping runs, use an Assignment task in the workflow and the Pre-session variable assignment to set the values for the mapping before it runs.
There is no way to do this with Informatica v9.6.1. A source has to be created in order to feed the web service. I ended up creating a dummy record with one field, using it as input, then disregarding that input and setting up the variable output using an Expression transformation.
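For reference, the two output ports of that Expression transformation can be filled with standard Informatica expression functions; the port names below are purely illustrative:

-- Output port expressions in the Expression transformation (port names are examples)
START_DATE: ADD_TO_DATE(TRUNC(SYSDATE, 'DD'), 'DD', -7)
END_DATE:   TRUNC(SYSDATE, 'DD')

These ports can then be linked to the date inputs of the web service consumer.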

Dynamic Mapping changes

When there is any change in the DDL of a table, we have to import the source and target definitions again and change the mapping. Is there a way to dynamically fetch the DDL of the table and do the data copy using an Informatica mapping?
The ETL uses an abstraction layer, separated from any physical database. It uses Source and Target definitions that describe what the job should expect to find in the database it connects to. Keep in mind that the same data mapping can be applied to many different source and/or target systems. It's not bound to any of them; it just defines what data to fetch and what to do with it.
In Informatica this is reflected by separating Mappings, that define data flow, and Sessions, which indicate where the logic should be applied.
Imagine you're transferring data from multiple servers. A change applied on one of them should not break the whole data integration. If the changes were dynamically reflected, a column added on one server would make it impossible to read data from the others.
Of course it's perfectly fine to have a requirement such as the one you've mentioned. It's just not something Informatica supports with its approach.
The only workaround is to create your own application that fetches the table definitions, generates the workflows and imports them into Informatica prior to execution.
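As a very rough sketch of the definition-fetching part of such an application (connection details, schema and driver are placeholders; generating the workflow XML and importing it with pmrep depends entirely on your repository setup):

# Rough sketch: collect column definitions so that Source/Target definitions
# and a workflow can be generated from them. All connection details are placeholders.
import psycopg2  # or any other DB-API driver matching your source database

conn = psycopg2.connect(host="dbhost", dbname="mydb", user="etl", password="secret")
cur = conn.cursor()
cur.execute("""
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position
""")

tables = {}
for table_name, column_name, data_type in cur.fetchall():
    tables.setdefault(table_name, []).append((column_name, data_type))

# From 'tables' you would render the Source/Target definitions and the workflow
# as XML (for example from templates) and import them with pmrep before the run;
# the exact pmrep invocation depends on your repository setup.
conn.close()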

Coldfusion: Move data from one datasource to another

I need to move a series of tables from one datasource to another. Our hosting company doesn't give shared passwords amongst the databases so I can't write a SQL script to handle it.
The best option is probably just writing a little ColdFusion script that takes care of it.
Ordinarily I would do something like:
SELECT * INTO database.table FROM database.table
The only problem with this is that cfquery doesn't allow you to use two datasources in the same query.
I don't think I can use a QoQ either, because you can't tell it to use the second datasource; it has to have a dbType of 'query'.
Can anyone think of any intelligent ways of getting this done? Or is the only option to just loop over each line in the first query adding them individually to the second?
My problem with that is that it will take much longer. We have a lot of tables to move.
Ok, so you don't have a shared password between the databases, but you do seem to have the passwords for each individual database (since you have datasources set up). So, can you create a linked server definition from database 1 to database 2? User credentials can be saved against the linked server, so they don't have to be the same as the source DB. Once that's set up, you can definitely move data between the two DBs.
We use this all the time to sync data from our live database into our test environment. I can provide more specific SQL if this would work for you.
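As a sketch of that approach (assuming both databases are SQL Server; server, database, table and login names are all placeholders):

-- Run on the source server; all names and credentials are placeholders.
EXEC sp_addlinkedserver @server = 'TargetServer', @srvproduct = '', @provider = 'SQLNCLI', @datasrc = 'target-host';
EXEC sp_addlinkedsrvlogin @rmtsrvname = 'TargetServer', @useself = 'FALSE', @rmtuser = 'target_user', @rmtpassword = 'target_password';

-- Then each table can be copied across in a single statement.
INSERT INTO [TargetServer].TargetDb.dbo.MyTable (col1, col2)
SELECT col1, col2
FROM SourceDb.dbo.MyTable;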
You CAN access two databases, but not two datasources in the same query.
I wrote something a few years ago called "DataSynch" for just this sort of thing.
http://www.bryantwebconsulting.com/blog/index.cfm/2006/9/20/database_synchronization
Everything you need for this to work is included in my free "com.sebtools" package:
http://sebtools.riaforge.org/
I haven't actually used this in a few years, but I can't think of any reason why it wouldn't still work.
Henry - why do any of this? Why not just use SQL Server Management Studio to move the selected tables over with the "Import Data" function? Right-click on your DB and choose "Import", then use the native client and the permissions for the "other" database to specify the tables. Your Management Studio will need access to both DBs, but the DB servers themselves do not need access to each other; Management Studio serves as the conduit.