Kettle or PDI: share same DB connection between different steps


I have multiple DB connections in a Kettle (Pentaho Data Integration) transformation.
There are some lookup steps and a table output step.
They must all use the same DB connection.
I'm looking for a way to change the DB connection in one step and have the other steps switch to the same connection automatically.
Is that possible?
I know that I can use variables read from parameters inside the connection definition, but I'm looking for some sort of GUI way.

One option is to use a JDBC connection pool. Set the pool size to the number of steps in the transformation that need a connection to the DB.
Another option is to define a separate connection for each step. This is much easier if you use JNDI connections preconfigured in $KETTLE_HOME/simple-jndi/jdbc.properties
For example, two variables can point to a master and a slave connection (shown in a screenshot in the original post). This lets you choose the appropriate connection.
How to configure JNDI in Pentaho Kettle:
http://wiki.pentaho.com/display/EAI/.03+Database+Connections

Related

Strategy for Asynchronous database access with Qt5 SQL

I need to create a server in Qt C++ with QTcpServer that can handle many requests at the same time: more than 1000 connections, all of which will constantly need to use the database, which is MariaDB.
Before it can be deployed on the main servers, it needs to be able to handle 1000 connections, each querying data as fast as it can, on a 4-core 1 GHz CPU with 2 GB RAM in an Ubuntu virtual machine running in the cloud. The MySQL database is hosted on another, more powerful server.
So how can I implement this? After googling around, I've come up with the following options:
1. Create a new QThread for each SQL query
2. Use QThreadPool for each new SQL query
The first one might create so many threads that the system slows down because of all the context switches.
With the second one, after the pool becomes full, other connections have to wait while MariaDB is doing its work. So what is the best strategy?

Sorry for bad English.
1) Exclude.
2) Exclude.
3) Here Qt always does the work first. Yes, connections (tasks for connections) have to wait for available threads, but you can easily add 10000 tasks to the Qt thread pool. If you want, configure the maximum number of threads in the pool, timeouts for tasks, and so on. Of course, you must synchronize data shared between threads with a semaphore/futex/mutex and/or atomics.
MySQL (MariaDB) is a server, and this server can accept many connections at the same time. That is the same behaviour you want for your Qt application. MySQL is just the data backend for your application.
So your application is a server too. Put simply, you listen on a socket for new connections, save the client connections in a vector/array, and work with each client connection. Whenever you need something (fetching data from the MySQL backend for a client, over a new, separate, lazily opened connection to MySQL for each client; reading/writing data from/to the client; closing the connection; etc.), you create a new task and add it to the thread pool.
This is a very simplified explanation, but I hope it helps.
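To make the "one task per request, fixed pool of threads" idea concrete, here is a minimal sketch in Python rather than Qt. It is not the Qt API: `ThreadPoolExecutor` stands in for QThreadPool, an in-memory SQLite database stands in for MariaDB, and the names `get_connection`, `handle_request` and `serve` are made up for the illustration. It also keeps one lazily opened connection per worker thread instead of one per client, which is a simplification of what is described above.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()        # holds one connection per worker thread
_opened = []                      # every connection opened, kept for cleanup
_opened_lock = threading.Lock()   # protects the shared _opened list


def get_connection():
    """Return this worker thread's connection, opening it lazily on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # Assumption: SQLite in memory replaces the real MariaDB backend.
        # check_same_thread=False only so serve() can close it afterwards.
        conn = sqlite3.connect(":memory:", check_same_thread=False)
        _local.conn = conn
        with _opened_lock:
            _opened.append(conn)
    return conn


def handle_request(client_id):
    """One task per client request: run a query on the thread's own connection."""
    row = get_connection().execute("SELECT ? * 2", (client_id,)).fetchone()
    return row[0]


def serve(client_ids, max_workers=4):
    """Queue one task per request; at most max_workers run at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(handle_request, client_ids))
    with _opened_lock:
        opened = len(_opened)
        for conn in _opened:
            conn.close()
        _opened.clear()
    return results, opened
```

Calling `serve(range(1000))` queues 1000 tasks but never opens more than `max_workers` connections, which is the point of the pool: waiting tasks cost memory, not threads.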

Consider adding this to the [mysqld] section of my.cnf:
thread_handling=pool-of-threads
Good luck.

Auto failover multiple connections to mirror database when principal goes down

I have a principal database (server_A), mirror database (server_B), and a witness database (server_C). The databases are set up for automatic failover, that is, when server_A goes down or fails over, server_B assumes the role of the new principal database. The database quorum is set up correctly to the best of my knowledge.
I have written an application in C++ that connects to the database and gets a value to ensure a true connection. The application detects when the GetValue call fails and attempts to reconnect when the error occurs.
The issue is this:
When I have MULTIPLE connections to the database (two threads connected; once connected, each gets a value in a loop) and the failover occurs (I stop SQL Server on server_A so that server_B takes over as principal), I detect the connection failure, destroy my connection, and attempt to reconnect using the same connection string:
"Driver={SQL Native Client};Server=tcp:Server_A;Failover_Partner=tcp:Server_B;Database=SomeDatabase;Uid=SomeUser;Pwd=SomePassword;"
** NOTE **
I have verified that the failover has taken place by monitoring the databases.
Even though the connection to the database has been properly disposed of, I cannot reconnect to the database until I restart the application. Alternatively, if I bring server_A back online (now acting as the mirror database) and then fail over server_B (by shutting down SQL Server), making server_A the principal database again, the application can reconnect without having to be closed completely.
Though I could manipulate the connection string to make server_B the new principal and server_A the new Failover_Partner, this is not an ideal solution as many more connections will be utilized.
Keep in mind, this ONLY happens with multiple connections to the database. If I run the application with only one connection, all is fine and I can reconnect just fine when the failover occurs.
EDIT: If I connect in the beginning with multiple threads, all is fine. When I shutdown SQL Server, and therefore a failover occurs, I can reconnect only when I go through and delete ALL objects and re-instantiate new objects. Also, I am using SQL Native Client 11.0 (ODBC). Thoughts?

A lot of what you're describing is consistent with the issue described in KB 2605597 "Time-out error when a mirrored database connection is created by the .NET Framework data provider for SQLClient."
The KB describes problems when the connection timeout is set to 15 seconds. I have anecdotally heard of similar problems when the connection timeout is set to 0 (which isn't a good idea for other reasons; mentioning it just in case).
This hotfix is applied to the application servers. If you want to rule this out as a possible cause, you could test raising the timeout (like it says in the workaround sections of the post) to make sure it's not the issue.
Later thought: The other thing I notice that is unusual is that you're specifying the TCP protocol in the connection string and the failover partner name. It's not clear to me from the documentation that it's supported in the failover partner name. You might want to try removing that and specifying the network attribute instead. (Recommended here.)
I do understand that you believe the issue isn't these things due to the single / multiple connections issue you've tested out.
However, I think you're better off simplifying the connection string so it's as consistent as possible with the published examples and making sure it's not the issues that people have commonly hit with this first. (The retry issue happens when there is latency, which can make it somewhat sporadic.)

OK, I have found the answer.
I had to modify the hosts file because my application did not reside in the same domain as the databases. As a result, when the failover happened, I could not reach the database by the instance name (which is what the failover partner was cached as). I changed the hosts file to resolve the instance name to the IP address of the machine, and it all works now.

Application connection to database

I have an application that interacts with an Access database using a DAO class. Recently I converted the database to a SQLite database.
I do not know which of the following connection methods is the better design:
1. Create only one database connection, held in a public variable, when the application opens. All queries use this one connection object at run time, and the connection is closed when the application closes.
2. Create a database connection every time before running a query, then close it immediately after loading the result set into memory.

I recommend that you encapsulate your DB access, so that the decision on whether or not to keep a persistent connection open can be changed at a later point.
Since you are using SQLite, I am assuming that it is a single-user DB, so concurrency, connection contention, locking, etc. are not likely to be issues.
The main reasons to prefer short-lived connections usually apply to multi-user web or service-oriented systems, where scalability and licensing considerations are important. This doesn't seem to be applicable in your case.
In short, based on the above assumptions, there doesn't seem to be any reason not to keep a connection open for the entire duration of your app / user's login session.
If you use transactions, however, I would suggest that you commit them after each successful atomic activity.
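As a rough illustration of that encapsulation, here is a minimal sketch in Python with the standard `sqlite3` module. The class name `Database` and its `persistent` flag are made up for this example. Callers only ever see `execute()`, so switching between a long-lived connection and connect-per-query is a one-argument change, and each call is committed as its own transaction.

```python
import sqlite3


class Database:
    """Hides the connection policy behind a single execute() method."""

    def __init__(self, path, persistent=True):
        self._path = path
        self._persistent = persistent   # True: keep one connection open
        self._conn = None

    def _acquire(self):
        if not self._persistent:
            return sqlite3.connect(self._path)      # fresh connection per call
        if self._conn is None:
            self._conn = sqlite3.connect(self._path)  # opened once, reused
        return self._conn

    def _release(self, conn):
        if not self._persistent:
            conn.close()

    def execute(self, sql, params=()):
        conn = self._acquire()
        try:
            with conn:  # commits on success, rolls back on an exception
                return conn.execute(sql, params).fetchall()
        finally:
            self._release(conn)

    def close(self):
        """Close the persistent connection, if one is open."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Usage is the same in both modes, e.g. `Database("app.db", persistent=False).execute("SELECT 1")`, where `app.db` is only a placeholder file name.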

As you know, both of your options have pros and cons. For your particular case, I think creating a database connection every time is not such a bad idea, because connecting to SQLite is very fast. This way you can also open and close more than one connection at once, which is a real benefit: maybe you don't need it now, but you may have to in the future.

Rest API that needs a connection

I have a system that the user needs to connect to first and then, based on that connection, fetch some data. E.g., you connect to a database and then fetch metadata about a table.
I was planning to expose this via a REST API. So in this case, you first need to connect and then use that connection to fetch the metadata.
Two options come to mind:
a. Have a URL, say /connect, that you post the connection parameters to and that returns a connection id. This id is then encoded in subsequent URLs to identify the connection.
b. Post the connection parameters every time.
What are the pros/cons of these approaches? Are there any other alternatives?
One constraint is that the authentication mechanism to connect to the system is not in my control, I am just exposing some data from the systems via webservices and I am exploring using REST.

Do you really need to expose the connection?
I think it may just be semantic prejudice - but usually connection details are hidden by the service.
Does the connection have business value?!
If the connection does have business value, then treat it like a resource:
i.e.
do a post on /connections to return a new connection
then do a get on /connection//metadata to get the metadata about that connection.
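To show what "treat it like a resource" could look like on the server side, here is a minimal framework-free sketch in Python. Everything in it is an assumption for illustration: the class `ConnectionRegistry`, its method names, the `"database"` parameter, and SQLite standing in for whatever system you actually connect to. The methods are the bodies your POST and GET handlers would call; it is single-threaded and ignores authentication and expiry of stale connections.

```python
import sqlite3
import uuid


class ConnectionRegistry:
    """Maps opaque connection ids (used in URLs) to open connections."""

    def __init__(self):
        self._connections = {}

    def create(self, params):
        """Body of the POST handler: open a connection and return its id."""
        conn = sqlite3.connect(params["database"])
        connection_id = uuid.uuid4().hex
        self._connections[connection_id] = conn
        return connection_id

    def table_names(self, connection_id):
        """Body of the metadata GET handler: list tables on that connection."""
        conn = self._connections[connection_id]  # KeyError would map to a 404
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]

    def delete(self, connection_id):
        """Body of a DELETE handler: close and forget the connection."""
        self._connections.pop(connection_id).close()
```

The trade-off against posting the parameters every time is that the server now holds state per client, so it needs some policy for closing connections that are never deleted.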

c++ Mysql C API Connection Question

I'm building an application that uses MySQL, and I was wondering what would be the best way to manage the connection to the actual MySQL server.
I'm still in the design phase. Currently I have it connecting (or aborting on error) before every query and disconnecting afterwards. This is just for testing, as right now I'm only running one query to see if the code I've set up so far works.
My app might be performing a few queries every 5/10/20/30 minutes depending on settings and doesn't really need to do anything with SQL until that time.
So I'm wondering if it's more beneficial to use a continuous connection that exists for the lifetime of the application (if possible), or to simply connect to SQL before I intend to use it, do what the app needs to do, and then disconnect?

Connecting once and performing many queries will naturally be more efficient.
However, if performance isn't a major concern for your project, maybe aiming for simplicity in your code might be a better option (especially if you are the only connection to the database).
If you want to get clever, then maybe connect as and when you need to, then keep the connection alive until you stop making queries. E.g., drop the connection if there have been no queries for 30 seconds or something like that.
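The "connect on demand, drop after an idle period" idea could look roughly like this. It is a sketch in Python with SQLite standing in for MySQL, not the MySQL C API; the class name `IdleClosingConnection` and its parameters are invented for the example. Note that the idle check runs on the next query (or whenever you call `close_if_idle()` yourself, for instance from a timer), so nothing closes the connection in the background.

```python
import sqlite3
import time


class IdleClosingConnection:
    """Connects lazily and drops the connection after a quiet period."""

    def __init__(self, path, idle_seconds=30.0, clock=time.monotonic):
        self._path = path
        self._idle_seconds = idle_seconds
        self._clock = clock            # injectable so the timeout is testable
        self._conn = None
        self._last_used = 0.0
        self.connect_count = 0         # how many times we had to (re)connect

    def close_if_idle(self):
        """Drop the connection if no query ran within the idle window."""
        idle = self._clock() - self._last_used
        if self._conn is not None and idle >= self._idle_seconds:
            self.close()

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def query(self, sql, params=()):
        self.close_if_idle()
        if self._conn is None:         # first use, or dropped after idling
            self._conn = sqlite3.connect(self._path)
            self.connect_count += 1
        try:
            return self._conn.execute(sql, params).fetchall()
        finally:
            self._last_used = self._clock()
```

With queries only every 5 to 30 minutes, as in the question, a 30-second window means a burst of queries shares one connection and each new burst reconnects.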

How many instances of this app will be connecting to MySQL? If it's just one, keeping a MySQL connection open for convenience shouldn't cause any problems, but remember there's a (configurable) limit to the number of MySQL connections you can have open to the server. In this case, I would recommend opening a connection, running whatever queries you need to run, and then closing it. Connecting per query adds more overhead as you add queries to your application.
