How to define database variables for logging in Kettle?

I would like to know if there is a proper way to pass the database connection variables so they can be used in the logging sections of both jobs and transformations.
Regards,
Nicolas.

Edit the kettle.properties file from the top menu.
If you want to do it for ALL the logging, use the built-in variables KETTLE_JOB_LOG_* and KETTLE_TRANS_LOG_*. There is no way to cover jobs and transformations at the same time, but it just means defining 8 variables (instead of 4), and the values can be copy/pasted.
If you want to do it for a specific job and/or transformation, define your own variables like log_db, log_table, ... and use them as ${log_db}, ${log_table}, ... You have to define the parameters for each job and transformation. Alternatively, you could write a small program to change the XML of the .ktr and .kjb files, as in the sketch below.
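For the last option, a small script could look roughly like this Python sketch; the element names (trans-log-table, job-log-table, connection, schema, table) and the connection/table values are assumptions for illustration, so verify them against your own .ktr/.kjb files and PDI version.

import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed names -- adjust to your environment before running.
LOG_DB = "logging_connection"
LOG_SCHEMA = "etl_logs"
LOG_TABLES = {".ktr": ("trans-log-table", "trans_log"),
              ".kjb": ("job-log-table", "job_log")}

def set_logging(path: Path) -> None:
    tag, table_name = LOG_TABLES[path.suffix]
    tree = ET.parse(path)
    # Update (or create) the connection/schema/table entries of the log-table node.
    for node in tree.getroot().iter(tag):
        for child_tag, value in (("connection", LOG_DB),
                                 ("schema", LOG_SCHEMA),
                                 ("table", table_name)):
            child = node.find(child_tag)
            if child is None:
                child = ET.SubElement(node, child_tag)
            child.text = value
    tree.write(path, xml_declaration=True, encoding="UTF-8")

for f in Path(".").rglob("*"):
    if f.suffix in LOG_TABLES:
        set_logging(f)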

Related

DAG level global static variable for all task instances

I am trying to share a common variable with all the tasks, e.g. pipeline_id, which I calculate using the current system time.
Is there a way to pass this variable to all the tasks in a DAG? Currently I get a different value in different tasks because they run in different processes.
I have some logic that uses it to separate different pipeline runs.
In Airflow, Variables have a global scope and can be used for overall configuration; they are meant for values that are runtime-dependent.
As you want to share common values, you can try using XComs, which are used to pass data from one task/operator to another. XComs (cross-communications) are identified by a key plus the task_id and dag_id they come from. They are stored per task instance and are used for communication between tasks; see the Airflow XCom documentation for more information.
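A minimal sketch of that approach (Airflow 2.x; the DAG and task names are made up for illustration):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def push_pipeline_id(ti, **_):
    # Computed once here; every downstream task pulls the same value.
    ti.xcom_push(key="pipeline_id", value=datetime.utcnow().strftime("%Y%m%d%H%M%S"))

def use_pipeline_id(ti, **_):
    pipeline_id = ti.xcom_pull(task_ids="push_pipeline_id", key="pipeline_id")
    print("pipeline_id:", pipeline_id)

with DAG("xcom_pipeline_id_demo", start_date=datetime(2022, 7, 1),
         schedule_interval=None, catchup=False) as dag:
    push = PythonOperator(task_id="push_pipeline_id", python_callable=push_pipeline_id)
    use = PythonOperator(task_id="use_pipeline_id", python_callable=use_pipeline_id)
    push >> use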
I think I have found a better way. When a DAG run starts, Airflow populates the environment variable AIRFLOW_CTX_DAG_RUN_ID, e.g. AIRFLOW_CTX_DAG_RUN_ID=manual__2022-07-08T16:30:02.549233+00:00.
We can extract the timestamp part of this variable and use it as a common global time for all of the tasks.
I don't need to use XCom with this, as the variable can be accessed inside each task.
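A minimal sketch of that idea, callable inside any task (the run-id format is the one shown above; scheduled runs use a "scheduled__" prefix instead of "manual__"):

import os
from datetime import datetime

def pipeline_start_time() -> datetime:
    # e.g. AIRFLOW_CTX_DAG_RUN_ID=manual__2022-07-08T16:30:02.549233+00:00
    run_id = os.environ["AIRFLOW_CTX_DAG_RUN_ID"]
    # Everything after the "__" is the run's timestamp, shared by all tasks of the run.
    return datetime.fromisoformat(run_id.split("__", 1)[1])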

Can I create my own variable which works like a built-in variable in Informatica?

I want to create my own variable, set a default value for it, and have it work like a built-in variable in Informatica. I want to use the created variable in all my workflows.
Is it possible in any way?
Thanks
You can use the same parameter file across all of your mappings (there is syntax to separate out the bits which are for specific sections from those which are universal, by way of the scope). See the following link: https://network.informatica.com/thread/27560

initiate a variable dynamically to work in all session on workflow

I am developing an Informatica job with multiple sessions in one workflow. I need to assign a variable ##AAR with the following code:
IIF(get_date_part(sysdate,'mm') <= 7,
    get_date_part(add_to_date(sysdate,'YY',-2),'YY'),
    get_date_part(add_to_date(sysdate,'YY',-1),'YY'))
I am not sure how to go about it. I was thinking of creating a session that assigns the variable and then passes it to the workflow.
This session should be the first one to run in the workflow, but I don't know how to create a session that is not based on a mapping.
What can I do to get this done?
First, you define the variables in your workflow (Workflows -> Edit -> Variables) so the workflow knows about the variables.
Then, as the first task in your workflow, you use an "Assignment" task instead of a session. That's the icon that looks like a calculator.
In the assignment, you can assign values to your variables.
Please note that the variables need to be named "$$..." not "##...".

Tensorboard logging non-tensor (numpy) information (AUC)

I would like to record in TensorBoard some per-run information calculated by a black-box Python function.
Specifically, I'm envisioning using sklearn.metrics.auc after having run sess.run().
If "auc" was actually a tensor node, life would be simple. However, the setup is more like:
stuff=sess.run()
auc=auc(stuff)
If there is a more tensorflow-onic way of doing this I am interested in that. My current setup involves creating separate train&test graphs.
If there is a way to complete the task as stated above, I am interested in that as well.
You can make a custom summary with your own data using this code:
tf.Summary(value=[tf.Summary.Value(tag="auc", simple_value=auc)])
Then you can add that summary to the summary writer yourself. (Don't forget to add a step).
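Put together, a minimal TF 1.x sketch could look like this (the log directory, labels and scores are placeholders for whatever sess.run() produced):

import numpy as np
import tensorflow as tf
from sklearn.metrics import auc, roc_curve

# Stand-ins for values that would normally come out of sess.run(...).
labels = np.array([0, 1, 1, 0, 1])
scores = np.array([0.1, 0.8, 0.65, 0.3, 0.9])
fpr, tpr, _ = roc_curve(labels, scores)
auc_value = auc(fpr, tpr)

writer = tf.summary.FileWriter("logs/run1")   # TF 1.x summary writer
summary = tf.Summary(value=[tf.Summary.Value(tag="auc", simple_value=auc_value)])
writer.add_summary(summary, global_step=0)    # don't forget the step
writer.flush()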

How to pass parameters to scheduled task via cfschedule?

Is there any way to pass parameters or share data with a scheduled task? I understand that you can pass serializable arguments to a Quartz Job, but this seems not to be available in cfschedule. What are the options to achieve this?
The easiest way to do that is just to have a .cfm file that is called by cfschedule and that itself calls the CFC method with the desired arguments.
If you want a more flexible solution, I have a Scheduler.cfc that allows you to have a method called at whatever frequency you want, and you can even pass arguments for the method call.
http://www.bryantwebconsulting.com/blog/index.cfm/2009/2/26/Schedulercfc-10
It can be gotten here.
https://github.com/sebtools/com.sebtools/
The important thing with it is that you have to have Scheduler instantiated into the Application scope, plus a .cfm, called by cfschedule, that runs the scheduler.
If you just have one method with arguments that needs to be called frequently, then Scheduler.cfc is overkill compared to the simple solution, but if this is a general problem that you need to solve more often, then it can pay off nicely.
You could pass them on the query string of the URL attribute.
example.com/index.cfm?param1=value1&param2=value2
If your data is complex, you can always serialize it to JSON beforehand and use deserializeJSON in the receiving task.