DAG level global static variable for all task instances - google-cloud-platform

I am trying to share a common variable with all the tasks, e.g. a pipeline_id that I calculate from the current system time.
Is there a way to pass this variable to all the tasks in the DAG? Currently I get a different value in each task, as they run in different processes.
I have some logic that separates different pipeline runs.

In Airflow, Variables have a global scope and can be used for overall configuration; they are meant for values that are runtime-dependent.
As you want to share a common value, you can try using XComs, which are used to pass data from one task/operator to another. XComs (cross-communications) are identified by a key plus the task_id and dag_id they come from. They are per-task-instance and meant for communication between tasks within a DAG run. For more information, you can check this link.
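Here is a minimal sketch of that idea, assuming Airflow 2.4+ and the TaskFlow API (the DAG and task names are made up): one task computes pipeline_id once, and every downstream task receives the same value via XCom instead of recomputing it in its own process.

import pendulum
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2022, 7, 1, tz="UTC"), catchup=False)
def pipeline_id_example():

    @task
    def generate_pipeline_id() -> str:
        # runs exactly once per DAG run; the return value is stored as an XCom
        return datetime.utcnow().strftime("%Y%m%d%H%M%S")

    @task
    def use_pipeline_id(pipeline_id: str) -> None:
        # the argument is pulled from XCom, so every task sees the same value
        print(f"running with pipeline_id={pipeline_id}")

    use_pipeline_id(generate_pipeline_id())

pipeline_id_example()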

I think I have found a better way. When an Airflow DAG run starts, it populates AIRFLOW_CTX_DAG_RUN_ID in the environment, e.g. AIRFLOW_CTX_DAG_RUN_ID=manual__2022-07-08T16:30:02.549233+00:00.
We can extract the timestamp part of this variable and use it as a common global time for all of the tasks.
I don't need to use XCom with this, as the variable can be accessed inside each task.
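A rough sketch of that, assuming the run id has the usual <run_type>__<timestamp> shape (manual__..., scheduled__...):

import os

def shared_pipeline_timestamp() -> str:
    # Airflow exports AIRFLOW_CTX_DAG_RUN_ID into every task's environment,
    # so each task of the same DAG run derives the same timestamp
    run_id = os.environ["AIRFLOW_CTX_DAG_RUN_ID"]  # e.g. manual__2022-07-08T16:30:02.549233+00:00
    return run_id.split("__", 1)[1]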

Related

Difference between an Output & an Export

In CloudFormation we have the ability to output some values from a template so that they can be retrieved by other processes, stacks, etc. This is typically the name of something, maybe a URL or something generated during stack creation (deployment), etc.
We also have the ability to 'export' from a template. What is the difference between returning a value as an 'output' vs as an 'export'?
Regular output values can't be referenced from other stacks. They can be useful when you chain or nest your stacks, and their scope/visibility is local. Exported outputs are visible globally within the account and region, and can be used by any future stack you deploy.
Chaining
When you chain your stacks, you deploy one stack, take its outputs, and use them as input parameters for the second stack you are going to deploy.
For example, let's say you have two templates called instance.yaml and eip.yaml. instance.yaml outputs its instance-id (no export), while eip.yaml takes an instance-id as an input parameter.
To deploy them both, you have to chain them:
Deploy instance.yaml and wait for its completion.
Note its output values (i.e. the instance-id); this is usually done programmatically, not manually, as sketched below.
Deploy eip.yaml and pass instance-id as its input parameter.
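A rough boto3 sketch of that chaining flow (the stack names and the output/parameter keys are assumptions, not taken from real templates):

import boto3

cfn = boto3.client("cloudformation")

# 1. deploy instance.yaml and wait for it to complete
with open("instance.yaml") as f:
    cfn.create_stack(StackName="instance", TemplateBody=f.read())
cfn.get_waiter("stack_create_complete").wait(StackName="instance")

# 2. read the instance-id output of the first stack
outputs = cfn.describe_stacks(StackName="instance")["Stacks"][0]["Outputs"]
instance_id = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "InstanceId")

# 3. deploy eip.yaml and pass the value as an input parameter
with open("eip.yaml") as f:
    cfn.create_stack(
        StackName="eip",
        TemplateBody=f.read(),
        Parameters=[{"ParameterKey": "InstanceId", "ParameterValue": instance_id}],
    )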
Nesting
When you nest stacks, you have a parent template and a child template. The child stack is created from inside the parent stack. In this case the child stack produces some outputs (not exports) for the parent stack to use.
For example, let's again use instance.yaml and eip.yaml, but this time eip.yaml is the parent and instance.yaml is the child. eip.yaml does not take any input parameters, and instance.yaml outputs its instance-id (not an export).
In this case, to deploy them you do the following:
Upload the child template (instance.yaml) to S3.
In eip.yaml, create the child instance stack using AWS::CloudFormation::Stack and the S3 URL from step 1.
This way eip.yaml will be able to access the instance-id from the outputs of the nested stack using GetAtt, as in the sketch below.
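A rough sketch of what the parent template could look like (the resource names and S3 URL are placeholders, not a complete template):

# eip.yaml (parent)
Resources:
  InstanceStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-bucket/instance.yaml   # uploaded in step 1
  ElasticIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId: !GetAtt InstanceStack.Outputs.InstanceId            # output of the child stack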
Cross-referencing
When you cross-reference stacks, you have one stack that exports its outputs so that they can be used by any other stack in the same region and account.
For example, let's again use instance.yaml and eip.yaml. This time instance.yaml is going to export its output (the instance-id). To use it, eip.yaml will use ImportValue in its template, without the need for any input parameters or nested stacks (see the sketch after the steps below).
In this case, to deploy them you do the following:
Deploy instance.yaml and wait till it completes.
Deploy eip.yaml which will import the instance-id.
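A minimal sketch of the two template fragments (the logical resource names and the export name are made up):

# instance.yaml - exports its instance-id
Outputs:
  InstanceId:
    Value: !Ref MyInstance
    Export:
      Name: shared-instance-id

# eip.yaml - imports it, no input parameters or nesting needed
Resources:
  ElasticIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId: !ImportValue shared-instance-id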
Although cross-referencing seems very useful, it has one major issue: it is very difficult to update or delete cross-referenced stacks:
After another stack imports an output value, you can't delete the stack that is exporting the output value or modify the exported output value. All of the imports must be removed before you can delete the exporting stack or modify the output value.
This is very problematic if you are starting your design and your templates can change often.
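Because of that restriction, it can help to check which stacks currently import an export before you try to change or delete it; a small boto3 sketch (the export name is hypothetical):

import boto3

cfn = boto3.client("cloudformation")

# stacks that import this export; the exporting stack cannot be deleted
# (and the exported value cannot change) until this list is empty
# (note: the call raises an error if nothing imports the export)
print(cfn.list_imports(ExportName="shared-instance-id")["Imports"])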
When to use which?
Use cross-references (exported values) when you have some global resources that are going to be shared among many stacks in a given region and account. They should also not change often, as they are difficult to modify. Common examples: a global bucket used as a centralized logging location, or a VPC.
Use nested stacks (not exported outputs) when you have some common components that you deploy often, but that can be a bit different each time. Examples: an ALB, a bastion host instance, a VPC interface endpoint.
Finally, chained stacks (not exported outputs) are useful for designing loosely-coupled templates, where you can mix and match templates based on new requirements.
Short answer from here: use export between stacks, and use output with nested stacks.
Export
To share information between stacks, export a stack's output values. Other stacks that are in the same AWS account and region can import the exported values.
Output
With nested stacks, you deploy and manage all resources from a single stack. You can use outputs from one stack in the nested stack group as inputs to another stack in the group. This differs from exporting values.

How to define database variable for logging in Kettle?

I would like to know if there is a proper way to pass the database connection variables so they can be used in the logging sections of both jobs and transformations.
Regards,
Nicolas.
Edit the kettle.properties file from the top menu.
If you want to do it for ALL logging, use the variables KETTLE_JOB_LOG_* and KETTLE_TRANS_LOG_*. There is no way to cover jobs and transformations at the same time, but it only means defining 8 variables (instead of 4), which can be copy/pasted.
If you want to do it for a specific job and/or transformation, define your own variables like log_db, log_table, ... and use them as ${log_db}, ${log_table}, ... You have to define the parameters for each job and transformation. Alternatively, you could write a small program to change the XML of the .ktr and .kjb files.
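For the global option, a sketch of what the kettle.properties entries could look like (the connection, schema and table names are placeholders; check the exact KETTLE_* variable names against your PDI version):

KETTLE_JOB_LOG_DB=log_connection
KETTLE_JOB_LOG_SCHEMA=logging
KETTLE_JOB_LOG_TABLE=job_log
KETTLE_TRANS_LOG_DB=log_connection
KETTLE_TRANS_LOG_SCHEMA=logging
KETTLE_TRANS_LOG_TABLE=trans_log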

Can I create my own variable which should work like built in variable in Informatica

I want to create my own variable with a default value that works like a built-in variable in Informatica, and use it in all my workflows.
Is there any way to do this?
Thanks
You can use the same parameter file across all of your mappings. There is syntax to separate the parts that apply to specific sections from those that are universal, by way of the scope; see the following link: https://network.informatica.com/thread/27560
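A rough sketch of such a shared parameter file (the folder, workflow and variable names are made up; values in the [Global] section are visible everywhere, while the scoped section applies only to that workflow):

[Global]
$$MY_DEFAULT_REGION=EU

[MyFolder.WF:wf_load_sales]
$$LOAD_DATE=2022-07-08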

Initiate a variable dynamically to work in all sessions of a workflow

I am developing an Informatica job with multiple sessions in one workflow. I need to assign a variable ##AAR with the following code:
IIF(get_date_part(sysdate,'mm') <= 7,
    get_date_part(add_to_date(sysdate,'YY',-2),'YY'),
    get_date_part(add_to_date(sysdate,'YY',-1),'YY'))
I am not sure how to go about it. I was thinking of creating a session that assigns the variable and then passes it to the workflow.
This session should be the first one to run in the workflow, but I don't know how to create a session that is not based on a mapping.
What could I do to get this done?
First, you define the variables in your workflow (Workflows -> Edit -> Variables) so the workflow knows about the variables.
Then, as first Task in your workflow, you take an "Assignment" instead of a session. That's the icon that looks like a calculator.
In the assignment, you can assign values to your variables.
Please note that the variables need to be named "$$...", not "##...".
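So, with the variable from the question renamed to follow that convention, the expression entered in the Assignment task would look roughly like this:

$$AAR = IIF(get_date_part(sysdate,'mm') <= 7,
            get_date_part(add_to_date(sysdate,'YY',-2),'YY'),
            get_date_part(add_to_date(sysdate,'YY',-1),'YY'))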

How to detect when my Django object's DateTimeField reaches the current time

I'm using Django 1.5.5.
Say I have an object as such:
from django.db import models

class Encounter(models.Model):
    date = models.DateTimeField(blank=True, null=True)

How can I detect when a given Encounter has reached the current time? I don't see how signals can help me.
You can't detect it using just Django.
You need some scheduler that will check every Encounter's date (for example, with a corresponding filter query) and perform the needed actions.
It can be a simple cron script: you can write it as a Django custom management command and have cron call it every 5 minutes, for example.
Or you can use Celery; with it, you can see worker status from the admin and do some other things.
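A minimal sketch of such a management command (the app name and the action taken are placeholders):

# yourapp/management/commands/check_encounters.py
from django.core.management.base import BaseCommand
from django.utils import timezone

from yourapp.models import Encounter

class Command(BaseCommand):
    help = "Process encounters whose date has been reached"

    def handle(self, *args, **options):
        due = Encounter.objects.filter(date__lte=timezone.now())
        for encounter in due:
            # placeholder: do whatever should happen once the date is reached
            self.stdout.write("Encounter %s is due" % encounter.pk)

A crontab entry such as */5 * * * * python /path/to/manage.py check_encounters would then run it every 5 minutes.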
What you could do is use Celery: when you save an Encounter object, you enqueue a task that executes only once its date has been reached.
There is one caveat though: it might execute a bit later, depending on how busy the Celery workers are.
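A rough sketch of that approach (the task and module names are assumptions; it needs a broker and a running worker):

# tasks.py
from celery import shared_task

@shared_task
def handle_due_encounter(encounter_id):
    from yourapp.models import Encounter
    encounter = Encounter.objects.get(pk=encounter_id)
    # placeholder: whatever should happen once encounter.date is reached
    print("Encounter %s reached its date" % encounter.pk)

# wherever the Encounter is saved, schedule the task for its date:
# handle_due_encounter.apply_async(args=[encounter.pk], eta=encounter.date)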