I am working with AWS SWF and want to create a workflow such that I can pass an ID when starting an execution and then access it in my decider and activity worker.
I am not able to find any documentation related to that.
I am implementing the workers and deciders in Python using the Boto library.
You actually cannot create a workflow execution without specifying a workflowId, according to the documentation of the start_workflow_execution method. So you can use your ID as the workflowId, and SWF will hand it back to both the decider and the activity worker with every polled task.
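As a minimal sketch with boto's (boto2) SWF Layer1 client; the domain, task list, and workflow type names below are hypothetical placeholders:

# Start an execution, passing your ID as the (required) workflowId;
# the optional input is an arbitrary string payload.
import boto.swf.layer1

conn = boto.swf.layer1.Layer1()
conn.start_workflow_execution(
    'my-domain', 'order-12345', 'MyWorkflow', '1.0',
    task_list='my-task-list',
    input='{"order_id": "12345"}')

# Decider: every polled decision task carries the workflowId, and the
# start input is on the WorkflowExecutionStarted event.
task = conn.poll_for_decision_task('my-domain', 'my-task-list')
workflow_id = task['workflowExecution']['workflowId']
started = next(e for e in task['events']
               if e['eventType'] == 'WorkflowExecutionStarted')
payload = started['workflowExecutionStartedEventAttributes'].get('input')

# Activity worker: the polled activity task also carries the workflowId;
# its 'input' is whatever the decider passed when scheduling the activity.
activity = conn.poll_for_activity_task('my-domain', 'my-task-list')
workflow_id = activity['workflowExecution']['workflowId']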
In my current architecture, multiple Dataflow jobs are triggered at various stages as part of the ABC framework. I need to capture the job IDs of those jobs as audit metrics inside the Dataflow pipeline and update them in BigQuery.
How do I get the run ID of a Dataflow job from within the pipeline using Java?
Is there an existing method I can use for that, or do I need to use Google Cloud's client library inside the pipeline?
If you are submitting to Dataflow, I believe this might work:
DataflowPipelineJob result = (DataflowPipelineJob) pipeline.run();
String jobId = result.getJobId();
But you cannot access that within the pipeline itself AFAIK (in DoFns etc.).
The best way to ensure you know your job ID/name is to set it yourself. You can do this by passing --jobName, which is accessible via options.getJobName(); Dataflow will use this value. Note that it must be unique.
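The question asks for Java, but for illustration here is the same idea sketched with Beam's Python SDK (names are hypothetical); in Java you would read the value back via the options.getJobName() getter mentioned above:

# Sketch: set the job name yourself up front, so it is known everywhere,
# including when constructing DoFns.
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions)

options = PipelineOptions(job_name='abc-audit-run-001')  # hypothetical; must be unique
job_name = options.view_as(GoogleCloudOptions).job_name

with beam.Pipeline(options=options) as p:
    # job_name can be baked into your transforms at construction time,
    # e.g. for writing audit rows to BigQuery.
    _ = p | beam.Create([job_name]) | beam.Map(print)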
I have an Airflow instance with many tenants that have DAGs. They want to extract metadata on their DAG runs, like DagRun.end_date. However, I want to restrict each tenant so they can only access data related to their own DAG runs and cannot access data from other tenants' DAG runs. How can this be done?
This is what I imagine the DAG would look like:
# custom macro function
def get_last_dag_run(dag):
    last_dag_run = dag.get_last_dagrun()
    return last_dag_run.end_date
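For context, a macro like this would typically be wired into a DAG via user_defined_macros; a minimal sketch, assuming Airflow 2.x import paths (DAG ID and task are hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

def get_last_dag_run(dag):
    last_dag_run = dag.get_last_dagrun()
    # last_dag_run is None until the DAG has run at least once
    return last_dag_run.end_date if last_dag_run else None

with DAG(
    dag_id="tenant_a_metrics",  # hypothetical
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    user_defined_macros={"get_last_dag_run": get_last_dag_run},
) as dag:
    BashOperator(
        task_id="print_last_end_date",
        bash_command='echo "{{ get_last_dag_run(dag) }}"',
    )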
I found these resources, which explain how to extract the data but not how to restrict it:
Getting the date of the most recent successful DAG execution
Apache airflow macro to get last dag run execution time
How to get last two successful execution dates of Airflow job?
how to get latest execution time of a dag run in airflow
How to find the start date and end date of a particular task in dag in airflow?
How to get dag status like running or success or failure
NB: I am a contributor to Airflow.
This is not possible with the current Airflow architecture.
We are slowly working to make Airflow multi-tenant capable, but for now we are about halfway, and I believe it will take several more major releases to get there.
Currently the only way to isolate tenants is to give every tenant a separate Airflow instance, which is not as bad as you might initially think. If you run the instances in separate namespaces on the same auto-scaling Kubernetes cluster, add KEDA autoscaling, and use the same database server (but give each tenant a separate schema), this can be rather efficient, especially if you use Terraform to set up and tear down such Airflow instances, for example.
I am looking to dynamically create cron jobs that are created and configured using request parameters sent by Cloud Functions or a normal HTTP request.
There is already a manual way via the Google Cloud console, but I want to automate that manual task by configuring and creating jobs according to the request parameters.
I am already aware that we can provide a cron.yaml file that holds all the configuration, but I need some help or a reference that explains in detail how to achieve this.
I am also a beginner, so do correct me or suggest any alternative solution.
You'll want to use the Cloud Scheduler API. Specifically, this is a REST API that lets you do everything you could do via the console or the gcloud command.
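As a minimal sketch with the google-cloud-scheduler Python client (the project, location, job name, and target URL are hypothetical placeholders), creating a job from request parameters might look like this:

from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-project/locations/us-central1"  # hypothetical

# Build the job from the incoming request parameters.
job = scheduler_v1.Job(
    name=f"{parent}/jobs/my-dynamic-job",      # hypothetical job name
    schedule="*/30 * * * *",                   # cron expression from the request
    time_zone="UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://example.com/handler",     # hypothetical endpoint
        http_method=scheduler_v1.HttpMethod.POST,
    ),
)

response = client.create_job(parent=parent, job=job)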
How can I programmatically modify an existing scheduled task with API 2.0? For example, to add a trigger or an action.
Thanks.
I'm implementing a workflow with Amazon SWF, and one of my activities comes in the form of a Lambda function.
Both SWF and Lambda are being run in the London region, where they both work separately. However, when my decider polls for the task, it fails with the cause "LAMBDA_SERVICE_NOT_AVAILABLE_IN_REGION".
I haven't explicitly specified in code which region I'm working from; I assumed it would be the same one I run the SWF web client in.
Here's the relevant code in my decider:
val attrs = ScheduleLambdaFunctionDecisionAttributes()
    .withId("S3ControlWorkflowFunction")
    .withName("S3ControlWorkflowFunction")

decisions.add(
    Decision()
        .withDecisionType(DecisionType.ScheduleLambdaFunction)
        .withScheduleLambdaFunctionDecisionAttributes(attrs)
)
My activity worker doesn't do anything at all for the Lambda function, but it shouldn't have to, right?
I've registered the workflow with my IAM role here:
wf.registerWorkflowType(RegisterWorkflowTypeRequest()
    .withDomain(DOMAIN)
    .withName(WORKFLOW)
    .withVersion(WORKFLOW_VERSION)
    .withDefaultChildPolicy(ChildPolicy.TERMINATE)
    .withDefaultTaskList(TaskList().withName(TASKLIST))
    .withDefaultTaskStartToCloseTimeout("30")
    .withDefaultLambdaRole(iamARN.id))
Found the fix.
It turns out that calling Lambda functions from SWF just isn't supported in the eu-west-2 region, as well as a few others. However, I can't find any reference to this at all in the documentation; I found this forum post which gave me the hint. Migrating all the work I'd done over to eu-west-1 solved the issue. Poor show from Amazon here.