Passing Input to Step Function from a manual approval step - amazon-web-services

I have a usecase where one of the task in Step Function is Manual Approval Step.
As a part of completion of this step, we want to pass some inputs which will be used by subsequent tasks.
This there a way to do it ?
I have seen passing JSON in output while completing the Manual Approval Step. Is there a way that we can read this output as input in next step ?
client.sendTaskSuccess(new SendTaskSuccessRequest()
.withOutput("{\"key\": \"this is value\"}")
.withTaskToken(getActivityTaskResult.getTaskToken()));

It is possible, but your question doesn't provide enough information for a specific answer. Some general tips about input/output processing:
By default, the output of a state becomes the input to the next state. You can use ResultPath to write the output of the the Task to a new field without replacing the entire JSON payload that becomes the input to the next state.
If subsequent states are using InputPath or Parameters, you might be filtering the input and removing the output of the approval step. Similarly with OutputPath.
The doc on Input and Output Processing may be helpful: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-input-output-filtering.html

Related

Parallel processes with dependency

There are two parallel processes. Each process has two steps. The second step of the first process is always executed after the first step. The second step of the second process is performed only under a certain condition.
Activity diagram:
How to reflect an additional condition: to complete the second step of the second process, the first step of the first process must be completed.
I managed:
Flaws:
No match between fork and join
If the condition of the second process is not met, the token “hangs” before join
Having looked at your solution once more made me think that you saw issues, where there are none. You are worried about the hanging token, but that is no issue in this case. If P22 is bypassed, the token from P11 will go down directly to the join node. P11 and P12 will pass their token down also with no issue, thereby creating that ghost token which gets stuck in the middle right join. Since the lower join now has two tokens it will continue to the end where the activity is terminated. At that point any free running tokens (and even active actions) are terminated as well. All good.
I leave my former answer for further inspiration. But basically they will all be implemented in similar ways since they represent a gateway.
Original answer
I guess that using an event would be the best way:
This way D can only start (and finish) when the event has been received which is sent after As completion.
Another way would be to use an object that stores the finalization of action A and which is read by D.
Note that the diagonal connectors through a ready are ObjectFlows which UML does not per default distinguish optically (unlike SysML).
P. 374 of UML 2.5 states
Object tokens pass over ObjectFlows, carrying data through an Activity via their values, or carrying no data ( null tokens). A null token can still be passed along an ObjectFlow and used like any other token. For example, an Action can output a null token to explicitly indicate that it did not produce an optional value, and a downstream DecisionNode (see sub clause 15.3) can test for this and branch accordingly.
So you can see that as a buffer holding a token and no real data is needed to be stored. Basically that's the same as an event. Implementation wise you would use a semaphore or a stream to realize that, but of course at this level you would not care too much about such details.

AWS Batch output to Step Functions

When using Step Functions, Lambdas can get input from a state and create output that can be used by the Step Functions to affect flow using a Choice state. However, while Batch jobs can also get input from Step Functions, I can't find information on how to get batch output back to Step Functions to be fed into a Choice state (as in JSON output, rather than simply the succeeded/failed state of the job).

PDI - How to keep Transformation run even an error occur?

I have a transformation with several steps that run by batch script using Windows Task Scheduler.
Sometimes the first step or the n steps fail and it stops the entire transformation.
I want to transformation to run from start to end regardless of any errors, any way of doing this?
1)One way is to “error handling”, however it is not available for all the steps. You can right click on the step and check whether error handling option is available or not.
2) if you are getting errors because of incorrect datatype, for example: you are expecting a integer value and for some specific record you may get string value so it may fail , for handling such situation you can use data validation step.
Basically you can implement logic based on the transformation you have created. Above are some of the General methods.
This is what you called "Error Handling". Though your transformation runs with some Errors, you still want your transformation to continue to run.
Situations:
- Data type issues in the data stream.
Ex: say you have a column X of data type integer but by mistake you got string value. then you can define Error handling to capture all these records.
- while Processing json data.
Ex: the path you mentioned to retrieve a value of json field and for some data node the path can't identify or missing it. you can define error handling to capture all missing path details.
- while Update table
- If you are updating a table with some key, and if the key was not available as it is coming from input stream then an error will occur. you can define error handling here also.

PDI - Check data types of field

I'm trying to create a transformation read csv files and check data types for each field in that csv.
Like this : the standard field A should string(1) character and field B is integer/number.
And what I want is to check/validate: If A not string(1) then set Status = Not Valid also if B not a integer/number to. Then all file with status Not Valid will be moved to error folder.
I know I can use Data Validator to do it, but how to move the file with that status? I can't find any step to do it.
You can read files in loop, and
add step as below,
after data validation, you can filter rows with the negative result(not matched) -> add constant values step and with error = 1 -> add set variable step for error field with default values 0.
after transformation finishes, you can do add simple evaluation step in parent job to check value of ERROR variable.
If it has value 1 then move files else ....
I hope this can help.
You can do same as in this question. Once read use the Group by to have one flag per file. However, this time you cannot do it in one transform, you should use a job.
Your use case is in the samples that was shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open the Run all sample transformations.kjb and replace the Filter 2 of the Get Files - Get all transformations.ktr by your logic which includes a Group by to have one status per file and not one status per row.
In case you wonder why you need such a complex logic for such a task, remember that the PDI starts all the steps of a transformation at the same time. That's its great power, but you do not know if you have to move the file before every row has been processed.
Alternatively, you have the quick and dirty solution of your similar question. Change the filter row by a type check, and the final Synchronize after merge by a Process File/Move
And a final advice: instead of checking the type with a Data validator, which is a good solution in itself, you may use a Javascript like
there. It is more flexible if you need maintenance on the long run.

Dataflow pipeline waits for elements from all streams before performing GroupBy

We are running a Dataflow job that handles multiple input streams. Some of them are high traffic and some of them rarely get messages through. We are joining all streams with a "shared" stream that contains information relevant to all elements. This is a simplified example of the pipeline:
I noticed that the job will not produce any output, until both streams contain some traffic.
For example, let's suppose that Stream 1 gets a steady flow of traffic, whereas Stream 2 does not produce any messages for a period of time. For this time, the job's DAG will show elements being accumulated in the GroupByKey step but nothing will be propagated beyond it. I can also see the Flatten PCollections step showing input elements for the left side of the graph but not the right one. This creates a problem when dealing with high traffic and low traffic streams in the same job, since it will cause output to be delayed for as much as it takes for Stream 2 to pick up messages.
I am not sure if the observation is correct, but I wanted to ask if this is how Flatten/GroupByKey works in general and if so, if the issue we're seeing can be avoided through an alternative way of constructing the pipeline.
(Example JobID: 2017-02-10_06_48_01-14191266875301315728)
As described in the documentation of group-by-key the default behavior is to wait for all data within the window to have arrived -- this is necessary to ensure correctness of down-stream results.
Depending on what you are trying to do, you may be able to use triggers to cause the aggregates to be output earlier.
You may also be able to use the slow-stream as a side-input to the processing of the fast-stream.
If you're still stuck, it would help if you could describe in more detail the contents of the streams and how you're trying to use them, since more detailed answers depend on the goal.