I have been trying to create a Step Function with a Choice state that acts as a rule engine. I would like to compare a date variable (from the stale input JSON) to another date variable that I generate with a Lambda function.
The AWS documentation does not go into detail about the Timestamp comparators, but I assumed they could handle two input variables. Here is the relevant part of the code:
{
"Variable": "$.staleInputVariable",
"TimestampEquals": "$.generatedTimestampUsingLambda"
}
Here is the error that I am getting when trying to update(!!!) the step function. I would like to highlight that I don't even get to invoking the step function, as it fails during the update itself.
Resource handler returned message: "Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: String does not match RFC3339 timestamp at ..... (Service: AWSStepFunctions; Status Code: 400; Error Code: InvalidDefinition; Request ID: 97df9775-7d2d-4dd2-929b-470c8s741eaf; Proxy: null)" (RequestToken: 030aa97d-35a5-a6a5-0ac5-5698a8662bc2, HandlerErrorCode: InvalidRequest)
The step function updates fine without the Timestamp comparison, so I suspect this bit of code. Any guesses?
EDIT (08.Jun.2021):
A comparison – Two fields that specify an input variable to compare, the type of comparison, and the value to compare the variable to.
Choice Rules support comparison between two variables. Within a Choice Rule, the value of Variable can be compared with another value from the state input by appending Path to the name of supported comparison operators.
Source: AWS docs
It clearly states that two variables can be compared, but so far I've had no luck. Still trying :)
When I explained the problem to one of my peers, I realised that the AWS documentation mentions a Path suffix (which I had confused with the $. prefix). This Path needs to be appended to the operator name.
The following code works:
{
"Variable": "$.staleInputVariable",
"TimestampEqualsPath": "$.generatedTimestampUsingLambda"
}
Again, I would like to draw your attention to the word "Path". That is what makes the magic happen!
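For context, here is a minimal sketch of what the complete Choice state might look like with this rule (the state names and the Default branch are illustrative, not from the original definition):
{
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.staleInputVariable",
      "TimestampEqualsPath": "$.generatedTimestampUsingLambda",
      "Next": "TimestampsMatch"
    }
  ],
  "Default": "TimestampsDiffer"
}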
Looks like you indeed found a way around your initial challenge from the thread linked below.
Using an amazon step function, how do I write a choice operator that references current time?
However, I thought you wanted to compare $.staleInputVariable to the current timestamp, and I wince to think you had to configure a Lambda function (and test it!) to do only that.
If so, you could have achieved that simply by using the Context Object, accessed with $$.:
{
"Variable": "$$.State.EnteredTime",
"TimestampEqualsPath": "$.staleInputVariable"
}
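A note on this: $$.State.EnteredTime is the RFC3339 timestamp at which the state was entered, so an exact TimestampEquals match against it will rarely succeed in practice; for a "before/after now" rule you would more likely reach for TimestampLessThanPath or TimestampGreaterThanPath, which follow the same Path naming convention.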
I'm having an issue when upgrading from Terraform 0.13 to the latest version: in Terraform 0.13, there were no quotes around output values.
In 0.13, the following code:
output "bucket-name-output" {
value = "${aws_s3_bucket.logos-bucket.id}"
}
produced this value:
logo-horizon-bucket
In the latest version of Terraform (v 1.3.7), the same code gives this value:
"logo-horizon-bucket"
So you can see that quotes are now being added around the string.
This is then causing my Golang application code, which uploads an object to an S3 bucket, to fail, because it sets the bucket name to the output value (which is surrounded by the quotes):
// terratest helper that shells out to `terraform output` and returns the value
bucketName := terraform.Output(t, terraformOptions, "bucket-name-output")

params := &s3manager.UploadInput{
	Bucket: aws.String(bucketName), // bucket name taken straight from the output
	Key:    aws.String("imageFileName"),
	Body:   getImage(),
}
I get the following error, as S3 bucket names cannot contain quotes:
InvalidBucketName: The specified bucket is not valid.
status code: 400, request id:
So my question is: how do I remove those quotes on the Terraform side? I could add some logic to strip them in the Golang application code, but that is pretty messy, so I would like to avoid it if possible...
Thanks!
I assume that something in your system is running terraform output bucket-name-output to get this output value.
The terraform output command was always intended primarily for human consumption, and at some point since v0.13 the human-oriented output changed to show values using a syntax similar to how they would appear inside the Terraform language itself, because that helps the reader understand what type of value they have exported.
There are two different ways to tell terraform output that it should produce data in a form suitable for consumption by external software rather than by humans:
terraform output -json bucket-name-output produces a JSON description of the output value. This option will work for values of any type, by following the same encoding decisions as Terraform's jsonencode function.
terraform output -raw bucket-name-output produces just a raw string version of the output value. Technically what it is doing is converting the output value to a string with the same meaning as the tostring function and then printing the resulting string. This means it only works with output values of types that tostring can convert: strings, numbers, and boolean values.
Since your program was originally trying to use the human-readable output as if it were a raw string value, I think the second of these options would be the easiest for you to retrofit into your system. Your program already seems to expect a raw string version of the output value, and the -raw option is designed for exactly that purpose.
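For example, with the output name from the question (values illustrative):

terraform output bucket-name-output        # human-oriented: "logo-horizon-bucket"
terraform output -json bucket-name-output  # JSON-encoded:   "logo-horizon-bucket"
terraform output -raw bucket-name-output   # raw string:     logo-horizon-bucket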
I am using the Jira Cloud for Sheets add-on to get the Days in Status field from Jira. It seems to have the following syntax, per this post:
<STATUS_ID>_*:*_<NUMBER_OF_TIMES_ISSUE_WAS_IN_THIS_STATUS>_*:*_<SECONDS>_*|
Here is an example:
10060_*:*_1_*:*_1121033406_*|*_3_*:*_1_*:*_7409_*|*_10000_*:*_1_*:*_270003163_*|*_10088_*:*_1_*:*_2595005_*|*_10087_*:*_1_*:*_1126144_*|*_10001_*:*_1_*:*_0
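Reading that example against the pattern above, it decodes to (status ID, times in status, seconds): 10060 → 1 time, 1121033406 s; 3 → 1 time, 7409 s; 10000 → 1 time, 270003163 s; 10088 → 1 time, 2595005 s; 10087 → 1 time, 1126144 s; 10001 → 1 time, 0 s.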
I am trying to extract, for example, how many times the issue was in the In QA status and how long it stayed in a given status. I am dealing with parsing this pattern to obtain this information and return it using an ARRAYFORMULA. Days in Status information is only provided once the issue is completed (in Done status); otherwise no information is provided. And even if the issue is in Done status, a status it never transitioned through will not appear in the Days in Status string.
I am trying to use REGEXEXTRACT function to match a pattern for example:
=REGEXEXTRACT(C2, "(10060)_\*:\*_\d+_\*:\*_\d+_\*|")
and it returns an empty value, where I expect 10060. It caught my attention that when I use the REGEXMATCH function it returns TRUE:
=REGEXMATCH(C2, "(10060)_\*:\*_\d+_\*:\*_\d+_\*|")
so the syntax is not clear to me. As a reference for regular expressions, Google points to the following documentation. There seems to be an issue with the vertical bar |; per this documentation it is a special character, which I tried representing as \v, but that doesn't work either: REGEXMATCH then returns FALSE. I also tried to find an online regex tester that implements the Google Sheets syntax (RE2); I found ReGo, but I don't know whether it is a valid one.
I was also trying to use the SPLIT function like this:
=query(SPLIT(C2, "_*:*_"), "SELECT Col1")
but it seems a more complicated approach for getting all the values I need from the Days in Status string, although it does separate all the values from the pattern above. In this case, I am getting the first status ID. The number of columns returned by SPLIT will vary, because it depends on the number of statuses the issue transitioned through on its way to the Done status.
It seems to be a complex task given all the issues I have encountered, but maybe some of you have dealt with this before and can advise. It requires properly parsing the information and then extracting it into specific columns using the ARRAYFORMULA function, where it applies, for a given status from the Status column.
Here is a Google Sheets sample with the input information. I would like to populate the columns Times In QA (column C) and Duration in QA (column D; the information is provided in seconds and I would need days, but that is a minor task) for the In QA status; the same would then apply to the rest of the statuses. I added a Settings tab for mapping each Status ID to my status names; I would need a lookup function to match the Status column in the Jira Issues tab. I would like a solution without helper columns; maybe it will require some script.
https://docs.google.com/spreadsheets/d/1ys6oiel1aJkQR9nfxWJsmEyd7XiNkVB-omcNL0ohckY/edit?usp=sharing
try:
=INDEX(IFERROR(1/(1/QUERY(1*IFNA(REGEXEXTRACT(C2:C, "10087.{5}(\d+).{5}(\d+)")),
"select Col1,Col2/86400 label Col2/86400''"))))
...so after REGEXEXTRACT, the rows that cannot be extracted from will output an #N/A error, so we wrap it in IFNA to remove those errors. Then we multiply by 1 to convert everything into numeric values (regex always works with, and outputs, plain text). Then we use QUERY to convert the 2nd column from seconds into days in one go. At this point every row holds some value, so to get rid of the zeros on rows we don't need (like rows 2, 3, 5, 8, 9, etc.) while keeping the output numeric, we use the IFERROR(1/(1/ wrapping. And finally, we use INDEX (or ARRAYFORMULA) to process our array.
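To target a different status, swap the status ID in the regex for the one mapped in your Settings tab; for example, assuming 10060 is the ID of the status you want:
=INDEX(IFERROR(1/(1/QUERY(1*IFNA(REGEXEXTRACT(C2:C, "10060.{5}(\d+).{5}(\d+)")),
 "select Col1,Col2/86400 label Col2/86400''"))))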
Often the logs show strictly numeric execution IDs, like 1009612003154395.
Other times, execution IDs are alphanumeric, like zjxjkn9mp4p9.
Why are these differing execution ID formats chosen? Are they as arbitrary as they seem? Can I infer anything from them?
An execution ID is just a string that uniquely identifies a single invocation of a function. That's all it means. The contents of that string are meaningless, but you can be sure it will be unique for all invocations of a particular type of function.
One documented use of it (the only one I could find) is for viewing the logs coming from that one invocation. This makes it easier to trace how a function executed, without having to sort through a bunch of log lines from other functions. See the documentation for logging:
You can even view the logs for a specific execution:
gcloud functions logs read FUNCTION_NAME --execution-id EXECUTION_ID
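For example, with the alphanumeric ID from the question (the function name is illustrative):
gcloud functions logs read my-function --execution-id zjxjkn9mp4p9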
I have a transformation with several steps that runs via a batch script scheduled with Windows Task Scheduler.
Sometimes the first step, or step n, fails, and this stops the entire transformation.
I want the transformation to run from start to end regardless of any errors. Is there any way of doing this?
1) One way is to use error handling; however, it is not available for all steps. You can right-click on a step and check whether the error handling option is available.
2) If you are getting errors because of an incorrect data type (for example, you expect an integer value but for some specific record you get a string), the step may fail; to handle such situations you can use the Data Validator step.
Basically, you can implement logic based on the transformation you have created. The above are some of the general methods.
This is what is called "Error Handling": even though your transformation runs into some errors, you still want it to continue running.
Situations:
- Data type issues in the data stream.
Ex: say you have a column X of data type integer, but by mistake you get a string value; you can then define error handling to capture all of those records.
- While processing JSON data.
Ex: for some data nodes, the path you specified to retrieve the value of a JSON field cannot be resolved or is missing; you can define error handling to capture all the missing-path details.
- While updating a table.
Ex: if you are updating a table by some key, and the key coming from the input stream is not present, an error will occur; you can define error handling here as well.
I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
Like this: the standard field A should be a string of one character, and field B should be an integer/number.
What I want is to check/validate: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid will be moved to an error folder.
I know I can use the Data Validator to do this, but how do I move the files with that status? I can't find any step that does it.
You can read the files in a loop and add steps as below:
after data validation, filter the rows with a negative result (not matched) -> add an Add constants step with error = 1 -> add a Set Variables step for the error field with a default value of 0.
After the transformation finishes, add a Simple evaluation step in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the files; else ...
I hope this can help.
You can do the same as in this question. Once read, use a Group by to have one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is covered in the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by to have one status per file rather than one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it means you cannot know whether it is safe to move the file before every row has been processed.
Alternatively, you have the quick-and-dirty solution from your similar question: change the Filter rows to a type check, and the final Synchronize after merge to a Process files/Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need maintenance in the long run.
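As a rough illustration (a sketch, not the exact script from the linked answer), a Modified Java Script Value step could flag invalid rows like this, with A and B being the field names from the question and Status a new output field defined in the step:

// A must be exactly one character; B must parse as a number.
var isValid = true;
if (A == null || String(A).length != 1) {
  isValid = false;
}
if (B == null || isNaN(parseFloat(String(B)))) {
  isValid = false;
}
// Expose Status as an output field of the step.
var Status = isValid ? "Valid" : "Not Valid";

A downstream Filter rows step (or the Group by approach above) can then decide per file whether to move it to the error folder.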