NiFi RouteText processor usage issue - regex

I am having trouble configuring the RouteText processor correctly. I need to pick out the lines that have a particular string value at a particular index. For example, I want all the lines that have 'BT', 'PV7' or 'PV30' at index 19. My file is CSV.
I tried configuring the processor, but all of my lines are routed to the unmatched relationship, even though the data does contain lines that should match.

You need to change the Matching Strategy to "Satisfies Expression" since you are not using regular expressions here.
The docs for Satisfies Expression say:
"Match lines based on whether or not the the text satisfies the given Expression Language expression. I.e., the line will match if the property value, evaluated as an Expression, returns true. The expression is able to reference FlowFile Attributes, as well as the variables 'line' (which is the text of the line to evaluate) and 'lineNo' (which is the line number being evaluated. This will be 1 for the first line, 2 for the second and so on)."
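To make the per-line test concrete, here is a minimal Python sketch (not NiFi configuration) of the boolean check that a "Satisfies Expression" rule would perform on each line. It assumes that "index 19" means the 0-based CSV column 19 (the 20th field) and that no field contains an embedded newline; line_matches and route_lines are hypothetical names used only for illustration.

```python
import csv

# Values taken from the question; FIELD_INDEX is an assumption (0-based CSV column).
WANTED = {"BT", "PV7", "PV30"}
FIELD_INDEX = 19


def line_matches(line):
    """True if the CSV field at FIELD_INDEX is one of the wanted values."""
    # csv.reader handles quoted fields that contain commas.
    fields = next(csv.reader([line]), [])
    return len(fields) > FIELD_INDEX and fields[FIELD_INDEX].strip() in WANTED


def route_lines(text):
    """Split text into (matched, unmatched) lines, like a two-way route."""
    matched, unmatched = [], []
    for line in text.splitlines():
        (matched if line_matches(line) else unmatched).append(line)
    return matched, unmatched


if __name__ == "__main__":
    hit = ",".join(["x"] * 19 + ["PV7"])
    miss = ",".join(["x"] * 19 + ["QQ"])
    print(route_lines(hit + "\n" + miss + "\nshort,line"))
```

Lines with fewer than 20 fields simply count as unmatched rather than raising an error, which mirrors how a line that fails the expression goes to the unmatched relationship.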

Related

RegEx to get unique values from large file with duplicates

I have a large XML file that I want to extract unique values from. The values I'm looking for are in the XML tag: ns3:order_id
To make it more complex, the file contains duplicate order_id values, and I'm only interested in getting the unique ones.
I've been using a regex to extract the values; this is the expression:
(?sm)(\<ns3:order_id>\d+\b)(?!.*\1\b)
The expression gives me what I need, but only if the file is much smaller. When I try this expression on the "big" file I get: "Catastrophic backtracking has been detected and the execution of your expression has been halted." I suspect it has to do with the *, and I have tried replacing it in different ways without success.
Is there any way to correct my expression so that I can collect the values?
As described above, I've tried several different regex approaches. The expression above works, but not on bigger files.
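The lookahead (?!.*\1\b) is the likely culprit: with (?s), .* runs to the end of the file and then backtracks, so every candidate match rescans the rest of the file, and the total work grows roughly quadratically with file size. One alternative, sketched here in Python under the assumption that each value appears as <ns3:order_id> followed by digits, is to use a plain pattern and remove duplicates in code. Note that the original expression keeps the last occurrence of each value, while this sketch keeps first-seen order; the set of values is the same.

```python
import re

# Plain pattern with no lookahead: each match is found in a single forward pass.
ORDER_ID = re.compile(r"<ns3:order_id>(\d+)\b")


def unique_order_ids(text: str) -> list[str]:
    """Return each order_id value once, in the order it first appears."""
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(m.group(1) for m in ORDER_ID.finditer(text)))


if __name__ == "__main__":
    sample = (
        "<ns3:order_id>12</ns3:order_id>"
        "<ns3:order_id>7</ns3:order_id>"
        "<ns3:order_id>12</ns3:order_id>"
    )
    print(unique_order_ids(sample))  # ['12', '7']
```

The deduplication is done by a hash lookup instead of a regex lookahead, so each value is checked once instead of rescanning the remaining text.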

JMeter: extract multiple values with a regular expression

I'm looking to extract two values from the following response:
"FirstValue":"1234","Someotherfield":"****","Someotherfield":"****",(Some other more fields),"SecondValue":"6789"
In more detail: first comes my first value, followed by many other fields, and eventually my second value. Note: the number of fields in between is fixed and known, but I don't want to use the solution from "How to extract multiple values with a regular expression in Jmeter" because I think the regular expression would be too long (about 20 backslashes).
I've come up with the following two solutions:
Reference name: Parameters
1. "FirstValue":"(.+?)"(.+?)"SecondValue":"(.+?)"
2. "FirstValue":"(.+?)"*.*"SecondValue":"(.+?)"
Both work fine. However, I want to make this more efficient, since the match also gives me the text between my requested values (e.g. Parameters_g0="FirstValue":"1234","Someotherfield":"****","Someotherfield":"****",(Some other more fields),"SecondValue":"6789").
So my question is: is there a more efficient way? If not, which of the two approaches above is preferable?
Thank you
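A regex engine always reports the whole match as group 0 (Parameters_g0 in JMeter), so no single pattern can keep the in-between text out of it; what you can control is which parts are captured as groups 1, 2 and so on. This Python sketch compares two options: one pattern with only two capturing groups, and two separate short patterns (which in JMeter would correspond to two Regular Expression Extractors). Using [^"]* instead of .+? keeps each captured value from running past its closing quote. It illustrates the regex logic only, not JMeter configuration; if the full response is valid JSON, JMeter's JSON Extractor may be a simpler option.

```python
import re

# Pattern A: one expression, two capturing groups. [^"]* stops at the closing
# quote, and the middle part is matched lazily without being captured.
COMBINED = re.compile(r'"FirstValue":"([^"]*)".*?"SecondValue":"([^"]*)"')

# Pattern B: two short, independent expressions, one per value.
FIRST = re.compile(r'"FirstValue":"([^"]*)"')
SECOND = re.compile(r'"SecondValue":"([^"]*)"')


def extract_combined(text):
    m = COMBINED.search(text)
    return m.groups() if m else None


def extract_separately(text):
    first, second = FIRST.search(text), SECOND.search(text)
    return (first.group(1) if first else None,
            second.group(1) if second else None)


if __name__ == "__main__":
    response = ('"FirstValue":"1234","Someotherfield":"****",'
                '"Someotherfield":"****","SecondValue":"6789"')
    print(extract_combined(response))    # ('1234', '6789')
    print(extract_separately(response))  # ('1234', '6789')
```

The two-pattern version does not depend on the order of the fields, at the cost of scanning the response twice.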

How to format a WinMerge filter to ignore part of a line

I would like WinMerge to compare the full text but exclude a variable substring.
Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"
Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"
In the lines above I want to ignore PhysicalAddress="xxx" and find the changed DefFieldFrmt="Uf4d1".
I have tried adding the filter:
PhysicalAddress=".*"
However this filters the complete line.
The text before and after PhysicalAddress="xxx" varies, so I need a filter that says: match the prefix and the suffix, but ignore the variable substring in between.
Help please.
According to the documentation, it is not possible to use line filters for this:
When a rule matches any part of the line, the entire difference is ignored. Therefore, you cannot filter just part of a line.
However, since WinMerge's source code is on GitHub, you can file a feature request for this in its issue tracker.
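As a workaround outside WinMerge, one option is to normalize the variable substring in both files before comparing them. This minimal Python sketch (standard difflib and re modules only) replaces every PhysicalAddress="..." value with a fixed placeholder and then diffs the normalized lines, so only other changes such as DefFieldFrmt show up. It uses [^"]* rather than .* because .* is greedy and would run to the last quote on the line; the placeholder text and the function names are arbitrary choices for illustration.

```python
import difflib
import re

# [^"]* stops at the closing quote; .* would be greedy and run to the last
# quote on the line.
ADDRESS = re.compile(r'PhysicalAddress="[^"]*"')


def normalize(line):
    """Replace the variable address with a fixed placeholder."""
    return ADDRESS.sub('PhysicalAddress="*"', line)


def diff_ignoring_address(left_lines, right_lines):
    """Unified diff of two line lists, ignoring PhysicalAddress values."""
    return list(difflib.unified_diff(
        [normalize(line) for line in left_lines],
        [normalize(line) for line in right_lines],
        lineterm="",
    ))


if __name__ == "__main__":
    left = ['Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"']
    right = ['Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"']
    print("\n".join(diff_ignoring_address(left, right)))
```

If the two files differ only in their addresses, the returned diff is empty.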

Regular Expression to pull data between two nodes (including nodes)

I am trying to use regular expressions to pull out a specific line. Below is how my data is formatted. My function will receive a node name (such as "cat" in this example). What would be the regex to pull out the entire line, including the opening and closing nodes?
Data:
<Bat>Jim;Doug;<Bat>
<Cat>Jake;Dan;Bill;<Cat>
<Dog>Greg;Bob;Ashley;<Dog>
Desired Result:
<Cat>Jake;Dan;Bill;<Cat>
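You will want to use this: (<Cat>).*\1
It matches a line that contains <Cat> followed later by another <Cat>; since . does not match a newline by default, the match stays within a single line.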
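Since the function receives the node name as input, the pattern can be built from it at run time. A minimal Python sketch, where extract_node_line is a hypothetical helper: re.escape guards against special characters in the name, and ^ and $ with re.MULTILINE anchor the match to a whole line. Matching is case-sensitive, so "cat" will not match <Cat> unless re.IGNORECASE is added.

```python
import re


def extract_node_line(data, node):
    """Return the first line that starts and ends with <node>, or None."""
    tag = re.escape(f"<{node}>")
    # MULTILINE makes ^ and $ match at every line boundary; . never crosses a newline.
    pattern = re.compile(rf"^{tag}.*{tag}$", re.MULTILINE)
    m = pattern.search(data)
    return m.group(0) if m else None


if __name__ == "__main__":
    data = "<Bat>Jim;Doug;<Bat>\n<Cat>Jake;Dan;Bill;<Cat>\n<Dog>Greg;Bob;Ashley;<Dog>"
    print(extract_node_line(data, "Cat"))  # <Cat>Jake;Dan;Bill;<Cat>
```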

Reading a text config file: using regex to parse

I'm looking for a way to read the following config file sample using a multi-line regex matcher. I could just read the file line by line, but I want to get comfortable with the specifics of flexible regular expression matching.
The config file is made up of blocks like this:
blockName BLOCK
IDENTIFIER value
IDENTIFIER value
IDENTIFIER
"string literal value that
could span multiple lines"
The number of identifiers can be anywhere from 1 to infinity. IDENTIFIER can be NAME, DESCRIPTION, TYPE, or similar.
I have never worked with multi-line regular expressions before and I'm not very familiar with the process. I essentially want to use a findAll function with this regular expression to put all of the parsed block data into a data structure for processing.
EDIT: clarification: I only need to read this file once. I do not care about efficiency or elegance. I want to read the information into a data structure and then write it out in a different format. It is a large file (3000 lines) and I don't want to do this by hand.
I don't think regex is the best tool for this.
That said, try this, which should work with Perl regular expressions:
([\w\d]*)\s+BLOCK\s*\n(\s*(NAME|DESCRIPTION|TYPE|...)\s*([\w\d]*|"(.*)")\s*\n)+
I verified it with a regex tester using the following test text:
blockName BLOCK
NAME value
NAME value
DESCRIPTION
"string literal value that
could span multiple lines"
otherName BLOCK
NAME value
TYPE value
DESCRIPTION
"string literal value that
could span multiple lines"
It will only find the last block/identifier if the file ends with a newline.
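Two details make a single findAll awkward for this format: a repeated group such as (...)+ keeps only the captures from its last repetition, and . does not match a newline unless a dot-all flag (Perl's /s, Python's re.DOTALL) is set, so "(.*)" cannot capture a string that spans lines. A minimal Python sketch that works around both: first locate the "blockName BLOCK" headers, then scan each block's body for identifier entries, using [^"]* (which does match newlines) for quoted values. It assumes identifiers are upper-case words and that quoted strings contain no escaped quotes; parse_config is a hypothetical name.

```python
import re

# A block header such as "blockName BLOCK" on a line of its own.
BLOCK_HEADER = re.compile(r"^(\w+)[ \t]+BLOCK[ \t]*$", re.MULTILINE)

# An upper-case identifier followed either by a quoted string (on the same or
# a later line; [^"] also matches newlines) or by a bare token on the same line.
ENTRY = re.compile(r'^[ \t]*([A-Z]+)(?:\s+"([^"]*)"|[ \t]+(\S+))', re.MULTILINE)


def parse_config(text):
    """Return a list of (block_name, [(identifier, value), ...]) tuples."""
    headers = list(BLOCK_HEADER.finditer(text))
    blocks = []
    for i, header in enumerate(headers):
        # A block's body runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        entries = []
        for m in ENTRY.finditer(body):
            value = m.group(2) if m.group(2) is not None else m.group(3)
            entries.append((m.group(1), value))
        blocks.append((header.group(1), entries))
    return blocks


if __name__ == "__main__":
    sample = (
        "blockName BLOCK\nNAME value\nNAME value\nDESCRIPTION\n"
        '"string literal value that\ncould span multiple lines"\n'
        "otherName BLOCK\nNAME value\nTYPE value\nDESCRIPTION\n"
        '"string literal value that\ncould span multiple lines"'
    )
    for name, entries in parse_config(sample):
        print(name, entries)
```

The sample deliberately has no trailing newline: the last block is still found, because each block's body simply runs to the end of the text.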