Regular Expression to pull data between two nodes (including nodes) - regex

I am trying to use regular expressions to pull a specified line out. Below is how my data is formatted. My function will receive a node name (such as "cat" in this example). What would be the RegEx rule to pull the entire line (including the beginning and ending nodes)?
Data:
<Bat>Jim;Doug;<Bat>
<Cat>Jake;Dan;Bill;<Cat>
<Dog>Greg;Bob;Ashley;<Dog>
Desired Result:
<Cat>Jake;Dan;Bill;<Cat>

You will want to use this: (<Cat>).*\1
The \1 is a backreference to the captured group, so this matches from <Cat> through the last <Cat> on the same line.
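Since the function receives the node name as a parameter, the pattern can be built at runtime. Here is a minimal Python sketch of that idea. The function name `find_node_line`, the line anchors, and the case-insensitive flag (so that "cat" finds `<Cat>`) are assumptions added for illustration, not part of the answer above.

```python
import re

DATA = """<Bat>Jim;Doug;<Bat>
<Cat>Jake;Dan;Bill;<Cat>
<Dog>Greg;Bob;Ashley;<Dog>"""

def find_node_line(data, node):
    """Return the first line that starts and ends with <node>, or None."""
    # re.escape guards against node names containing regex metacharacters.
    # \1 refers back to the captured opening node, as in the answer's pattern.
    pattern = re.compile(
        rf'^(<{re.escape(node)}>).*\1$',
        re.MULTILINE | re.IGNORECASE,  # assumption: "cat" should match "<Cat>"
    )
    match = pattern.search(data)
    return match.group(0) if match else None

result = find_node_line(DATA, "cat")
missing = find_node_line(DATA, "Fox")
```

Here `result` holds the `<Cat>` line and `missing` is `None` because no such node exists in the data.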

Related

Can regex be used to find this pattern?

I need to parse a large amount of data in a log file. Ideally, I would do this by splitting the file into a list where each item is an individual log entry.
Every time a log entry is made it is prefixed with a string following this pattern:
"4404: 21:42:07.433 - After this point there could be anything (including new line characters and such). However, as soon as the prefix repeats that indicates a new log entry."
4404 Can be any number, but is always then followed by a :.
21:42:07.433 is 21 hours, 42 minutes, 7 seconds, 433 milliseconds.
I don't know much about regex, but is it possible to identify this pattern using it?
I figured something like this would work...
"*: [0-24]:[0:60]:[0:60].[0-1000] - *"
However, it just throws an exception and I fear I'm not on the right track at all.
List<string> split_content = Regex.Matches(file_content, @"*: [0-24]:[0:60]:[0:60].[0-1000] - *").Cast<Match>().Select(m => m.Value).ToList();
The following expression would split a string according to your pattern:
\d+: \d{2}:\d{2}:\d{2}\.\d{3}
Add a ^ at the beginning if the delimiting string always starts a line (and use the multiline (m) flag so that ^ matches at the start of each line). Capturing the log chunks with a regex would be more elaborate; I'd suggest just splitting (with Regex.Split) if you have the whole log content in memory at once.
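The splitting approach can be sketched in Python (not the C# of the question). This variant is a swapped-in technique: it wraps the prefix in a lookahead so each chunk keeps its own prefix, which a plain split would discard. It assumes Python 3.7 or later, where `re.split` can split on zero-width matches, and the sample log text is made up.

```python
import re

# The prefix pattern from the answer, plus the " - " separator from the question.
# The lookahead (?=...) matches the position without consuming the prefix.
ENTRY_START = re.compile(
    r'^(?=\d+: \d{2}:\d{2}:\d{2}\.\d{3} - )',
    re.MULTILINE,  # let ^ match at the start of every line
)

def split_log(text):
    """Split log text into entries, each still starting with its prefix."""
    # Drop the empty chunk produced when the text begins with a prefix.
    return [chunk for chunk in ENTRY_START.split(text) if chunk]

SAMPLE_LOG = (
    "4404: 21:42:07.433 - first entry\n"
    "second line of first entry\n"
    "17: 21:42:08.001 - second entry\n"
)
entries = split_log(SAMPLE_LOG)
```

Each element of `entries` is one complete log entry, including any continuation lines.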

nifi routeText processor usage issue

I am facing an issue configuring the RouteText processor correctly. I have to filter out those lines which have, say, particular string values at a particular index. Let's say I want all the lines which have 'BT' or 'PV7' and 'PV30' values at index 19. My file is a CSV.
I tried using the configuration below, but all of my lines are routed to the unmatched relationship. However, the data contains other lines too.
You need to change the Matching Strategy to "Satisfies Expression" since you are not using regular expressions here.
The docs for Satisfies Expression say:
"Match lines based on whether or not the the text satisfies the given Expression Language expression. I.e., the line will match if the property value, evaluated as an Expression, returns true. The expression is able to reference FlowFile Attributes, as well as the variables 'line' (which is the text of the line to evaluate) and 'lineNo' (which is the line number being evaluated. This will be 1 for the first line, 2 for the second and so on)."

Vim regex matching multiple results on the same line

I'm working on a project where I'm converting an implementation of a binary tree to an AVL tree, so I have a few files that contain lines like:
Tree<int>* p = new Tree<int>(*t);
all over the place. The goal I have in mind is to use a vim regex to turn all instances of the string Tree into the string AVLTree, so the line above would become:
AVLTree<int>* p = new AVLTree<int>(*t);
The regex I tried was :%s/Tree/AVLTree/g, but the result was:
AVLTree<int>* p = new Tree<int>(*t);
It looks to me like when vim finds something to replace on a line it jumps to the next one, so is there a way to match multiple strings on the same line? I realize that this can be accomplished with multiple regexes, so my question is mostly academic.
Credit on this one goes to Marth for pointing this out. My issue was with vim's gdefault. By default it's set to 'off', which means you need the /g flag to make your search global, which is what I wanted. I think mine was set to 'on', which means without the flag the search is global, but with the flag the search is not. I found this chart from :help 'gdefault' helpful:
command     'gdefault' on   'gdefault' off
:s///       subst. all      subst. one
:s///g      subst. one      subst. all
:s///gg     subst. all      subst. one

PowerShell isolating parts of strings

I have no experience with regular expressions and would love some help and suggestions on a possible solution to deleting parts of file names contained in a csv file.
Problem:
A list of exported file names contains a random unique identifier that I need isolated. The unique identifier has no predictable pattern, however the aspects which need removing do. Each file name ends with one of the following variations:
V, -V, or %20V followed by a random number sequence with possible spaces, additional "-" or "_" characters, and ending with .PDF
examples:
GTD-LVOE-43-0021 V10 0.PDF
GTD-LVOE-43-0021-V34-2.PDF
GTD-LVOE-43-0021_V02_9.PDF
GTD-LVOE-43-0021 V49.9.PDF
Solution:
My plan was to write a script to select the first occurrence of a V from the end of the string and then delete it and everything to the right of it. Then the file names can be cleaned up by deleting any "-", "_", or white space that occurs at the end of a string.
Question:
How can I do this with a regular expression and is my line of thinking even close to the right approach to solving this?
REGEX: [\s\-_]V.*?\.PDF
Might do the trick. You'd still need to strip any leftover - and _ characters, but it should get you down the path, hopefully.
It reads as follows:
Start with a whitespace character, - or _, followed by a V. Then take everything up to the first .PDF.
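As a quick check of the pattern against the four example names, here is a small Python sketch (the question is about PowerShell, so treat this only as an illustration of the regex). The helper name `strip_version` is made up. It assumes the identifier itself never contains a separator immediately followed by a capital V, and it does not handle the %20V variant.

```python
import re

# The pattern from the answer: a separator, a V, then everything up to .PDF
VERSION_SUFFIX = re.compile(r'[\s\-_]V.*?\.PDF')

def strip_version(file_name):
    """Remove the version suffix, then any trailing separators."""
    trimmed = VERSION_SUFFIX.sub('', file_name)
    # Clean up any separators left at the end, as the question describes.
    return trimmed.rstrip(' -_')

EXAMPLES = [
    "GTD-LVOE-43-0021 V10 0.PDF",
    "GTD-LVOE-43-0021-V34-2.PDF",
    "GTD-LVOE-43-0021_V02_9.PDF",
    "GTD-LVOE-43-0021 V49.9.PDF",
]
cleaned = [strip_version(name) for name in EXAMPLES]
```

All four examples reduce to the same identifier, GTD-LVOE-43-0021.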

Reading a text config file: using regex to parse

Looking for a way to read the following config file sample using a multi line regex matcher. I could just read in the file by line, but I want to get decent with the specifics of flexible regular expression matching.
So the config file is filled with blocks of code as follows:
blockName BLOCK
IDENTIFIER value
IDENTIFIER value
IDENTIFIER
"string literal value that
could span multiple lines"
The number of identifiers could be from 1..infinity. IDENTIFIER could be NAME, DESCRIPTION, TYPE, or the like.
I have never worked with multi line regular expressions before. I'm not very familiar with the process. I essentially want to use a findAll function using this regular expression to put all of the parsed block data into a data structure for processing.
EDIT: clarification: I'm only looking to read this file once. I do not care about efficiency or elegance. I want to read the information into a data structure and then spit it out in a different format. It is a large file (3000 lines) and I don't want to do this by hand.
I don't think regex is the best tool for this.
That said, try this, which should work with Perl regular expressions:
([\w\d]*)\s+BLOCK\s*\n(\s*(NAME|DESCRIPTION|TYPE|...)\s*([\w\d]*|"(.*)")\s*\n)+
I verified it at REGex TESTER using the following test text:
blockName BLOCK
NAME value
NAME value
DESCRIPTION
"string literal value that
could span multiple lines"
otherName BLOCK
NAME value
TYPE value
DESCRIPTION
"string literal value that
could span multiple lines"
It will only find the last block/identifier if the file ends in a newline.
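A two-pass variant of the same idea can be sketched in Python: one pattern finds each block, a second pulls the identifier/value pairs out of the block body. This is not the single expression above. The names `BLOCK_RE`, `FIELD_RE`, and `parse_blocks` are made up, identifiers are assumed to be all uppercase words, and string literals are assumed to contain no embedded double quotes and no line that itself looks like a block header.

```python
import re

# A block header line, then everything up to the next header or end of text.
BLOCK_RE = re.compile(
    r'^(\w+)[ \t]+BLOCK[ \t]*\n(.*?)(?=^\w+[ \t]+BLOCK[ \t]*$|\Z)',
    re.MULTILINE | re.DOTALL,
)
# An uppercase identifier followed by a quoted (possibly multi-line) string
# or by a bare word. \s+ also crosses the newline before a quoted value.
FIELD_RE = re.compile(r'^[ \t]*([A-Z]+)\s+(?:"([^"]*)"|(\S+))', re.MULTILINE)

def parse_blocks(text):
    """Return a list of (block_name, [(identifier, value), ...]) tuples."""
    blocks = []
    for block in BLOCK_RE.finditer(text):
        fields = []
        for field in FIELD_RE.finditer(block.group(2)):
            quoted, bare = field.group(2), field.group(3)
            fields.append((field.group(1), quoted if quoted is not None else bare))
        blocks.append((block.group(1), fields))
    return blocks

SAMPLE = '''blockName BLOCK
NAME value
NAME value
DESCRIPTION
"string literal value that
could span multiple lines"
otherName BLOCK
NAME value
TYPE value
DESCRIPTION
"string literal value that
could span multiple lines"
'''
parsed = parse_blocks(SAMPLE)
```

Fields are kept as a list of pairs rather than a dictionary because an identifier such as NAME can repeat within one block.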