Regex to extract text within exact given function - regex

I am reading text from a .config file into one long string, and I need to extract the part of it that matches the pattern described below. The .config file defines two functions (input and filter).
This is the text read from the .config file:
input {
name: "abc",
age: "20"
}
filter {
name: "pqr",
age: "25"
}
I need to extract only the text within the filter function, including the word filter itself.
Expected output:
filter {
name: "pqr",
age: "25"
}
I have written a regex that extracts all the text within the { } braces.
Created Regex
At the moment it matches text across the whole file. Can anyone help me update the regex so that it extracts only the filter function, including its name? It needs to handle both cases: with and without a space between filter and the opening brace.
Scenario 1 - space between the filter text and the brace
filter {
name: "pqr",
age: "25"
}
and
Scenario 2 - no space between the filter text and the brace
filter{
name: "pqr",
age: "25"
}

You can use this regex, which matches your filter block. The whitespace between filter and {...} is optional, so it matches with or without a space.
^filter\s*\{[^{}]+\}$
Note: I have enabled the m (multiline) flag in the demo, so you will need to enable it in your programming language or use the inline modifier before the regex, like this: (?m)^filter\s*\{[^{}]+\}$
Regex Explanation:
^filter - Starts matching the text with filter
\s* - Allows for matching optional whitespace
\{ - Matches literal {
[^{}]+ - Matches one or more of any character except { or }
\}$ - Matches the closing } at the end of a line (with the m flag enabled; without it, $ marks the end of input)
Regex Demo
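As a quick check of the pattern above, here is a minimal Python sketch. The sample text comes from the question; the names CONFIG_TEXT, FILTER_BLOCK and extract_filter_block are made up for illustration, and re.MULTILINE stands in for the m flag.

```python
import re

# Sample text from the question (two blocks: input and filter).
CONFIG_TEXT = '''input {
name: "abc",
age: "20"
}
filter {
name: "pqr",
age: "25"
}
'''

# re.MULTILINE makes ^ and $ match at line boundaries, like the m flag.
FILTER_BLOCK = re.compile(r'^filter\s*\{[^{}]+\}$', re.MULTILINE)


def extract_filter_block(text):
    """Return the filter block including its name, or None if absent."""
    match = FILTER_BLOCK.search(text)
    return match.group(0) if match else None


if __name__ == '__main__':
    print(extract_filter_block(CONFIG_TEXT))
```

Because [^{}]+ excludes braces, this only works while the filter block contains no nested { } pairs.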

Related

How do I create a REGEX to get nested custom tags?

I want to match the text between the ${ and }$ tags, but my regex stops at the inner tag and leaves the rest of the outer tag out.
This is my regex: /${[\W\w]+?}$/g
And this is the text:
constructor(public props: ${ClassName}$Model) {
super()
${fields$:
this._#{key}# = this.initProp(this, new #{value:1}#(props?.#{key}#, '#{key}# da ${ClassName}$'))
}$
}
I would like to extract the tags
${Classname}$ and
${fields$:
this._#{key}# = this.initProp(this, new #{value:1}#(props?.#{key}#, '#{key}# da ${ClassName}$'))
}$
but my result is
${Classname}$ and
${fields$:
this._#{key}# = this.initProp(this, new #{value:1}#(props?.#{key}#, '#{key}# da ${ClassName}$
can anybody help me?
You can match ${, then either any char that does not start a ${ or }$ sequence, or any text inside single quotes with support for escaped quotes (double quote support can also be added), and then }$:
\$\{(?:'[^\\']*(?:\\.[^\\']*)*'|(?!}\$|\$\{)[^'])*}\$
See the regex demo. Details:
\$\{ - ${ string
(?:'[^\\']*(?:\\.[^\\']*)*'|(?!}\$|\$\{)[^'])* - zero or more of
'[^\\']*(?:\\.[^\\']*)*' - a substring between single quotes with escaped quotes support
| - or
(?!}\$|\$\{)[^'] - any char other than ' that is not the starting point of the }$ or ${ char sequences
}\$ - a }$ string
Note: this still fails if the initial ${ is already inside quotes. It is possible to work around that situation, but the final solution will depend on what your regex flavor and/or API can offer.
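The pattern can be tried against the text from the question with a short Python sketch. Python's re module is an assumption here (the question uses a JavaScript regex literal), and the names TAG, TEMPLATE and find_tags are illustrative only.

```python
import re

# The pattern from the answer, unchanged. It has no capturing groups,
# so findall() returns the full text of each match.
TAG = re.compile(r"\$\{(?:'[^\\']*(?:\\.[^\\']*)*'|(?!}\$|\$\{)[^'])*}\$")

# The template text from the question.
TEMPLATE = r"""constructor(public props: ${ClassName}$Model) {
super()
${fields$:
this._#{key}# = this.initProp(this, new #{value:1}#(props?.#{key}#, '#{key}# da ${ClassName}$'))
}$
}"""


def find_tags(text):
    """Return every ${ ... }$ tag, treating quoted text as opaque."""
    return TAG.findall(text)


if __name__ == '__main__':
    for tag in find_tags(TEMPLATE):
        print(repr(tag))
```

The quoted '#{key}# da ${ClassName}$' is consumed by the single-quote alternative, which is why the inner }$ no longer ends the outer tag early.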

Matching pattern repeats for unknown times. How to replace each matched string?

I have this string
mark:: string1, string2, string3
I want it to be
mark:: xxstring1xx, xxstring2xx, xxstring3xx
The point is, I don't know how many times the matched string repeats. Sometimes there are 10 strings in the line, sometimes there are none. So far I have come up with this matching pattern mark:: ((.*)(, )+)*, but I'm unable to find a way to substitute each individual matched string.
If possible I would like to have this output:
mark:: xxstring1xx
mark:: xxstring2xx
mark:: xxstring3xx
But if it's not possible it's fine to have the one-line solution
Snippets support conditionals in their transforms, which you can take advantage of here.
If you can select the line first, this is quite easy. Use this keybinding in your keybindings.json:
{
"key": "alt+w", // whatever keybinding you want
"command": "editor.action.insertSnippet",
"args": {
"snippet": "${TM_SELECTED_TEXT/(mark::\\s*)|([^,]+)(, )?/$1${2:+xx}$2${2:+xx}$3/g}"
}
}
The find is simple: (mark::\\s*)|([^,]+)(, )? (the backslash is doubled because the pattern sits inside a JSON string)
replace: $1${2:+xx}$2${2:+xx}$3
That is: capture group 1, then xx only if group 2 matched (the conditional ${2:+xx}), then group 2, then the same conditional again, then group 3.
If you have a bunch of these lines in a file and you want to transform them all at once, then follow these steps:
In the Find widget, Find: (mark::\s*)(.*)$ with the regex option enabled.
Alt+Enter to select all matches.
Trigger your snippet keybinding from above.
For your other version with separate lines for each entry, use this in the keybinding:
{
"key": "alt+w",
"command": "editor.action.insertSnippet",
"args": {
// single line version
// "snippet": "${TM_SELECTED_TEXT/(mark::\\s*)|([^,]+)(, )?/$1${2:+xx}$2${2:+xx}$3/g}"
// each on its own line
"snippet": "${TM_SELECTED_TEXT/(mark::\\s*)|([^,]+)(, )?/${2:+mark:: }${2:+xx}$2${2:+xx}${3:+\n}/g}"
}
}
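To see what the two snippet transforms do outside the editor, the conditional replacements can be emulated with a replacement function. This is only a sketch in Python, not how VS Code evaluates snippets; the function names are hypothetical and the ${2:+...} conditionals are imitated with plain if checks.

```python
import re

# Same find pattern as in the snippet (single backslash: no JSON escaping here).
FIND = re.compile(r'(mark::\s*)|([^,]+)(, )?')


def wrap_one_line(line):
    """Emulates $1${2:+xx}$2${2:+xx}$3 applied globally."""
    def repl(m):
        prefix = m.group(1) or ''
        item = m.group(2)
        separator = m.group(3) or ''
        if item is None:          # only the "mark:: " prefix matched
            return prefix
        return f'{prefix}xx{item}xx{separator}'
    return FIND.sub(repl, line)


def wrap_separate_lines(line):
    """Emulates ${2:+mark:: }${2:+xx}$2${2:+xx}${3:+\\n} applied globally."""
    def repl(m):
        item = m.group(2)
        if item is None:          # drop the original prefix
            return ''
        newline = '\n' if m.group(3) else ''
        return f'mark:: xx{item}xx{newline}'
    return FIND.sub(repl, line)


if __name__ == '__main__':
    sample = 'mark:: string1, string2, string3'
    print(wrap_one_line(sample))
    print(wrap_separate_lines(sample))
```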
You can use
(\G(?!\A)\s*,\s*|mark::\s*)([^\s,](?:[^,]*[^\s,])?)
And replace with $1xx$2xx.
See the regex demo. Details:
(\G(?!\A)\s*,\s*|mark::\s*) - Group 1 ($1):
\G(?!\A)\s*,\s* - end of the previous successful match and then a comma enclosed with zero or more whitespaces
| - or
mark::\s* - mark:: and zero or more whitespaces
([^\s,](?:[^,]*[^\s,])?) - Group 2 ($2):
[^\s,] - a char other than whitespace and comma
(?:[^,]*[^\s,])? - an optional sequence of zero or more non-commas and then a char other than a whitespace and a comma.
In the Visual Studio Code file search and replace feature, you can use a regex that is compatible with the Rust regex engine:
(mark::(?:\s*(?:,\s*)?xx\w*xx)*\s*(?:,\s*)?)([^\s,](?:[^,]*[^\s,])?)
Replace with the same $1xx$2xx replacement pattern. Caveat: you need to hit the replace button as many times as there are matches.
See this regex demo showing the replacement stages.
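The staged replacement can be imitated in Python, whose built-in re module has no \G, so only the second pattern is sketched here. The function name wrap_in_passes is hypothetical. The number of passes is supplied by the caller rather than looping until nothing matches, because the pattern can still match an item that is already wrapped.

```python
import re

# The Rust-compatible pattern from the answer, unchanged.
STEP = re.compile(
    r'(mark::(?:\s*(?:,\s*)?xx\w*xx)*\s*(?:,\s*)?)([^\s,](?:[^,]*[^\s,])?)'
)


def wrap_in_passes(line, passes):
    """Apply the $1xx$2xx replacement once per pass, like pressing Replace."""
    for _ in range(passes):
        line = STEP.sub(r'\1xx\2xx', line, count=1)
    return line


if __name__ == '__main__':
    sample = 'mark:: string1, string2, string3'
    for n in range(1, 4):
        print(n, wrap_in_passes(sample, n))
```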

Regex capture groups and use OR statement

I'm trying to create a regular expression that has multiple conditions separated by | (OR). I want to use capture groups, but I can't get it to work fully.
3 sample strings:
--- {source-charset: '', encoding-error-limit: '', class: stat-direct, directory: \\\myserver\C\FOLDER\SUB_FOLDER}
--- {odbc-connect-string-extras: '', server: hello.sample.com, dbname: X_DB, port: '80', class: hello, username: USERX}
--- {cleaning: 'no', filename: //myserver/D/FOLDER/SUB_FOLDER/File name.xlsx, dataRefreshTime: '', interpretationMode: '0'}
For each sample string I would like the regex to return:
\\\myserver\C\FOLDER\SUB_FOLDER
X_DB
//myserver/D/FOLDER/SUB_FOLDER/File name.xlsx
Basically, the value after directory:, dbname: or filename:, which ends at } in one case and at , in the other two.
I've managed to use OR statements to get the three conditions in.
regex extract
'directory: [^}]+|dbname: [^,]+|filename: [^,]+'
That returns:
directory: \\\myserver\C\FOLDER\SUB_FOLDER}
dbname: X_DB,
filename: //myserver/D/FOLDER/SUB_FOLDER/File name.xlsx,
If I introduce capturing groups I only get the right return for one of the parts:
'directory: ([^}]+)|dbname: ([^,]+)|filename: ([^,]+)'
That returns:
\\\myserver\C\FOLDER\SUB_FOLDER
null
null
I've managed to get it working with a nested regex that takes the result from
'directory: [^}]+|dbname: [^,]+|filename: [^,]+'
and uses:
': ([^,}]+)'
That gives me the result I want but I would like to do this as one regex.
Any help would be greatly appreciated.
/Aron
You could use a negated character class that matches anything except {, } or a comma, match any of the options in a non-capturing group, and use a single capturing group to capture the values:
{[^{]+(?:filename|directory|dbname): ([^,}]+)[^}]*}
Explanation
{ Match {
[^{]+ Match 1+ times not { using a negated character class
(?:filename|directory|dbname): Match any of the listed options followed by : and a space
( Capture group1
[^,}]+ Match 1+ times not , or }
) Close group 1
[^}]*} Match 0+ times not }, then match }
Regex demo
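A small Python sketch can confirm that the single capture group returns the wanted value for all three sample strings. The tool used in the question is not named, so Python's re is an assumption, as are the names VALUE, SAMPLES and extract_value. The braces are escaped here for readability; the pattern is otherwise the one from the answer.

```python
import re

# One capturing group for the value; the key alternatives are non-capturing.
VALUE = re.compile(r'\{[^{]+(?:filename|directory|dbname): ([^,}]+)[^}]*\}')

# The three sample strings from the question (raw strings keep the backslashes).
SAMPLES = [
    r"--- {source-charset: '', encoding-error-limit: '', class: stat-direct, directory: \\\myserver\C\FOLDER\SUB_FOLDER}",
    r"--- {odbc-connect-string-extras: '', server: hello.sample.com, dbname: X_DB, port: '80', class: hello, username: USERX}",
    r"--- {cleaning: 'no', filename: //myserver/D/FOLDER/SUB_FOLDER/File name.xlsx, dataRefreshTime: '', interpretationMode: '0'}",
]


def extract_value(line):
    """Return the directory, dbname or filename value, or None."""
    match = VALUE.search(line)
    return match.group(1) if match else None


if __name__ == '__main__':
    for sample in SAMPLES:
        print(extract_value(sample))
```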

Botkit for Slack using regex patterns in conversations

I'm running into an issue using regex patterns in botkit conversations that I can't quite work through, and though I'm sure someone else will have a quick answer, I'm utterly stumped.
I'm building a conversation that will store user information in a JSON file, but I need a small amount of validation on the entries before I store them. Specifically, the inputs must either be a full name (any number of words greater than one with a space between them) or a domain username in the format of domain\name with no spaces and the correct domain.
Using RegExr, I came up with the following regular expressions, which match in that tool but do not match when placed in the "pattern" attribute of the botkit conversation node:
\w+( +\w+)+ for the any number of words with a space between them.
domain+(\\+\w+) for the specified domain + a username
But when I use these in the botkit conversation, they're not matching -- so I'm not quite clear on what I'm doing wrong.
Here's the code snippet in which these are being used:
bot.startConversation(message, function (err, convo) {
convo.say("I don't know who you are in TFS. Can you tell me?");
convo.addQuestion("You can say \"no\" or tell me your TFS name or your domain name (eg: \"domain\\username)", [
{
pattern: bot.utterances.no,
callback: function (response, convo) {
convo.say("Okay, maybe another time then.");
convo.next();
}
},
{
pattern: '\w+( +\w+)+',
callback: function (response, convo) {
convo.say("You said your TFS name is " + response.text);
convo.next();
}
},
{
pattern: 'domain+(\\+\w+)+',
callback: function (response, convo) {
convo.say("You said your TFS name is " + response.text);
convo.next();
}
},
{
default: true,
callback: function (response, convo) {
convo.say("I didn't understand that.");
convo.repeat();
convo.next();
}
}
], {}, 'default');
});
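You need to use double backslashes: inside a JavaScript string literal '\w' is just the letter w, so the regex engine never sees the backslash. Also fix the backslash before the closing single quote in the first regex string literal: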
pattern: '\\w+( +\\w+)+',
pattern: 'domain(\\\\\\w+)+',
The first pattern:
\\w+ - 1+ word chars
( +\\w+)+ - 1 or more sequences of 1 or more spaces and then 1 or more word chars
Domain regex:
domain - the literal text domain
(\\\\\\w+)+ - 1 or more occurrences of
\\\\ - 1 literal backslash
\\w+ - 1 or more word chars.
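The effect of the doubled backslashes can be checked in isolation. The sketch below uses Python purely for illustration (Botkit itself is JavaScript, and how it applies the pattern is not shown in the question): raw strings hold the text the regex engine should receive, and ordinary strings hold the doubled-backslash form from the answer. The helper is_valid_tfs_name and the use of re.fullmatch are assumptions.

```python
import re

# The text the regex engine should receive.
FULL_NAME = r'\w+( +\w+)+'
DOMAIN_USER = r'domain(\\\w+)+'

# The same patterns typed with doubled backslashes, as in the answer.
FULL_NAME_ESCAPED = '\\w+( +\\w+)+'
DOMAIN_USER_ESCAPED = 'domain(\\\\\\w+)+'


def is_valid_tfs_name(text):
    """True for a multi-word full name or a domain\\username entry."""
    return bool(re.fullmatch(FULL_NAME, text) or re.fullmatch(DOMAIN_USER, text))


if __name__ == '__main__':
    # Both spellings produce identical pattern text.
    print(FULL_NAME_ESCAPED == FULL_NAME, DOMAIN_USER_ESCAPED == DOMAIN_USER)
    for candidate in ['Jane Doe', 'Jane', 'domain\\jdoe', 'other\\jdoe']:
        print(repr(candidate), is_valid_tfs_name(candidate))
```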

preg_match pattern with slashes stored in variable

I'm having trouble with this regex. (https://regex101.com/r/vQLlyY/1)
My pattern is working and is:
(?<=Property: )(.*?)(?= \(Contact)|(?<=Property: )(.*?)(?= - )
You'll see in the link that the property text is extracted in both these strings:
Property: This is the property (Contact - Warren)
Property: This is another property - Warren
In my code, this pattern is stored like this:
$this->rex["property"][2] = '/(?<=Property: )(.*?)(?= \(Contact)|(?<=Property: )(.*?)(?= - )/s';
Then, it is extracted like this:
foreach ($this->rex as $key => $value ) {
if (isset($value[$provider])) {
preg_match_all($value[$provider], $emailtext, $matches);
if (!empty($matches[1][0])) {
$emaildetails[$key] = trim(preg_replace("/\r|\n/", "", $matches[1][0]));
} else {
$emaildetails[$key] = "";
}
}
}
In this example, $provider = 2
I'm sure my problem is with the backslash, because I can't get this code to pick up the (Contact part of the pattern, where I need to escape the bracket. I know the code works because I have many other patterns in use. Also, this works for the property text if the pattern is stored like this:
$this->rex["property"][2] = '/(?<=Property: )(.*?)(?= - )/s';
So, am I storing the pattern correctly with the escaped bracket, or is that even my problem? Thanks in advance!
Because you're using separate capture groups, the different paths are ending up in different match indexes. For instance, the first line (the Contact - Warren one) is storing the match result in index 1, where the second line has an empty string in index 1 and the match result you're looking for in index 2.
To solve this issue, you can use a non-capture group for the delimiters or you can rewrite your expression to use a single positive lookahead. Either way there is only one capture group, so the result always lands in index 1. The benefits of the former include allowing for quantifiers. The benefits of the latter include not having the trailing delimiter end up in your 0 match index.
Example of non-capture group: (?<=Property: )(.*?)\s*(?:\(Contact|- ) https://regex101.com/r/vQLlyY/2.
Example of positive-lookahead: (?<=Property: )(.*?)(?= \(Contact| - ) https://regex101.com/r/vQLlyY/3.
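The difference in group indexes can be reproduced with a short sketch. Python is used here only for illustration (the question is about PHP's preg_match_all); the constant and function names are hypothetical, and the two sample strings are the ones from the question.

```python
import re

# Original pattern: two capture groups, one per alternative.
TWO_GROUPS = re.compile(
    r'(?<=Property: )(.*?)(?= \(Contact)|(?<=Property: )(.*?)(?= - )'
)
# Positive-lookahead rewrite from the answer: a single capture group.
ONE_GROUP = re.compile(r'(?<=Property: )(.*?)(?= \(Contact| - )')

WITH_CONTACT = 'Property: This is the property (Contact - Warren)'
WITHOUT_CONTACT = 'Property: This is another property - Warren'


def groups_two(text):
    """Return (group 1, group 2) for the original pattern."""
    m = TWO_GROUPS.search(text)
    return (m.group(1), m.group(2)) if m else None


def property_text(text):
    """Return group 1 of the rewritten pattern, or an empty string."""
    m = ONE_GROUP.search(text)
    return m.group(1) if m else ''


if __name__ == '__main__':
    print(groups_two(WITH_CONTACT), groups_two(WITHOUT_CONTACT))
    print(property_text(WITH_CONTACT), '|', property_text(WITHOUT_CONTACT))
```

With the original pattern the second string fills group 2 and leaves group 1 unset, which is why reading only index 1 comes back empty.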