Solution required to create Regex pattern - regex

I am developing a windows application in C#. I have been searching for the solution to my problem in creating a Regex pattern. I want to create a Regex pattern matching the either of the following strings:
XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
The specific requirement is that I need all the values from the above given string which is a collection of key/value pairs. Also, would like to know the right approach (in terms of efficiency and performance) out of the two...Regex pattern matching or substring, for the above problem.
Thank you all in advance and let me know, if more details are required.

I don't know C#, so there probably is a better way to build a key/value array. I constructed a regex and handed it to RegexBuddy which generated the following code snippet:
StringCollection keyList = new StringCollection();
StringCollection valueList = new StringCollection();
StringCollection unitList = new StringCollection();
try {
Regex regexObj = new Regex(
#"(?<key>\b\w+) # Match an alphanumeric identifier
\s*=\s* # Match a = (optionally surrounded by whitespace)
\( # Match a (
\s* # Match optional whitespace
(?<value>[^()]+) # Match the value string (anything except parens)
\) # Match a )
(?<unit>[^\s=]+ # Match an optional unit (anything except = or space)
\b # which must end at a word boundary
(?!\s*=) # and not be an identifier (i. e. followed by =)
)? # and is optional, as mentioned.",
RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
keyList.Add(matchResult.Groups["key"].Value);
valueList.Add(matchResult.Groups["value"].Value);
unitList.Add(matchResult.Groups["unit"].Value);
matchResult = matchResult.NextMatch();
}

Regex re=new Regex(#"(\w+)\=\(([\d\s\.]+)\)");
MatchCollection m=re.Matches(s);
m[0].Groups will have { XD=(111111), XD, 111111 }
m[1].Groups will have { XT=( 588.466), XT, 588.466 }

String[] rows = { "XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr",
"XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr" };
foreach (String s in rows) {
MatchCollection Pair = Regex.Matches(s, #"
(\S+) # Match all non-whitespace before the = and store it in group 1
= # Match the =
(\([^)]+\S+) # Match the part in brackets and following non-whitespace after the = and store it in group 2
", RegexOptions.IgnorePatternWhitespace);
foreach (Match item in Pair) {
Console.WriteLine(item.Groups[1] + " => " + item.Groups[2]);
}
Console.WriteLine();
}
Console.ReadLine();
If you want to extract the units also then use this regex
#"(\S+)=(\([^)]+(\S+))
I added a set of brackets around it, then you will find the unit in item.Groups[3]

Related

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single RegEx and in Group-1.
By using this RegEx ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I can get the second string to match the desired result int/2021/11/25/,live_20211125_215206_
but the first string does not match in Group-1 and the missing expected test string 1 extraction is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers on this is appreciated.
Thanks!
If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i d na fm
/ Match literally
( Capture group 1
[^/\s]*/ Match any char except a / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non digits or assert end of string
See a regex demo and a go demo.
We can't know all the rules of how the strings your are matching are constructed, but for just these two example strings provided:
package main
import (
"fmt"
"regexp"
)
func main() {
var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
match := re.FindAllStringSubmatch(str, -1)
for _, val := range match {
fmt.Println(val[1])
}
}

Regex - get list comma separated allow spaces before / after the comma

I try to extract an images array/list from a commit message:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
I want to get a list that contains:
["image-a", "image-b", "image_c", "imaged", "image-e"]
NOTES:
A) should allow a single space before/after the comma (,)
B) ensure that #images = exists but exclude it from the group
C) I also searching for other parameters like #build and #setup so I need to ignore them when looking for #images
What I have until now is:
/(?i)#images\s?=\s?<HERE IS THE MISSING LOGIC>/
I use find() method:
def matcher = commitMsg =~ /(?i)#images\s?=\s?([^,]+)/
if(matcher.find()){
println(matcher[0][1])
}
You can use
(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)
See the regex demo. Details:
(?i) - case insensitive mode on
(?:\G(?!^)\s?,\s?|#images\s?=\s?) - either the end of the previous regex match and a comma enclosed with single optional whitespaces on both ends, or #images string and a = char enclosed with single optional whitespaces on both ends
(\w+(?:-\w+)*) - Group 1: one or more word chars followed with zero or more repetitions of - and one or more word chars.
See a Groovy demo:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
def re = /(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)/
def res = (commitMsg =~ re).collect { it[1] }
print(res)
Output:
[image-a, image-b, image_c, imaged, image-e]
An alternative Groovy code:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
def re = /(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)/
def matcher = (commitMsg =~ re).collect()
for(m in matcher) {
println(m[1])
}

Regex negative lookahead is not working properly

I have the following javascript code
var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....
To find the closest string declaration to c, I was using something like this:
var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map
I am using negative lookahead to not match a declaration if there is another declaration ahead that matches. However, the result I am getting is still <some string 2>. Could someone please explain me what causes that to happen?
The are more ways to match a string in JavaScript, and even more ways to break the pattern as this is very very brittle.
For the example data in the question, using a negative lookahead to assert that the next line does not start with the .map part or a string:
^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b
^ Start of string
var\s+\w\s*=\s* Match a possible var format and the equals sign
(["'`]) Capture group 1, match one of " ' or `
(.*?) Capture group 2 capture the data that you want to find
\1;\n A back reference to match up the quote in group 1
(?: Non capture group, to repeat as a whole
(?! Negative lookahead, assert what is directly to the right is not
var\s+\w+\s*=\s* Match a var part with =
(?: Non capture group for the alternation |
\w+\.map\b Match the .map part
| Or
(["'`]).*?\3 Capture group 3 to match one of the quotes with a back reference to match it up
) Close non capture group
;\n Match ; and a newline
) Close lookahead
.*\n Match the whole line and a newline
)* Close non capture group and optionally repeat to match all lines
var\s+\w+\s*=\s*\w\.map\b Match the line with .map
Regex demo
const regex = /^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b/m;
const str = `var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....`;
const m = str.match(regex);
if (m) {
console.log(m[2]);
}

Grab first 4 characters of two words RegEx

I would like to grab the first 4 characters of two words using RegEx. I have some RegEx experinece however a search did not yeild any results.
So if I have Awesome Sauce I would like the end result to be AwesSauc
Use the Replace Text action with the following parameters:
Pattern: \W*\b(\p{L}{1,4})\w*\W*
Replacement text: $1
See the regex demo.
Pattern details:
\W* - 0+ non-word chars (trim from the left)
\b - a leading word boundary
(\p{L}{1,4}) - Group 1 (later referred to via $1 backreference) matching any 1 to 4 letters (incl. Unicode ones)
\w* - any 0+ word chars (to match the rest of the word)
\W* - 0+ non-word chars (trim from the right)
I think this RegEx should do the job
string pattern = #"\b\w{4}";
var text = "The quick brown fox jumps over the lazy dog";
Regex regex = new Regex(pattern);
var match = regex.Match(text);
while (match.Captures.Count != 0)
{
foreach (var capture in match.Captures)
{
Console.WriteLine(capture);
}
match = match.NextMatch();
}
// outputs:
// quic
// brow
// jump
// over
// lazy
Alternatively you could use patterns like:
\b\w{1,4} => The, quic, brow, fox, jump, over, the, lazy, dog
\b[\w|\d]{1,4} => would also match digits
Update:
added a full example for C# and modified the pattern slightly. Also added some alternative patterns.
one approach with Linq
var res = new string(input.Split().SelectMany((x => x.Where((y, i) => i < 4))).ToArray());
Try this expression
\b[a-zA-Z0-9]{1,4}
Using regex would in fact be more complex and totally unnecessary for this case. Just do it as either of the below.
var sentence = "Awesome Sau";
// With LINQ
var linqWay = string.Join("", sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries).Select(x => x.Substring(0, Math.Min(4,x.Length))).ToArray());
// Without LINQ
var oldWay = new StringBuilder();
string[] words = sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries);
foreach(var word in words) {
oldWay.Append(word.Substring(0, Math.Min(4, word.Length)));
}
Edit:
Updated code based on #Dai's comment. Math.Min check borrowed as is from his suggestion.

c# regex split or replace. here's my code i did

I am trying to replace a certain group to "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
In the first square bracket, there must be 4 character including numbers.
and '(/)' will come to close.
Here's my code. (I was using matches function)
string text= #"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = #"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if(mc.Count > 0)
{
foreach(Match str in mc)
{
print(str.Groups["value"].Value.ToString());
}
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The below regex would capture the word characters which are just after to ),
(?<=\))(\w+)
DEMO
Your c# code would be,
{
string str = "(12je)apple(/)(jj92)banana(/)cat";
Regex rgx = new Regex(#"(?<=\))(\w+)");
foreach (Match m in rgx.Matches(str))
Console.WriteLine(m.Groups[1].Value);
}
IDEONE
Explanation:
(?<=\)) Positive lookbehind is used here. It sets the matching marker just after to the ) symbol.
() capturing groups.
\w+ Then it captures all the following word characters. It won't capture the following ( symbol because it isn't a word character.