Regex and LINQ: extract group by group name

I have an SQL statement stored as a string, in which some parameters are written in a standard way: "<parameter1>, <parameter2>".
Then I have another string with the parameter values, also written in a standard way: "Parameter1=123; Parameter2=aaa".
I need to match the parameters in the SQL string with the values in the second string.
What I have so far:
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim vmp As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "([\w ]+)=([\w ]+)")
Dim vmc As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches(BodySQL, "(?<=\<).+?(?=\>)")
For Each vm As RegularExpressions.Match In vmc
Dim Vl As String = (From m As RegularExpressions.Match In vmp
Where m.Groups(1).Value.Trim = vm.Value.ToString).Select(Of String)(Function(f) f.Groups(2).Value).ElementAt(0).Trim
BodySQL = BodySQL.Replace(vm.Value, Vl)
Next
It works for the first parameter, but then I get
"System.ArgumentOutOfRangeException: 'Specified argument was out of the range of valid values.
Parameter name: index'"
Can I please ask why?

You can extract the keys and values from the param=value string with a single regex, build a dictionary from them, and then use Regex.Replace with a match evaluator to replace each placeholder with its dictionary value:
Imports System.Text.RegularExpressions
' ...
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)(StringComparer.InvariantCultureIgnoreCase)
' (StringComparer.InvariantCultureIgnoreCase) makes the dictionary keys case insensitive
For Each match As Match In Regex.Matches("PARAMETER1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*([^;]*[^;\s])")
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, "<([^<>]*)>",
Function(match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
Output:
Blablabal WHERE X=2555 AND Y=12/02/2021
The (\S+)\s*=\s*([^;]*[^;\s]) pattern matches:
(\S+) - Group 1: one or more non-whitespace chars (the key)
\s*=\s* - a = char enclosed in zero or more whitespace chars
([^;]*[^;\s]) - Group 2: zero or more chars other than ;, followed by one char other than ; and whitespace (the value). With this pattern the value cannot be empty; to allow empty values, remove [^;\s] and call Trim() on match.Groups(2).Value in the code.
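A small sketch of the empty-value variant from the last item, written in JavaScript purely for illustration (its regex syntax agrees with .NET for this pattern). The sample input is made up, and it assumes, as in the question, that a space follows each ;, because \S+ would otherwise pull the ; into the next key:

```javascript
// Empty values allowed: the trailing [^;\s] is removed and Group 2 is trimmed instead.
// Assumes "; " (semicolon plus space) separates the pairs, as in the question.
const pairRe = /(\S+)\s*=\s*([^;]*)/g;
const pairs = [...'A=; B = x ; C=1'.matchAll(pairRe)].map((m) => [m[1], m[2].trim()]);
console.log(pairs); // [['A', ''], ['B', 'x'], ['C', '1']]
```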
The <([^<>]*)> regex matches
< - a < char (there is no need to escape it in any regex flavor: it is not a special regex metachar, and in some flavors, such as GNU regex, \< even denotes a word boundary)
([^<>]*) - Group 1: any zero or more chars other than < and >
> - a literal > char.
Since the key is captured in Group 1 and the < and > on both ends are consumed by the match, the < and > are removed when the match is replaced with the found value.
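For comparison, here is a minimal JavaScript sketch of the same two-step approach: parse the key=value pairs into a map, then replace each <name> through a callback. The helper names parseArgs and fillPlaceholders are made up for this illustration, and lower-casing the keys is an assumption standing in for the case-insensitive StringComparer used above:

```javascript
// Hypothetical helpers illustrating the dictionary + callback-replacement approach.
function parseArgs(text) {
  // Same key/value pattern as the VB.NET answer; keys are lower-cased so that
  // lookups are case-insensitive (stand-in for StringComparer.InvariantCultureIgnoreCase).
  const args = new Map();
  for (const m of text.matchAll(/(\S+)\s*=\s*([^;]*[^;\s])/g)) {
    args.set(m[1].toLowerCase(), m[2]);
  }
  return args;
}

function fillPlaceholders(sql, args) {
  // <name> is consumed as a whole; Group 1 holds the name used for the lookup.
  return sql.replace(/<([^<>]*)>/g, (whole, name) => {
    const key = name.toLowerCase();
    return args.has(key) ? args.get(key) : whole; // unknown names are left as-is
  });
}

const bodySql = 'Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>';
const sqlArgs = parseArgs('PARAMETER1=2555; Parameter2 = 12/02/2021');
console.log(fillPlaceholders(bodySql, sqlArgs)); // Blablabal WHERE X=2555 AND Y=12/02/2021
```

Unknown placeholders are returned unchanged, mirroring the If(args.ContainsKey(...), ..., match.Value) fallback in the VB.NET code.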

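The error is self-explanatory: you are accessing a collection or sequence with an index that is either negative or larger than the largest index available. The candidates in your query are:
.ElementAt(0) / m.Groups(1) / f.Groups(2)
Of these, .ElementAt(0) is the one that throws ArgumentOutOfRangeException (with parameter name index) when the filtered sequence is empty, i.e. when no key in the values string equals vm.Value; indexing Groups with a group number that does not exist returns an unsuccessful, empty Group instead of throwing. Set a breakpoint and check the values of your variables to see which parameter name fails to match.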

This is what I did with your code:
Dim vmc = "\<(.*?)\>"
' I changed this regex so that the match also includes the "<" and ">"
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)
For Each match As Match In Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*(\S[^;]+)")
' I changed the regex to exclude the ";"
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, vmc,
Function(match As Match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
And now I have what I needed. Thank you very much :)
Output:
WHERE X = 2555,Y = 12/02/2021

Related

Dart: conditional find and replace using regex

I have a string that sometimes contains a certain substring at the end and sometimes does not. When the string is present I want to update its value. When it is absent I want to add it at the end of the existing string.
For example:
int _newCount = 7;
_myString = 'The count is: COUNT=1;';
_myString2 = 'The count is: ';
_rRuleString.replaceAllMapped(RegExp('COUNT=(.*?)\;'), (match) {
//if there is a match (like in _myString) update the count to value of _newCount
//if there is no match (like in _myString2) add COUNT=1; to the string
});
I have tried using a return of:
return "${match.group(1).isEmpty ? _myString + 'COUNT=1;' : 'COUNT=$_newCount;'}";
But it is not working.
Note that replaceAllMapped only performs a replacement where there is a match; if there is no match, nothing is replaced (an insertion is also a replacement: an empty string is replaced with some text).
Your expected matches are always at the end of the string, and you can leverage this in your current code. You need a regex that optionally matches COUNT= followed by any text up to and including the first ;, and then checks that the current position is the end of the string.
Then just follow the logic: if Group 1 matched, set the new count value; otherwise, append the COUNT=1; string:
The regex is
(COUNT=[^;]*;)?$
Details
(COUNT=[^;]*;)? - an optional group 1: COUNT=, any 0 or more chars other than ; and then a ;
$ - end of string.
Dart code:
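// Strings are immutable, so assign the result back.
_myString = _myString.replaceFirstMapped(RegExp(r'(COUNT=[^;]*;)?$'), (match) {
  // group(0) is String? under null safety; the whole match is always present.
  return match.group(0)!.isEmpty ? "COUNT=1;" : "COUNT=$_newCount;";
});
Note the use of replaceFirstMapped: you need to replace only the first match. With replaceAllMapped, once COUNT=...; has been matched at the end of the string, the pattern also matches the empty string at the end position, so COUNT=1; would be appended as well.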
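The difference between replacing the first match and replacing all matches can be reproduced in JavaScript, whose regex engine treats this pattern the same way as far as I can tell. This is an illustration, not Dart code, and setCount is a hypothetical helper. String.prototype.replace without the g flag replaces only the first match, like replaceFirstMapped:

```javascript
// Illustration in JavaScript, not Dart. setCount is a hypothetical helper.
const countRe = /(COUNT=[^;]*;)?$/; // no "g" flag: only the first match is replaced

function setCount(s, newCount) {
  // An empty match means there was no COUNT=...; at the end, so append it.
  return s.replace(countRe, (whole) => (whole === '' ? 'COUNT=1;' : `COUNT=${newCount};`));
}

console.log(setCount('The count is: COUNT=1;', 7)); // The count is: COUNT=7;
console.log(setCount('The count is: ', 7)); // The count is: COUNT=1;

// With the "g" flag, the empty match at the very end is found too, after COUNT=1;
// has already been matched, so the replacement runs twice.
const allCountRe = /(COUNT=[^;]*;)?$/g;
console.log('The count is: COUNT=1;'.replace(allCountRe, (whole) => (whole === '' ? 'COUNT=1;' : 'COUNT=7;')));
// The count is: COUNT=7;COUNT=1;
```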

Extract an 8-digit number from a string with additional conditions

I need to extract a number from a string with several conditions.
It has to start with 1-9 (not with 0) and have 8 digits, like 23242526 or 65478932.
It is preceded either by a space or by text, like MMX: 23242526 or bgr65478932.
In rare cases it can contain commas: 23,242,526.
It is followed by a space or by text.
Here are several examples:
From RE: Markitwire: 120432889: Mx: 24,693,059 I need to get 24693059
From Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152 I need to get 23497152
From FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537 I need to get 24663537
From RE: BBVA-MAD MMX_24644644 + MMX_24644645 I need to get 24644644, 24644645
Right now I'm using the RegexExtract function (I found it on this website), which extracts any 8-digit number starting with 2. However, it would also extract a number from, let's say, TGF00023242526, which is incorrect. Moreover, I don't know how to add the additional conditions to the code.
=RegexExtract(A11, "(2\d{7})\b", ", ")
Thank you in advance.
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional seperator As String = "") As String
Dim i As Long, j As Long
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = extract_what
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.Count - 1
For j = 0 To allMatches.Item(i).SubMatches.Count - 1
result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
Next
Next
If Len(result) <> 0 Then
result = Right(result, Len(result) - Len(seperator))
End If
RegexExtract = result
End Function
You may create a custom boundary using a non-capturing group before the pattern you have:
(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^
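The (?:[\D0]|^) part matches either a non-digit (\D) or 0, or (|) the start of the string (^). Note that because 0 is included in the character class, TGF00023242526 still yields 23242526; use (?:\D|^) if a preceding 0 must be rejected as well.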
As an alternative that also matches 8 digits in values like 23,242,526 and starts with a digit 1-9, you might use
\b[1-9](?:,?\d){7}\b
\b Word boundary
[1-9] Match the first digit 1-9
(?:,?\d){7} Repeat 7 times matching an optional comma and a digit
\b Word boundary
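Afterwards, you could replace the commas with an empty string. Note that \b requires a transition between a word char and a non-word char, and letters, digits and _ are all word chars, so this pattern does not match the number in bgr65478932 or MMX_23497152.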
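A sketch that combines ideas from both answers, written in JavaScript for illustration (extractIds is a hypothetical helper). (?:^|\D) acts as a custom left boundary, so letters and underscores before the number are allowed but a digit such as the 0 in TGF00023242526 is not; [1-9](?:,?\d){7} allows optional commas; and the negative lookahead (?!,?\d) replaces \b without requiring a non-word char after the number. VBScript's RegExp also supports non-capturing groups and lookahead, but the pattern has only been checked against the examples in the question:

```javascript
// Hypothetical helper, not part of either answer.
function extractIds(text) {
  // (?:^|\D)           left boundary: start of string or any non-digit (letter, _, space, ...)
  // ([1-9](?:,?\d){7}) Group 1: first digit 1-9, then 7 digits, each optionally preceded by a comma
  // (?!,?\d)           right boundary: no further digit (optionally after a comma) follows
  const re = /(?:^|\D)([1-9](?:,?\d){7})(?!,?\d)/g;
  return [...text.matchAll(re)].map((m) => m[1].replace(/,/g, ''));
}

const samples = [
  'RE: Markitwire: 120432889: Mx: 24,693,059',
  'Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152',
  'FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537',
  'RE: BBVA-MAD MMX_24644644 + MMX_24644645',
  'TGF00023242526',
];
for (const s of samples) console.log(s, '->', extractIds(s));
```

With the question's RegexExtract function only the capture group is returned, so the commas would still have to be removed afterwards, for example with Replace.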

How to find any non-digit characters using RegEx in ABAP

I need a Regular Expression to check whether a value contains any other characters than digits between 0 and 9.
I also want to check the length of the value.
The regex I've made: ^([0-9]\d{6})$
My test values are 123Z45 and 123456.
The ABAP code:
FIND ALL OCCURRENCES OF REGEX '^([0-9]\d{6})$' IN L_VALUE RESULTS DATA(LT_RESULTS).
I'm expecting a result in LT_RESULTS when testing the first value, '123Z45', because it contains a non-digit character.
But LT_RESULTS is in nearly every test case empty.
Your expression ^([0-9]\d{6})$ translates to:
^ - start of input
( - begin capture group
[0-9] - a character between 0 and 9
\d{6} - six digits (digit = character between 0 and 9)
) - end capture group
$ - end of input
So it will only match 7-digit strings such as 1234567, not 123456 or 123Z45.
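If you just need to find a string that contains non-digits, you could use the following instead: ^\d*[^\d]+\d*$ (note that it only matches when the non-digits form one contiguous run, so 1a2b3 is not matched; an unanchored \D or [^0-9] finds a non-digit anywhere)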
* - previous element may occur zero, one or more times
[^\d] - ^ right after [ means "NOT", i.e. any character which is not a digit
+ - previous element may occur one or more times
Example:
const expression = /^\d*[^\d]+\d*$/;
const inputs = ['123Z45', '123456', 'abc', 'a21345', '1234f', '142345'];
console.log(inputs.filter(i => expression.test(i)));
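For the two original requirements (detect any non-digit, check the length), two separate checks may be simpler. This JavaScript sketch (illustration only; hasNonDigit and isSixDigits are hypothetical names) compares them with the pattern above on a few made-up values:

```javascript
// Hypothetical helpers: two independent checks instead of one combined pattern.
const hasNonDigit = (v) => /\D/.test(v); // any non-digit anywhere in the value
const isSixDigits = (v) => /^\d{6}$/.test(v); // only digits, exactly six of them
const singleRun = /^\d*[^\d]+\d*$/; // the pattern from the answer above

for (const v of ['123Z45', '123456', '1a2b3', '1234567']) {
  console.log(v, hasNonDigit(v), isSixDigits(v), singleRun.test(v));
}
// For 1a2b3: hasNonDigit is true, but singleRun.test is false (two separate non-digit runs).
```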
You can also use a POSIX character class if you want to extract a group of letters (in a hexadecimal string like the one below, these are exactly the non-digit characters):
DATA(l_guid) = '0074162D8EAA549794A4EF38D9553990680B89A1'.
DATA(regx) = '[[:alpha:]]+'.
DATA(substr) = match( val = l_guid
regex = regx
occ = 1 ).
It returns the first group of letters it finds.
If you just want to check whether such groups exist, or how many of them there are in your string, the built-in count function is your friend:
DATA(how_many) = count( val = l_guid regex = regx ).
DATA(yes) = boolc( count( val = l_guid regex = regx ) > 0 ).
Match and count exist since ABAP 7.50.
If you don't need a regular expression for something more complex, ABAP has comparison operators such as CO (contains only), CA (contains any) and NA (contains not any) for you. Something like:
IF L_VALUE CO '0123456789' AND STRLEN( L_VALUE ) = 6.
  " L_VALUE consists of exactly six digits
ENDIF.

Extracting Parenthetical Data Using Regex

I have a small sub that extracts parenthetical data (including parentheses) from a string and stores it in cells adjacent to the string:
Sub parens()
Dim s As String, i As Long
Dim c As Collection
Set c = New Collection
s = ActiveCell.Value
ary = Split(s, ")")
For i = LBound(ary) To UBound(ary) - 1
bry = Split(ary(i), "(")
c.Add "(" & bry(1) & ")"
Next i
For i = 1 To c.Count
ActiveCell.Offset(0, i).NumberFormat = "#"
ActiveCell.Offset(0, i).Value = c.Item(i)
Next i
End Sub
I am now trying to replace this with some Regex code. I am NOT a regex expert. I want to create a pattern that looks for an open parenthesis followed by zero or more characters of any type followed by a close parenthesis.
I came up with:
\((.+?)\)
My current new code is:
Sub qwerty2()
Dim inpt As String, outpt As String
Dim MColl As MatchCollection, temp2 As String
Dim regex As RegExp, L As Long
inpt = ActiveCell.Value
MsgBox inpt
Set regex = New RegExp
regex.Pattern = "\((.+?)\)"
Set MColl = regex.Execute(inpt)
MsgBox MColl.Count
temp2 = MColl(0).Value
MsgBox temp2
End Sub
The code has at least two problems:
It will only get the first match in the string (MColl.Count is always 1).
It will not recognize zero characters between the parentheses. (I think the .+? requires at least one character)
Does anyone have any suggestions?
By default, the RegExp Global property is False, so Execute returns only the first match. You need to set it to True.
As for the regex, to match zero or more chars as few as possible, you need *?, not +?. Note that both are lazy (match as few as necessary to find a valid match), but + requires at least one char, while * allows matching zero chars (an empty string).
Thus, use
Set regex = New RegExp
regex.Global = True
regex.Pattern = "\((.*?)\)"
Alternatively, you can use
regex.Pattern = "\(([^()]*)\)"
where [^()] is a negated character class matching any char but ( and ), zero or more times (due to * quantifier), matching as many such chars as possible (* is a greedy quantifier).
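A quick JavaScript illustration (the sample strings are made up) of the two patterns with the global flag set, the equivalent of regex.Global = True: both return every parenthesized group, including the empty (), but they differ on nested parentheses, where the lazy .*? stops at the first ) while the negated class only matches an innermost pair:

```javascript
const lazyParens = /\((.*?)\)/g;
const negatedParens = /\(([^()]*)\)/g;

const parenSample = 'a (x) b () c (yz)';
console.log([...parenSample.matchAll(lazyParens)].map((m) => m[0])); // ['(x)', '()', '(yz)']
console.log([...parenSample.matchAll(negatedParens)].map((m) => m[0])); // ['(x)', '()', '(yz)']

const nestedSample = '(a(b)c)';
console.log([...nestedSample.matchAll(lazyParens)].map((m) => m[0])); // ['(a(b)']
console.log([...nestedSample.matchAll(negatedParens)].map((m) => m[0])); // ['(b)']
```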

Split a string according to a regexp in VBScript

I would like to split a string into an array according to a regular expression, similar to what can be done with preg_split in PHP, or like the VBScript Split function but with a regex in place of the delimiter.
Using the VBScript RegExp object, I can execute a regex, but it returns the matches (so I get a collection of my delimiters... that's not what I want).
Is there a way to do so?
Thank you
If you can reserve a special delimiter string, i.e. a string that will never be part of the real input (perhaps something like "###"), then you can use regex replacement to replace all matches of your pattern with "###", and then split on "###".
Another possibility is to use a capturing group. If your delimiter regex is, say, \d+, then you search for (.*?)\d+ and extract what the group captured in each match. Note that (.*?)\d+ alone misses the text after the last delimiter; also allowing the end of the string, as in (.*?)(?:\d+|$), picks it up.
You can always use the returned matches as input to the Split function: split the original string on the first match, so that the part before it is the first element, then split the remainder of the string (minus the first part and the first match) on the next match, and continue until done.
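A minimal JavaScript sketch of two of these approaches: cutting the string at the match positions, and replacing the matches with a reserved marker. regexSplit and splitViaMarker are hypothetical helpers, and the marker approach assumes ### never occurs in the input. JavaScript's own String.prototype.split already accepts a regex, so this only illustrates the technique you would port to VBScript:

```javascript
// Hypothetical helper: split `input` wherever the regex `pattern` matches,
// using the match positions to cut the string (like PHP's preg_split).
function regexSplit(input, pattern) {
  const re = new RegExp(pattern, 'g');
  const parts = [];
  let start = 0;
  for (const m of input.matchAll(re)) {
    if (m[0] === '') continue; // this sketch ignores zero-length delimiter matches
    parts.push(input.slice(start, m.index));
    start = m.index + m[0].length;
  }
  parts.push(input.slice(start)); // text after the last delimiter
  return parts;
}

// Alternative: replace every delimiter with a reserved marker, then split on it.
// Assumes the marker "###" never occurs in the real input.
function splitViaMarker(input, pattern) {
  return input.replace(new RegExp(pattern, 'g'), '###').split('###');
}

console.log(regexSplit('bob,;joe;tony#bill', '[,;#]+')); // ['bob', 'joe', 'tony', 'bill']
console.log(splitViaMarker('bob,;joe;tony#bill', '[,;#]+')); // ['bob', 'joe', 'tony', 'bill']
```

With a single-character class such as [,;#], the ",;" pair produces an empty element between bob and joe, just as a plain Split on a fixed delimiter would.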
I wrote this for my use. Might be what you're looking for.
Function RegSplit(szPattern, szStr)
Dim oAl, oRe, oMatches
Set oRe = New RegExp
oRe.Pattern = "^(.*)(" & szPattern & ")(.*)$"
oRe.IgnoreCase = True
oRe.Global = True
Set oAl = CreateObject("System.Collections.ArrayList")
Do
Set oMatches = oRe.Execute(szStr)
If oMatches.Count > 0 Then
oAl.Add oMatches(0).SubMatches(2)
szStr = oMatches(0).SubMatches(0)
Else
oAl.Add szStr
Exit Do
End If
Loop
oAl.Reverse
RegSplit = oAl.ToArray
End Function
'**************************************************************
Dim A
' Split on "," ";" or "#" (inside [...], a "|" would be a literal pipe, not an alternation)
A = RegSplit("[,;#]", "bob,;joe;tony#bill")
WScript.Echo Join(A, vbCrLf)
Returns:
bob
joe
tony
bill
I think you can achieve this by using Execute to match on the required splitter string, but capturing all the preceding characters (after the previous match) as a group. Here is some code that could do what you want.
'// Function splits a string on matches
'// against a given regex pattern
Function SplitText(strInput,sFind)
Dim ArrOut()
'// Don't do anything if no string to be found
If len(sFind) = 0 then
redim ArrOut(0)
ArrOut(0) = strInput
SplitText = ArrOut
Exit Function
end If
'// Define regexp
Dim re
Set re = New RegExp
'// Pattern to be found - i.e. the given
'// match or the end of the string, preceded
'// by any number of characters
re.Pattern="(.*?)(?:" & sFind & "|$)"
re.IgnoreCase = True
re.Global = True
'// find all the matches >> match collection
Dim oMatches: Set oMatches = re.Execute( strInput )
'// Prepare to process
Dim oMatch
Dim ix
Dim iMax
'// Initialize the output array
iMax = oMatches.Count - 1
redim arrOut( iMax)
'// Process each match
For ix = 0 to iMax
'// get the match
Set oMatch = oMatches(ix)
'// Get the captured string that precedes the match
arrOut( ix ) = oMatch.SubMatches(0)
Next
Set re = nothing
'// Check if the last entry was empty - this
'// removes one entry if the string ended on a match
if arrOut(iMax) = "" then Redim Preserve ArrOut(iMax-1)
'// Return the processed output
SplitText = arrOut
End Function