What is the regex for the escaping []? - regex

I am trying to match square brackets i.e. [] in regex VBA in excel. I am trying with the below code but it is not working.
Public Function IsSpecial(s As String) As Long
Dim L As Long, LL As Long
Dim sCh As String
IsSpecial = 0
For L = 1 To Len(s)
sCh = Mid(s, L, 1)
If sCh Like "[0-9a-zA-Z/;#%,'‚.+&/\(): ]" Or sCh = "_" Or sCh Like "[-]" Or sCh Like "\[" Then
Else
IsSpecial = 1
Exit Function
End If
Next L
End Function

According to Using the Like operator and wildcard characters in string comparisons:
You can use a group of one or more characters (charlist) enclosed in brackets ([ ]) to match any single character in expression, and charlist can include almost any characters in the ANSI character set, including digits. You can use the special characters opening bracket ([), question mark (?), number sign (#), and asterisk (*) to match themselves directly only if enclosed in brackets. You cannot use the closing bracket (]) within a group to match itself, but you can use it outside a group as an individual character.
So, you need to use
Ch Like "[[]"
However, the function you have is not following your logic, since it checks each char individually, and you want to make sure [] is checked as a char sequence.
With a regex, it will look like
Public Function IsSpecial(s As String) As Long
Dim L As Long, LL As Long
Dim rx As New regExp
rx.Pattern = "^(?:[0-9a-zA-Z/;#%,'‚.+&/\\(): _-]|\[])*$"
IsSpecial = 0
If Not rx.Test(s) Then IsSpecial = 1
End Function

Related

Regex and LINQ extract group by group name

I have a SQL sintax in the form of a string in which there are some parameters written in a standard way "<parameter1>, <parameter2>".
Then i have another string with the parameters values written in a standard way as well: "Parameter1=123; Parameter2=aaa".
I need to match the parameters in the first SQL with the values in the second one.
What I have so far:`
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim vmp As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "([\w ]+)=([\w ]+)")
Dim vmc As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches(BodySQL, "(?<=\<).+?(?=\>)")
For Each vm As RegularExpressions.Match In vmc
Dim Vl As String = (From m As RegularExpressions.Match In vmp
Where m.Groups(1).Value.Trim = vm.Value.ToString).Select(Of String)(Function(f) f.Groups(2).Value).ElementAt(0).Trim
BodySQL = BodySQL.Replace(vm.Value, Vl)
Next
It works for the first parameter, but then i get
"System.ArgumentOutOfRangeException: 'Specified argument was out of the range of valid values.
Parameter name: index'"
Can I please ask why?
You can extract the keys and values with one regex from the param=value strings, create a dictionary out of them, and then use Regex.Replace to replace the matches with the dictionary values:
Imports System.Text.RegularExpressions
' ...
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)(StringComparer.InvariantCultureIgnoreCase)
' (StringComparer.InvariantCultureIgnoreCase) makes the dictionary keys case insensitive
For Each match As Match In Regex.Matches("PARAMETER1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*([^;]*[^;\s])")
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, "<([^<>]*)>",
Function(match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
Output:
Blablabal WHERE X=2555 AND Y=12/02/2021
The (\S+)\s*=\s*([^;]*[^;\s]) pattern matches
(\S+) - captures into Group 1 any one or more non-whitespace chars (the key value)
\s*=\s* - a = char enclosed with zero or more whitespace chars
([^;]*[^;\s]) - Group 2: any zero or more chars other than ; and then one char other than ; and whitespace (the value, it cannot be empty with this pattern. If you want it to be possible to match empty values, you will need to remove [^;\s] and then use Trim() on the match.Groups(2).Value in the code.)
The <([^<>]*)> regex matches
< - a < char (do not escape this char in any regex flavor please, it is never a special regex metachar)
([^<>]*) - Group 1: any zero or more chars other than < and >
> - a literal > char.
Since the key is in Group 1 and < and > on both ends are consumed, the < and > are removed when replacing with the found value.
zero or more chars other than > and < between < and >.
The error is self explanatory. You are trying to access an array or List and specifying an index value that is either negative or larger than the largest index available.
.ElementAt(0) / m.Groups(1) / f.Groups(2)
My guess is that one of them might go out of bounds. Try to debug it with a breakpoint and check the values of your variables.
This is what i did with your code:
Dim vmc = "\<(.*?)\>"
i changed this regex so that it could also give me the "<>"
Dim BodySQL = "Blablabal WHERE X=<Parameter1> AND Y=<Parameter2>"
Dim args As New Dictionary(Of String, String)
For Each match As Match In Regex.Matches("Parameter1=2555; Parameter2 = 12/02/2021", "(\S+)\s*=\s*(\S[^;]+)")
i changed the regex expression to exclude the ";"
args.Add(match.Groups(1).Value, match.Groups(2).Value)
Next
Console.WriteLine(Regex.Replace(BodySQL, vmc,
Function(match As Match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function))
And now i have what i needed. Thank you a lot :)
Output:
WHERE X = 2555,Y = 12/02/2021

Extract an 8 digits number from a string with additional conditions

I need to extract a number from a string with several conditions.
It has to start with 1-9, not with 0, and it will have 8 digits. Like 23242526 or 65478932
There will be either an empty space or a text variable before it. Like MMX: 23242526 or bgr65478932
It could have come in rare cases: 23,242,526
It ends with an emty space or a text variable.
Here are several examples:
From RE: Markitwire: 120432889: Mx: 24,693,059 i need to get 24693059
From Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152 need to get 23497152
From FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537 need to get 24663537
From RE: BBVA-MAD MMX_24644644 + MMX_24644645 need to get 24644644, 24644645
Right now I'm using the regexextract function(found it on this web-site), which extracts any number with 8 digits starting with 2. However it would also extract a number from, let's say, this expression TGF00023242526, which is incorrect. Moreover, I don't know how to add additional conditions to the code.
=RegexExtract(A11, ""(2\d{7})\b"", ", ")
Thank you in advance.
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional seperator As String = "") As String
Dim i As Long, j As Long
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = extract_what
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.Count - 1
For j = 0 To allMatches.Item(i).SubMatches.Count - 1
result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
Next
Next
If Len(result) <> 0 Then
result = Right(result, Len(result) - Len(seperator))
End If
RegexExtract = result
End Function
You may create a custom boundary using a non-capturing group before the pattern you have:
(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^
The (?:[\D0]|^) part matches either a non-digit (\D) or 0 or (|) start of string (^).
As an alternative to also match 8 digits in values like 23,242,526 and start with a digit 1-9 you might use
\b[1-9](?:,?\d){7}\b
\b Word boundary
[1-9] Match the firstdigit 1-9
(?:,?\d){7} Repeat 7 times matching an optional comma and a digit
\b Word boundary
Regex demo
Then you could afterwards replace the comma's with an empty string.

How to find any non-digit characters using RegEx in ABAP

I need a Regular Expression to check whether a value contains any other characters than digits between 0 and 9.
I also want to check the length of the value.
The RegEx I´ve made: ^([0-9]\d{6})$
My test value is: 123Z45 and 123456
The ABAP code:
FIND ALL OCCURENCES OF REGEX '^([0-9]\d{6})$' IN L_VALUE RESULTS DATA(LT_RESULTS).
I´m expecting a result in LT_RESULTS, when I´m testing the first test value '123Z45', because there is a non-digit character.
But LT_RESULTS is in nearly every test case empty.
Your expression ^([0-9]\d{6})$ translates to:
^ - start of input
( - begin capture group
[0-9] - a character between 0 and 9
\d{6} - six digits (digit = character between 0 and 9)
) - end capture group
$ - end of input
So it will only match 1234567 (7 digit strings), not 123456, or 123Z45.
If you just need to find a string that contains non digits you could use the following instead: ^\d*[^\d]+\d*$
* - previous element may occur zero, one or more times
[^\d] - ^ right after [ means "NOT", i.e. any character which is not a digit
+ - previous element may occur one or more times
Example:
const expression = /^\d*[^\d]+\d*$/;
const inputs = ['123Z45', '123456', 'abc', 'a21345', '1234f', '142345'];
console.log(inputs.filter(i => expression.test(i)));
You can also use this character class if you want to extract non-digit group:
DATA(l_guid) = '0074162D8EAA549794A4EF38D9553990680B89A1'.
DATA(regx) = '[[:alpha:]]+'.
DATA(substr) = match( val = l_guid
regex = regx
occ = 1 ).
It finds a first occured non-digit group of characters and shows it.
If you want to just check if they are exists or how much of them reside in your string, count built-in function is your friend:
DATA(how_many) = count( val = l_guid regex = regx ).
DATA(yes) = boolc( count( val = l_guid regex = regx ) > 0 ).
Match and count exist since ABAP 7.50.
If you don't need a Regular Expression for something more complex, ABAP has some nice comparison operators CO (Contains Only), CA, NA etc for you. Something like:
IF L_VALUE CO '0123456789' AND STRLEN( L_VALUE ) = 6.

Extracting Parenthetical Data Using Regex

I have a small sub that extracts parenthetical data (including parentheses) from a string and stores it in cells adjacent to the string:
Sub parens()
Dim s As String, i As Long
Dim c As Collection
Set c = New Collection
s = ActiveCell.Value
ary = Split(s, ")")
For i = LBound(ary) To UBound(ary) - 1
bry = Split(ary(i), "(")
c.Add "(" & bry(1) & ")"
Next i
For i = 1 To c.Count
ActiveCell.Offset(0, i).NumberFormat = "#"
ActiveCell.Offset(0, i).Value = c.Item(i)
Next i
End Sub
For example:
I am now trying to replace this with some Regex code. I am NOT a regex expert. I want to create a pattern that looks for an open parenthesis followed by zero or more characters of any type followed by a close parenthesis.
I came up with:
\((.+?)\)
My current new code is:
Sub qwerty2()
Dim inpt As String, outpt As String
Dim MColl As MatchCollection, temp2 As String
Dim regex As RegExp, L As Long
inpt = ActiveCell.Value
MsgBox inpt
Set regex = New RegExp
regex.Pattern = "\((.+?)\)"
Set MColl = regex.Execute(inpt)
MsgBox MColl.Count
temp2 = MColl(0).Value
MsgBox temp2
End Sub
The code has at least two problems:
It will only get the first match in the string.(Mcoll.Count is always 1)
It will not recognize zero characters between the parentheses. (I think the .+? requires at least one character)
Does anyone have any suggestions ??
By default, RegExp Global property is False. You need to set it to True.
As for the regex, to match zero or more chars as few as possible, you need *?, not +?. Note that both are lazy (match as few as necessary to find a valid match), but + requires at least one char, while * allows matching zero chars (an empty string).
Thus, use
Set regex = New RegExp
regex.Global = True
regex.Pattern = "\((.*?)\)"
As for the regex, you can also use
regex.Pattern = "\(([^()]*)\)"
where [^()] is a negated character class matching any char but ( and ), zero or more times (due to * quantifier), matching as many such chars as possible (* is a greedy quantifier).

VBA regexp to check special symbols

I have tried to use what I've learnt in this post,
and now I want to compose a RegExp which checks whether a string contains digits and commas. For example, "1,2,55,2" should be ok, whereas "a,2,55,2" or "1.2,55,2" should fail test. My code:
Private Function testRegExp(str, pattern) As Boolean
Dim regEx As New RegExp
If pattern <> "" Then
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If regEx.Test(str) Then
testRegExp = True
Else
testRegExp = False
End If
Else
testRegExp = True
End If
End Function
Public Sub foo()
MsgBox testRegExp("2.d", "[0-9]+")
End Sub
MsgBox yields true instead of false. What's the problem ?
Your regex matches a partial string, it matches a digit in all 55,2, a,2,55,2, 1.2,55,2 input strings.
Use anchors ^ and $ to enforce a full string match and add a comma to the character class as you say you want to match strings that only contain digits and commas:
MsgBox testRegExp("2.d", "^[0-9,]*$")
^ ^ ^
I also suggest using * quantifier to match 0 or more occurrences, rather than + (1 or more occurrences), but it is something you need to decide for yourself (whether you want to allow an empty string match or not).
Here is the regex demo. Note it is for PCRE regex flavor, but this regex will perform similarly in VBA.
Yes, as #Chaz suggests, if you do not need to match the string/line itself, the alternative is to match an inverse character class:
MsgBox testRegExp("2.d", "[^0-9,]")
This way, the negated character class [^0-9,] will match any character but a comma / digit, invalidating the string. If the result is True, it will mean the string contains some characters other than digits and a comma.
You can use the limited built in pattern matching for that:
function isOk(str) As boolean
for i = 1 To len(str)
if Mid$(str, i, 1) Like "[!0-9,]" then exit function
next
g = True and Len(str) > 0
end function