Optional replacement in a Regular Expression - regex

I am creating a regular expression in VBA, which uses the JS flavor of RegEx. Here is the issue I have run into:
Current RegEx:
(^6)(?:a|ab)?
I have a 6 followed by either nothing, an 'a' or 'ab'.
In the case of a 6 followed by nothing I want to return just the 6 using $1
In the case of a 6 followed by an 'a' or 'ab' I want to return 6B
So I need that 'B' to be optional, contingent on there being an 'a' or 'ab'.
Something to the effect of : $1B?
That of course does not work. I only want the B if the 'a' or 'ab' is present, otherwise just the $1.
Is this possible to do in a single regex pattern? I could just have 2 separate patterns, one looking for only a 6 and the other for 6'a'or'ab'... but my actual regex patterns are much more complicated and I might need several patterns to cover some of them...
Thanks for looking.

I don't think your question is clearly defined--for example, I don't know why you need a replace--but from what I can infer, something like the following may work for you:
target = "6ab"
result = ""
With New RegExp
    .Pattern = "^(6)(?:a(b?))?"
    Set matches = .Execute(target)
    ' Execute always returns a collection, so test Count rather than Is Nothing
    If matches.Count > 0 Then
        Set mat = matches(0)
        result = mat.SubMatches(0) ' the 6
        ' SubMatches(1) holds the optional b (empty when it is absent)
        If mat.SubMatches.Count > 1 Then
            result = result & UCase(mat.SubMatches(1))
        End If
    End If
    Debug.Print result
End With
You basically inspect the capture groups to determine whether or not there was a hit on the b capture. Whereas you used a|ab, I think an optional b (b?) is more to the point. It's probably stylistic more than anything.

As I mentioned in my comment, there is no way to tell a regex engine to choose between literal alternatives in the replacement string. Thus, all you can do is to access Submatches to check for values that you get there, and return appropriate values.
Note that your regex needs a capturing group around the optional part, whose exact text you do not know in advance (the (ab?)), so that you can inspect what, if anything, was matched there.
Here is my idea in code:
Function RxCondReplace(ByVal str As String) As String
    Dim objRegExp As Object, objMatches As Object, objMatch As Object
    RxCondReplace = ""
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "^6(ab?)?"
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count = 0 Then Exit Function ' no match at all: return ""
    Set objMatch = objMatches.Item(0) ' Only 1 match as .Global=False
    ' The pattern has a single capturing group: SubMatches.Item(0).
    ' A comment cannot follow the line-continuation underscore, so it goes here.
    If objMatch.SubMatches.Item(0) = "a" Or _
       objMatch.SubMatches.Item(0) = "ab" Then ' group captured "a" or "ab"
        RxCondReplace = "6B"
    ElseIf objMatch.SubMatches.Item(0) = "" Then ' group is empty: only the 6 matched
        RxCondReplace = "6"
    End If
End Function
' Calling the function above
Sub CallConditionalReplace()
Debug.Print RxCondReplace("6") ' => 6
Debug.Print RxCondReplace("6a") ' => 6B
Debug.Print RxCondReplace("6ab") ' => 6B
End Sub
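For comparison, in a language whose replace function accepts a callback, the conditional replacement fits in one call. This is a minimal Python sketch rather than VBA; the name cond_replace is hypothetical, and it assumes Python's re module handles this simple pattern the same way VBScript.RegExp does.

```python
import re

def cond_replace(text):
    """Return "6B" when the optional a/ab group matched, else just "6"."""
    # group(1) is None when the optional (ab?) group did not participate
    return re.sub(r"^6(ab?)?", lambda m: "6B" if m.group(1) else "6", text)

print(cond_replace("6"))
print(cond_replace("6a"))
print(cond_replace("6ab"))
```

The callback plays the same role as the SubMatches check above: the decision is made in code, not in the replacement string.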

Related

RegEx repeating subcapture returns only last match

I have a regexp that tries to find multiple sub captures within matches:
https://regex101.com/r/nk1Q5J/2/
=\s?.*(?:((?i)Fields\("(\w+)"\)\.Value\)*?))
I've had simpler versions with equivalent results but this is the last iteration.
The trick here is that the first group looks for a sequence that begins with '=' (to identify database field reads in VB.Net).
The single sub capture cases work:
Match 1. [comparison + single parameter read call]
= False And IsNull(Fields("lmpRoundedEndTime").Value)
=> G2: lmpRoundedEndTime
Match 3. [read into string] oRs.Open "select lmpEmployeeID,lmpShiftID,lmpTimecardDate,lmpProjectID from Timecards where lmpTimecardID = " + App.Convert.NumberToSql(Fields("lmlTimecardID").Value),Connection,adOpenStatic,adLockReadOnly,adCmdText
=> G2: lmlTimecardID
Match 4. [assignment] Fields("lmlEmployeeID").Value = oRs.Fields("lmpEmployeeID").Value
Where I am failing is a match with multiple sub-captures. My regexp returns the last (intended) sub capture :
Match 2. [read multiple input parameters] Fields("lmpPayrollHours").Value = App.Ax("TimecardFunctions").CalculateHours(Fields("lmpShiftID").Value,Fields("lmpRoundedStartTime").Value,Fields("lmpRoundedEndTime").Value)
=> G2: lmpRoundedEndTime
'''''''''''' ^ must capture: lmpShiftID , lmpRoundedStartTime , lmpRoundedEndTime
I've read up on lazy quantifiers etc, but can't wrap my head around where this goes wrong.
References:
https://www.regular-expressions.info/refrepeat.html
http://www.rexegg.com/regex-quantifiers.html
Related:
Regular expression: repeating groups only getting last group
What's the difference between "groups" and "captures" in .NET regular expressions?
BTW, I could quantify the sub capture as {1,5} safely for efficiency, but that's not the focus.
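The "last match only" behaviour described above can be reproduced in a small Python sketch. It is an illustration under assumptions, not the original pattern: the simplified regexes and the variable names are made up for the example, and Python's re is assumed to behave like the VBA engine here, in that a capturing group inside a repetition keeps only its final iteration.

```python
import re

line = ('Fields("lmpPayrollHours").Value = App.Ax("TimecardFunctions").CalculateHours('
        'Fields("lmpShiftID").Value,Fields("lmpRoundedStartTime").Value,'
        'Fields("lmpRoundedEndTime").Value)')

# One match with a repeated group: only the last iteration's capture survives.
repeated = re.search(r'(?:Fields\("(\w+)"\)\.Value,?)+\)', line)
last_only = repeated.group(1)

# Alternative: isolate the right-hand side, then collect one match per field.
rhs = line.split("=", 1)[1]
all_fields = re.findall(r'Fields\("(\w+)"\)\.Value', rhs)

print(last_only)
print(all_fields)
```

Collecting several matches, each with a single capture, sidesteps the limitation without needing per-iteration captures.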
EDIT:
By using negative lookahead to exclude the left side of comparisons, this got me much closer (match 2 above now works):
(?:Fields\("(\w+)"\)\.Value)(?!\)?\s{0,2}[=|<])
but in the following block of code, only the first two are captured:
If oRs.EOF = False Then
If CInt(Fields("lmlTimecardType").Value) = 1 Then
If Trim(oRs.Fields("lmeDirectExpenseID").Value) <> "" Then
Fields("lmlExpenseID").Value = oRs.Fields("lmeDirectExpenseID").Value
End If
Else
If Trim(oRs.Fields("lmeIndirectExpenseID").Value) <> "" Then
Fields("lmlExpenseID").Value = oRs.Fields("lmeIndirectExpenseID").Value
End If
End If
If CInt(Fields("lmlTimecardType").Value) = 2 Then
If Trim(oRs.FIelds("lmeDefaultWorkCenterID").Value) <> "" Then
Fields("lmlWorkCenterID").Value = oRs.FIelds("lmeDefaultWorkCenterID").Value
End If
End If
End If
Capture1:
Fields("lmlExpenseID").Value = oRs.Fields("lmeDirectExpenseID").Value
Capture2:
Fields("lmlExpenseID").Value = oRs.Fields("lmeIndirectExpenseID").Value
Capture3 (failed):
Fields("lmlWorkCenterID").Value = oRs.FIelds("lmeDefaultWorkCenterID").Value
Actually, (?:Fields\("(\w+)"\)\.Value)(?!\)?\s{0,2}[=|<]) does work in my Excel sheet, just not in the regex101 test. Probably a slightly different standard used there.
https://regex101.com/r/p6zFqy/1/
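One possible explanation, which the post does not confirm: the third assignment spells the call FIelds with a capital I, so a case-sensitive match skips it, while a RegExp with IgnoreCase turned on would find it. The Python sketch below shows that difference; it assumes the Excel code enables case-insensitive matching and that regex101 was run without the i flag.

```python
import re

pattern = r'(?:Fields\("(\w+)"\)\.Value)(?!\)?\s{0,2}[=|<])'
code = "\n".join([
    'Fields("lmlExpenseID").Value = oRs.Fields("lmeDirectExpenseID").Value',
    'Fields("lmlExpenseID").Value = oRs.Fields("lmeIndirectExpenseID").Value',
    'Fields("lmlWorkCenterID").Value = oRs.FIelds("lmeDefaultWorkCenterID").Value',
])

# Case-sensitive: "FIelds" on the last line does not match "Fields".
case_sensitive = re.findall(pattern, code)

# Case-insensitive: the third right-hand side is found as well.
case_insensitive = re.findall(pattern, code, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```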

Regular expression to match page number groups

I need a regular expression to match page numbers as found in common programs.
These usually take the form 1-5,3,5,1-9 for example.
I have a regular expression (\d+-\d+)?,(\d+-\d+?)* which I need help to refine.
As can be seen here regex101 I am matching commas and missing numbers entirely.
What I need is to match 1-5 as group 1, 3 as group 2, 5 as group 3 and 1-9 as group 4 without matching any commas.
Any help is appreciated. I will be using this in VBA.
This worked for me - am I missing something?
Sub Pages()
Dim re As Object, allMatches, m, rv, sep, c As Range, i As Long
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "(\d+(-\d+)?)"
re.ignorecase = True
re.MultiLine = True
re.Global = True
For Each c In Range("B5:B20").Cells 'for example
c.Offset(0, 1).Resize(1, 10).ClearContents 'clear output cells
i = 0
If re.test(c.Value) Then
Set allMatches = re.Execute(c.Value)
For Each m In allMatches
i = i + 1
c.Offset(0, i).Value = m
Next m
End If
Next c
End Sub
If I recall correctly, capturing a dynamic number of groups will not work. You can pre-specify the format / number of groups to be matched, or you can catch the repeated groups as one and split them afterwards.
If you know the format, just do
(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)
which of course is not very neat.
If you want the flexible structure, match the first group and all the rest as a second and then split the latter by the delimiter ',' in whichever language.
(\d+(?:-\d+)?)((?:(?:,)(\d+(?:-\d+)?))*)
You need to make the -\d+ part optional, since you don't always have ranges. And the comma between each range should be part of the second group with the * quantifier, so you can match a single range with no comma after it.
\d+(-\d+)?(,\d+(-\d+)?)*
This will match the string that contains all the ranges. To get an array of individual ranges without the commas, do a second match in this string:
\d+(-\d+)?
Use the VBA function for getting an array of all matches of a regexp (sorry, I don't know VBA, so can't provide the specific syntax).
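The two-step idea (validate the whole list, then pull out each range with a second match) can be sketched as follows. This is Python rather than VBA, and page_groups is a hypothetical helper name; in VBA the second step would correspond to a RegExp with Global = True and a loop over Execute's results.

```python
import re

def page_groups(spec):
    """Return the individual pages/ranges, or [] if spec is not a valid list."""
    # Step 1: the whole string must be ranges separated by single commas.
    if not re.fullmatch(r"\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*", spec):
        return []
    # Step 2: a second, simpler match yields each range without the commas.
    return re.findall(r"\d+(?:-\d+)?", spec)

print(page_groups("1-5,3,5,1-9"))
```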

Need help to understand the difference between similar RegExp usage where 1 works and 1 doesn't

I'm a big fan of stackoverflow, though I'm new to using regular expressions. I have a QND search utility that I wrote to help me find/report things I'm searching for in source code. I'm having a problem with figuring out what's wrong with my pattern searching that's not returning a match string that includes all text between two double quotes. In one search it works (looking for Session variables), but in a similar one (looking for redirects) it doesn't.
Here's a sample aspx.vb file that I'm testing against:
Partial Class _1
Inherits System.Web.UI.Page
Private strSecurityTest As String = ""
Private strUserId As String = ""
Private strPassword As String = ""
Private strMyName As String = ""
Private Sub sample()
strSecurityTest = Session("UserID")
If strSecurityTest = "NeedsLogin" Or
strSecurityTest = "" Or
Session("SecureCount") = 0 Or
Session("CommandName") <> strMyName Then
Server.Transfer("WebApLogin.aspx")
End If
End Sub
End Class
Successful match:
When I look for all occurrences of Session("*") with pattern ==> Session\(\"\w*\"\)
I get correct results. Noting the above source code, I get 3 matches returned:
Session("UserID")
Session("SecureCount")
Session("CommandName")
Failed matching:
However when I try another search by replacing "Session" with "Transfer" ==> Transfer\(\"\w*\"\)
nothing is returned.
I have also tried these matching patterns:
Server.Transfer("*") ==> Server\.Transfer\(\"\w*\"\)
*Server.Transfer("*") ==> \w*Server\.Transfer\(\"\w*\"\)
Each of these doesn't return any matches.
In my live code I tried removing vbCr, vbLf, vbCrLf before the regex match, but still no matches
were found.
Symptom:
A symptom that I see is when I remove the text from the right side of the pattern, up to and
including the \w* ... then the search finds matches ==> Transfer\(\" However since the search
is now open-ended ... I can't capture the value between the double quotes that I want.
Sample VB code is:
Private Sub TestRegExPattern(wData As String, wPattern As String, bMatchCase As Boolean)
'
' Invoke the Match method.
'
Dim m As Match = Nothing
If Not bMatchCase Then
m = Regex.Match(wData, wPattern, RegexOptions.IgnoreCase)
Else
m = Regex.Match(wData, wPattern)
End If
'
' If first match found process and look for more
'
If (m.Success) Then
'
' Process match
'
' Get next match.
While m.Success
m = m.NextMatch()
If m.Success Then
'
' Process additional matches
'
End If
End While
End If
m = nothing
End Sub
I'm looking for some pointers to understand why my simple search only works with one particular pattern, and not another that only changes the leading text to be matched explicitly.
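A likely cause, offered as an observation rather than a confirmed answer: \w matches only letters, digits and the underscore, so \w* stops at the '.' in WebApLogin.aspx and the closing quote never lines up. The Session keys contain no dot, which would explain why that search works. The sketch below uses Python instead of VB.NET, and the [^"]* alternative is a suggestion, not something from the post.

```python
import re

source = '''strSecurityTest = Session("UserID")
Session("SecureCount") = 0 Or
Session("CommandName") <> strMyName Then
Server.Transfer("WebApLogin.aspx")'''

# \w* cannot cross the "." in the file name, so nothing is found.
word_only = re.findall(r'Transfer\("\w*"\)', source)

# [^"]* accepts any characters up to the closing quote.
any_quoted = re.findall(r'Transfer\("([^"]*)"\)', source)

# The Session keys are plain word characters, so \w* is enough there.
sessions = re.findall(r'Session\("(\w*)"\)', source)

print(word_only, any_quoted, sessions)
```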

VB.NET Regex Replacement

I have a list of fields named flavx(other text) that go 1 through 10.
For example, I might have:
flav2PGPct
I need to turn it to
flav12PGPct
I need to replace 1 through 10 with 11 through 20 using VB.NET's Replace function with Regex, but I can't get it working right.
Can anyone help?
Here's what I've tried:
(\.)flav*[1-9]
I have no idea what to place in the replacement box...
Use this regex for search: (flav)(\d\w*) and this one for replace: ${1}1$2.
I'd use 2 regex runs to obtain the desired result because it is not possible to use a replacement literal with alternatives.
The first regex would replace 10 to 20 and the second will handle 1 to 9 digits:
Dim rx1to9 As Regex = New Regex("(?<=\D|^)[1-9](?=\D|$)") '1 - 9
Dim rx10 As Regex = New Regex("(?<=\D|^)10(?=\D|$)") '10
Dim str As String = "flav2PG10Pct101"
Dim result = rx10.Replace(str, "20")
result = rx1to9.Replace(result, "1$&")
Console.WriteLine(result)
See IDEONE demo (output is flav12PG20Pct101)
Regex explanation:
(?<=\D|^) - A positive look-behind that makes sure a non-digit (\D) or the start of string (^) comes right before...
[1-9] - a single digit from 1 to 9 (or, in the second regex, 10 matching literal 10)
(?=\D|$) - A positive look-ahead that makes sure a non-digit (\D) or the end of string ($) comes right after the digit.
If you must check if flav is present in the string, you may use a bit different look-behind: (?<=flav\D*|^), or - if spaces should not occur between flav and the digit: (?<=flav[^\d\s]*|^).
Regexes work best with strings rather than numbers, so an easy way is to use a regex to get the parts of the string you want to adjust and then concatenate the calculated part in a string:
Option Strict On
Option Infer On
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim re As New Regex("^flav([0-9]+)(.*)$")
Dim s = "flav1PGPct"
Dim t = ""
Dim m = re.Match(s)
If m.Success Then
t = CStr(Integer.Parse(m.Groups(1).Value) + 10)
t = "flav" & t & m.Groups(2).Value
End If
Console.WriteLine(t)
Console.ReadLine()
End Sub
End Module
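The same "capture the number, compute, rebuild the string" approach can also be written as a single replace with a callback. This is a Python sketch, not VB.NET; shift_flavor and its offset parameter are hypothetical names. .NET offers a comparable Regex.Replace overload that takes a MatchEvaluator.

```python
import re

def shift_flavor(name, offset=10):
    """Add offset to the number that directly follows a leading "flav"."""
    # The callback receives each match and returns its replacement text.
    return re.sub(
        r"^flav(\d+)",
        lambda m: "flav" + str(int(m.group(1)) + offset),
        name,
    )

print(shift_flavor("flav2PGPct"))
print(shift_flavor("flav10PGPct"))
```

Because the arithmetic happens in code, 10 needs no special case the way it does with literal replacement strings.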

How to test for specific characters with regex in VBA

I need to test for a string variable to ensure it matches a specific format:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
...where x can be any alphanumerical character (a - z, 0 - 9).
I've tried the following, but it doesn't seem to work (test values constantly fail)
If val Like "^([A-Za-z0-9_]{8})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{4})([-]{1})([A-Za-z0-9_]{12})" Then
MsgBox "OK"
Else
MsgBox "FAIL"
End If
.
fnCheckSubscriptionID "fdda752d-32de-474e-959e-4b5bf7574436"
Any pointers? I don't mind if this can be achieved in vba or with a formula.
You are already using the ^ beginning-of-string anchor, which is terrific. You also need the $ end-of-string anchor, otherwise in the last group of digits, the regex engine is able to match the first 12 digits of a longer group of digits (e.g. 15 digits).
I rewrote your regex in a more compact way:
^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$
Note these few tweaks:
[-]{1} can just be expressed with -
I removed the underscores as you say you only want letters and digits. If you do want underscores, instead of [A-Z0-9]{8} (for instance), you can just write \w{8} as \w matches letters, digits and underscores.
Removed the lowercase letters. If you do want to allow lowercase letters, we'll turn on case-insensitive mode in the code (see line 3 of the sample code below).
No need for (capturing groups), so removed the parentheses
We have three groups of four letters and a dash, so wrote (?:[A-Z0-9]{4}-) with a {3}
Sample code
Dim myRegExp, FoundMatch
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}$"
FoundMatch = myRegExp.Test(SubjectString)
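To see why the end anchor matters, here is a small Python sketch that compares a start-only match with a whole-string match. The function names are hypothetical, and re.fullmatch is used as the Python counterpart of wrapping the pattern in ^ and $.

```python
import re

BODY = r"[A-Z0-9]{8}-(?:[A-Z0-9]{4}-){3}[A-Z0-9]{12}"

def looks_like_id(value):
    # The whole string must fit the pattern, as with ^...$ in the VBA code.
    return re.fullmatch(BODY, value, re.IGNORECASE) is not None

def starts_like_id(value):
    # Anchored at the start only: trailing characters are ignored.
    return re.match(BODY, value, re.IGNORECASE) is not None

good = "fdda752d-32de-474e-959e-4b5bf7574436"
too_long = good + "123"

print(looks_like_id(good), looks_like_id(too_long), starts_like_id(too_long))
```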
You can do this either with a regular expression, or with just native VBA. I am assuming from your code that the underscore character is also valid in the string.
To do this with native VBA, you need to build up the LIKE string since quantifiers are not included. Also using Option Compare Text makes the "like" action case insensitive.
Option Explicit
Option Compare Text
Function TestFormat(S As String) As Boolean
'Sections
Dim S1 As String, S2_4 As String, S5 As String
Dim sLike As String
With WorksheetFunction
S1 = .Rept("[A-Z0-9_]", 8)
S2_4 = .Rept("[A-Z0-9_]", 4)
S5 = .Rept("[A-Z0-9_]", 12)
sLike = S1 & .Rept("-" & S2_4, 3) & "-" & S5
End With
TestFormat = S Like sLike
End Function
With regular expressions, the pattern is simpler to build, but the execution time may be longer, and that may make a difference if you are processing very large amounts of data.
Function TestFormatRegex(S As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = "^\w{8}(?:-\w{4}){3}-\w{12}$"
TestFormatRegex = .test(S)
End With
End Function
Sub Test()
MsgBox fnCheckSubscriptionID("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
End Sub
Function fnCheckSubscriptionID(strCont)
' Tools - References - add "Microsoft VBScript Regular Expressions 5.5"
With New RegExp
.Pattern = "^\w{8}-\w{4}-\w{4}-\w{4}-\w{12}$"
.Global = True
.MultiLine = True
fnCheckSubscriptionID = .Test(strCont)
End With
End Function
In case of any problems with early binding you can use late binding With CreateObject("VBScript.RegExp") instead of With New RegExp.