Regex Visual Basic UDF not executing as expected

I am busy with a regular expression for VB and I can't seem to find where I am going wrong here.
Example:
Pattern:(?<=\d{10,11})(.|[\r\n])*(?=Mobile)
Input: 6578543567 Text I want to retain Mobile Operation
Output: #Name?
The number consists of 10 and 11 digit telephone numbers.
The text I want to retain varies in length.
The text always precedes the word Mobile.
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer
    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With
    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.count = 0 Then
        regex = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber
            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.count Then
                    'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    regex = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        regex = outputPattern
    End If
End Function

VBA's regular expression implementation (VBScript RegExp) doesn't support lookbehind such as (?<=...), although it does support lookahead.
But this appears to be a relatively easy string to match. You have a group of consecutive digits, followed by a space, and then you want to match an unknown number of words up to the word "Mobile".
You could use the following pattern to accomplish this:
\d+\s(.*?)\sMobile
Details:
\d any digit
+ (Quantifier) One to unlimited times - greedy
\s a single whitespace character
(...) capturing group to grab the text you want to return
. any character
*? (Quantifier) Zero to unlimited times - lazy
\s a single whitespace character
Mobile literally matches the word Mobile
What's with the greedy vs lazy quantifiers?
The first quantifier + is greedy. What makes it greedy? The lack of a ? immediately following the quantifier. Essentially, it will consume as many \d characters as it possibly can.
Since we added a \s after it, this won't really change the outcome, because the pattern has to match all the digits anyway to reach that space \s. However, if you decided to capture the space inside (...) and removed the \s, it would matter: if the + were lazy, your .*? would consume all but one of the digits.
So why are we using a lazy quantifier in .*?? Well, if your input string contained the word Mobile twice, a greedy quantifier would consume the first occurrence and match up to the second. If you only want to match up to the first occurrence of Mobile, make it lazy.
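A minimal sketch of the difference, assuming a made-up input that contains Mobile twice (the sub name and the sample string are illustrative and not from the question):

```vba
Sub DemoGreedyVsLazy()
    ' Hypothetical sample input with two occurrences of "Mobile"
    Const s As String = "6578543567 Text Mobile more Mobile end"

    ' Late binding, so no project reference is needed
    With CreateObject("VBScript.RegExp")
        ' Lazy: the group stops before the first " Mobile"
        .Pattern = "\d+\s(.*?)\sMobile"
        Debug.Print .Execute(s)(0).SubMatches(0)   ' prints: Text

        ' Greedy: the group runs up to the last " Mobile"
        .Pattern = "\d+\s(.*)\sMobile"
        Debug.Print .Execute(s)(0).SubMatches(0)   ' prints: Text Mobile more
    End With
End Sub
```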
So Finally - Now how do I retrieve the text in my capturing group (...)?
With VBA, you would use the Matches object. First I would recommend testing to ensure that there is a match - this can be done in a simple If...Then statement. Once this test passes, you can then safely obtain your return value.
With New RegExp 'Requires a reference to Microsoft VBScript Regular Expressions 5.5
    .Pattern = "\d+\s(.*?)\sMobile"
    .IgnoreCase = True 'Matches 'Mobile' in any case; switch to False for a case-sensitive match
    If .Test(inputString) Then
        retVal = .Execute(inputString)(0).SubMatches(0)
    End If
End With
inputString would be the string that contains the test values.
retVal would be what is returned from your capturing group.
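Put together as a worksheet function, this could look like the sketch below. The function name TextBeforeMobile is hypothetical, the \d{10,11} quantifier is taken from the question's description of the phone numbers, and returning #N/A when nothing matches is an assumption about the desired behaviour:

```vba
' Hypothetical UDF; late binding avoids the need for a project reference.
Public Function TextBeforeMobile(ByVal inputString As String) As Variant
    With CreateObject("VBScript.RegExp")
        ' 10 or 11 digits, a space, the text to keep (lazy), a space, then "Mobile"
        .Pattern = "\d{10,11}\s(.*?)\sMobile"
        .IgnoreCase = False
        If .Test(inputString) Then
            TextBeforeMobile = .Execute(inputString)(0).SubMatches(0)
        Else
            ' Assumption: signal "no match" to the worksheet with #N/A
            TextBeforeMobile = CVErr(xlErrNA)
        End If
    End With
End Function
```

For the input from the question, =TextBeforeMobile("6578543567 Text I want to retain Mobile Operation") should return Text I want to retain. Note that a UDF has to live in a standard module to be callable from a worksheet cell.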

Related

Regex pattern to replace date and time node in xml of word document

I need to replace the date and time in xml file using regex pattern.
xml text would contain:
w:date="2022-12-01T01:17:00Z"
w:date="2022-12-01T02:17:00Z"
w:date="2022-12-02T03:17:00Z"
possible regex pattern for the above would be:
w:date="[\d\W]\w[\d\W]\w"
but it does not replace anything, and the resulting string remains unchanged after running the following VBA code:
Sub ChangeDateTime()
    Dim sWOOXML As String
    Set objRegEx = CreateObject("vbscript.regexp")
    objRegEx.Global = True
    objRegEx.IgnoreCase = True
    objRegEx.MultiLine = True
    objRegEx.Pattern = "w:date=" & Chr(34) & "[\d\W]\w[\d\W]\w" & Chr(34)
    sWOOXML = ActiveDocument.Content.WordOpenXML
    sWOOXML = objRegEx.Replace(sWOOXML, "")
    ActiveDocument.Content.InsertXML sWOOXML
    Beep
End Sub
Your [\d\W]\w[\d\W]\w regex cannot match because it only allows two repetitions of "a digit or non-word char followed by a word char" between the two double quotes, while the actual value contains many more characters.
You can use
objRegEx.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""
Note that you can put a double quote inside a VBA string literal by doubling it (""), so there is no need for Chr(34).
This is a verbose pattern where \d{4} matches four digits and \d{1,2} matches one or two digits; the rest is matched literally.
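To check the pattern without touching a document, it can be run against a small string first. This is only a sketch: the sub name and the sample XML fragment are made up for illustration.

```vba
Sub DemoStripDateAttribute()
    Dim re As Object, sample As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""

    ' Hypothetical fragment in the shape shown in the question
    sample = "<w:ins w:author=""A"" w:date=""2022-12-01T01:17:00Z"">"

    ' Removes the whole attribute; prints: <w:ins w:author="A" >
    Debug.Print re.Replace(sample, "")
End Sub
```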

VBA check String is matched exactly word

I use the code below to check whether a string matches a pattern.
Sub chkPattern(str As String, pattern As String)
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    objRegex.pattern = pattern
    MsgBox objRegex.test(str)
End Sub
Specifically, I want to check whether the whole string is exactly "abc", "cde" or "xy".
For example, if the input is "abccde", "abcxy" or "abccdexyz", I expect it to return False.
Some patterns that I have already tried, such as "abc|cde|xyz" and "\b(abc|cde|xyz)\b)", are not working.
Can this be done in VBA using Regex?
It is possible yes. As I read your question you want to apply the OR with the pipe character.
Sub Test()
    Dim arr As Variant: arr = Array("abc", "cde", "xy")
    With CreateObject("VBScript.RegExp")
        .Pattern = "^(" & Join(arr, "|") & ")$"
        Debug.Print .Test("abcd") 'Will return False
        Debug.Print .Test("abc") 'Will return True
    End With
End Sub
The key to matching the whole string is the start-of-string anchor ^ and the end-of-string anchor $. If you wanted to test for a partial match instead, you have simply reversed the slashes: use backslashes instead of forward slashes, i.e. \b(abc|cde|xyz)\b as a pattern.
Remember, when you want a case-insensitive comparison, use .IgnoreCase = True.
Alternatively, use the built-in Like operator.
To match whole word use
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+) Capturing Group
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible,
\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.
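The anchored and the word-boundary patterns answer slightly different questions. The sketch below compares them on a few inputs; the sub name and the test strings are chosen for illustration only.

```vba
Sub CompareAnchorsAndBoundaries()
    Dim anchored As Object, bounded As Object, s As Variant
    Set anchored = CreateObject("VBScript.RegExp")
    Set bounded = CreateObject("VBScript.RegExp")

    anchored.Pattern = "^(abc|cde|xy)$"   ' the whole string must be one of the words
    bounded.Pattern = "\b(abc|cde|xy)\b"  ' one of the words must occur as a whole word

    For Each s In Array("abc", "abccde", "abc xy")
        Debug.Print s & ": anchored=" & anchored.Test(s) & ", boundary=" & bounded.Test(s)
    Next
    ' abc:    anchored=True,  boundary=True
    ' abccde: anchored=False, boundary=False
    ' abc xy: anchored=False, boundary=True
End Sub
```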

Get the non-matching part of the pattern through a RegEx

In this topic, the idea is to "strip" the numerics separated by an x through a RegEx: How to extract ad sizes from a string with excel regex
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite, and I have been struggling for some time to get what I actually need.
This is what I have until now:
Public Function regExSampler(s As String) As String
    Dim regEx As Object
    Dim inputMatches As Object
    Dim regExString As String
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Pattern = "(([0-9]+)x([0-9]+))"
        .IgnoreCase = True
        .Global = True
        Set inputMatches = .Execute(s)
        If regEx.test(s) Then
            regExSampler = .Replace(s, vbNullString)
        Else
            regExSampler = s
        End If
    End With
End Function
Public Sub TestMe()
    Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
    Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
    Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
And this is exactly what I want to strip through RegEx.
Change the IF block to
        If regEx.test(s) Then
            regExSampler = InputMatches(0)
        Else
            regExSampler = s
        End If
And your results will return
300x250
34300x25
8x4
This is because InputMatches holds the results of the RegEx execution, that is, the parts of the string that matched the pattern.
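For reference, a compact version of the same idea might look like the sketch below. The function name ExtractAdSize is hypothetical, and the capturing groups are dropped because only the whole match is needed.

```vba
' Hypothetical helper: returns the first "<digits>x<digits>" found, or the input unchanged.
Public Function ExtractAdSize(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Pattern = "[0-9]+x[0-9]+"
        .IgnoreCase = True
        If .Test(s) Then
            ' Execute returns a collection of matches; (0) is the first one
            ExtractAdSize = .Execute(s)(0).Value
        Else
            ExtractAdSize = s
        End If
    End With
End Function
```

With the inputs from TestMe, this should give 300x250, 34300x25 and 8x4.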
As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching every character from the start of the string until it reaches a non-digit (\D) character.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if it is followed by the pattern described inside the lookahead (\d+x\d+). The lookahead itself doesn't consume any characters, so the text matched by \d+x\d+ isn't part of the overall match.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are linked by an OR condition (|) so that the whole regex matches one pattern or another.
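Used with Replace, this pattern removes everything around the size. A minimal sketch (the sub name is made up); note that Global must be True so that both alternatives get a chance to match:

```vba
Sub DemoStripAroundSize()
    With CreateObject("VBScript.RegExp")
        .Global = True   ' replace the leading part and the trailing part
        .Pattern = "^.*\D(?=\d+x\d+)|\D+$"
        ' First alternative removes "uni3uios3_", second removes "_ASDF.html"
        Debug.Print .Replace("uni3uios3_300x250_ASDF.html", "")   ' prints: 300x250
    End With
End Sub
```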

How to simulate lookbehind in VBA regex?

I'm trying to build a regex pattern that will return False if a string starts with certain characters or contains non-word characters, but because VBA's RegExp object doesn't support lookbehind, I am finding this difficult. The only word character prefixes that should fail are B_, B-, b_, b-.
This is my test code:
Sub testregex()
    Dim re As New RegExp
    re.pattern = "^[^Bb][^_-]\w+$"
    Debug.Print re.Test("a24")
    Debug.Print re.Test("a")
    Debug.Print re.Test("B_")
    Debug.Print re.Test(" a1")
End Sub
I want this to return:
True
True
False
False
but instead it returns
True
False
False
True
The problem is that the pattern looks for a character that's not in [Bb], followed by a character that's not in [-_], followed by a sequence of word characters, but what I want is just one or more word characters, such that if there are 2 or more characters then the first two are not [Bb][-_].
Try matching this expression:
^([Bb][\-_]\w*)|(\w*[^\w]+\w*)$
...which will match "B_", "b_", "B-" and "b-" or anything that's not a word character. Consider a successful match a "failure" and only allow non-matches to be valid.
re.Pattern = "^(?:[^ Bb][^ _-]\w+|[^ Bb][^ _-]|[^ Bb])$"
You can get your matches with
regEx.Pattern = "^[^ bB][^_ -]*\w*$"
regEx.MultiLine = True
Debug.Print regEx.Test("a24")
Debug.Print regEx.Test("a")
Debug.Print regEx.Test("B_")
Debug.Print regEx.Test(" a1")
Output:
True
True
False
False
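A different technique from the answers above: VBScript's RegExp lacks lookbehind but does support lookahead, so a negative lookahead anchored at the start of the string can express "word characters only, but not starting with B_, B-, b_ or b-" directly. The sketch below is an alternative offered for comparison, and the sub name is made up.

```vba
Sub TestNegativeLookahead()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' (?![Bb][_-]) rejects the forbidden prefixes; \w+ then requires word characters only
    re.Pattern = "^(?![Bb][_-])\w+$"

    Debug.Print re.Test("a24")   ' True
    Debug.Print re.Test("a")     ' True
    Debug.Print re.Test("B_")    ' False
    Debug.Print re.Test(" a1")   ' False
    Debug.Print re.Test("Bob")   ' True: B is only rejected when followed by _ or -
End Sub
```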

leading space with regex match

A newbie to regex, I'm trying to skip the first set of brackets [word1], and match any remaining text bracketed with the open bracket and closing brace [...}
Text: [word1] This is a [word2]bk{not2} sentence [word3]bk{not3}
Pattern: [^\]]\[.*?\}
So what I want is to match [word2]bk{not2} and [word3]bk{not3}, and it works, kind of, but I'm ending up with a leading space on each of the matches. Been playing with this for a couple of days (and doing a lot of reading), but I'm obviously still missing something.
\[[^} ]*}
Try this. See the demo:
https://regex101.com/r/qJ8qW5/1
The [^\]] in your pattern matches the leading space, because it matches any single character other than ].
For example, when text is [word1] This is a X[word2]bk{not2},
pattern [^\]]\[.*?\} matches X[word2]bk{not2}.
If no open bracket appears between [wordN] and {notN}, you can use:
\[[^\[}]*}
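A short sketch of that pattern run against the text from the question (the sub name is made up). The first bracket is skipped because another [ appears before any }:

```vba
Sub DemoBracketToBrace()
    Dim re As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' "[" then any run of characters that are neither "[" nor "}", then "}"
    re.Pattern = "\[[^\[}]*}"

    For Each m In re.Execute("[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}")
        Debug.Print m.Value
    Next
    ' prints:
    ' [word2]bk{not2}
    ' [word3]bk{not3}
End Sub
```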
Or, you can also use Submatches with capturing groups.
Sub test()
    Dim objRE As Object
    Dim objMatch As Variant
    Dim objMatches As Object
    Dim strTest As String
    strTest = "[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}"
    Set objRE = CreateObject("VBScript.RegExp")
    With objRE
        .Pattern = "[^\]](\[.*?\})"
        .Global = True
    End With
    Set objMatches = objRE.Execute(strTest)
    For Each objMatch In objMatches
        Debug.Print objMatch.Submatches(0)
    Next
    Set objMatch = Nothing
    Set objMatches = Nothing
    Set objRE = Nothing
End Sub
In this sample code, the pattern uses parentheses as a capturing group, so Submatches(0) returns the bracketed text without the leading character.