VBA check String is matched exactly word

VBA check String is matched exactly word - regex

I use this code below to check if the string is match to pattern or not.
Sub chkPattern(str As String, pattern As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
objRegex.pattern = pattern
MsgBox objRegex.test(str)
End Sub
Specifically, I want to check if string match whole string "abc" or "cde" of "xy"
For example, if inputs are "abccde" or "abcxy" or "abccdexyz", I expect it will return false
Some patterns that I have already try like : "abc|cde|xyz" , "\b(abc|cde|xyz)\b)" are not working
Can this be done in VBA by using Regex?

It is possible yes. As I read your question you want to apply the OR with the pipe character.
Sub Test()
Dim arr As Variant: arr = Array("abc", "cde", "xy")
With CreateObject("VBScript.RegExp")
.Pattern = "^(" & Join(arr, "|") & ")$"
Debug.Print .Test("abcd") 'Will return False
Debug.Print .Test("abc") 'Will return True
End With
End Sub
The key to match the whole string here are the start string ancor ^ and the end string ancor $. If you meant you wanted to test for partial match, you have simply reversed the slashes. Use backslashes instead of forward slashes > \b(abc|cde|xyz)\b as a pattern.
Remember, when you want to ignore case comparison, use .IgnoreCase = True.
Alternatively use the build-in Like operator.

To match whole word use
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+) Capturing Group
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible,

\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.

Related

Regex pattern to replace date and time node in xml of word document

I need to replace the date and time in xml file using regex pattern.
xml text would contain:
w:date="2022-12-01T01:17:00Z"
w:date="2022-12-01T02:17:00Z"
w:date="2022-12-02T03:17:00Z"
possible regex pattern for the above would be:
w:date="[\d\W]\w[\d\W]\w"
but it is not replacing anything and the resulted string remain intact in the following VBA code:
Sub ChangeDateTime()
Dim sWOOXML As String
Set objRegEx = CreateObject("vbscript.regexp")
objRegEx.Global = True
objRegEx.IgnoreCase = True
objRegEx.MultiLine = True
objRegEx.Pattern = "w:date=" & Chr(34) & "[\d\W]\w[\d\W]\w" & Chr(34)
sWOOXML = ActiveDocument.Content.WordOpenXML
sWOOXML = objRegEx.Replace(sWOOXML, "")
ActiveDocument.Content.InsertXML sWOOXML
Beep
End Sub

Your [\d\W]\w[\d\W]\w regex prevents from matching since it only finds two repetitions of a digit or non-word char + a word char sequence between two double quotes, while you have many more chars there.
You can use
objRegEx.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""
See the regex demo. Note you may add a double quote to the string using a doubled ", no need to use Chr(34).
This is a verbose pattern where \d{1,2} matches one or two digits and \d{4} matches four digits, the rest is self-explanatory.

Get the non-matching part of the pattern through a RegEx

In this topic, the idea is to take "strip" the numerics, divided by a x through a RegEx. -> How to extract ad sizes from a string with excel regex
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite and I am struggling some time to get what needs to be done.
This is what I have until now:
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
And this is exactly what I want to strip through RegEx.

Change the IF block to
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
And your results will return
300x250
34300x25
8x4
This is because InputMatches holds the results of the RegEx execution, which holds the pattern you were matching against.

As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching every character from the start of the string until it reaches a non-digit (\D) character.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if followed by the pattern described inside it (\d+x\d+). The lookahead itself doesn't capture any character, so the pattern \d+x\d+ isn't captured by the regex.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are linked by an OR condition (|) so that the whole regex matches one pattern or another.

Regex Visual Basic UDF not executing as expected

I am busy with a regular expression for VB and I cant seem to find where I am going wrong here.
Example:
Pattern:(?<=\d{10,11})(.|[\r\n])*(?=Mobile)
Input: 6578543567 Text I want to retain Mobile Operation
Output: #Name?
List item
The number consists of 10 and 11 digit telephone numbers.
The text I want to retain varies in length.
The text always precedes the word Mobile.
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function

IIRC, VBA doesn't support lookarounds in it's Regular Expression implementation.
But, this appears to be a relatively easy string to match. You have a group of consecutive numbers, followed by a space, and then you want to match an undisclosed amount of words up to the word "Mobile".
You could use the following pattern to accomplish this:
\d+\s(.*?)\sMobile
Details (See it in action here):
\d any digit
+ (Quantifier) One to unlimited times - greedy
\s a single whitespace character
(...) capturing group to grab the text you want to return
. any character
*? (Quantifier) Zero to unlimited times - lazy
\s a single whitespace character
Mobile literally matches the word Mobile
What's with the greedy vs lazy quantifiers?
The first quantifier + is Greedy. What makes this greedy? The lack of the ? immediately following this quantifier makes it greedy. What this essentially does is it will consume as much ass it possibly can of the \d.
Since we added a \s to the end of that statement, this won't really change the outcome because it will have to match all the digits anyway to get to that space \s. However, if you decided you wanted to capture (...) the space and you removed the \s, then this would be important - because your .*? will consume all but one of your numbers \d if this was lazy.
So, then why are we using a lazy quantifier with .*?? Well, if your input string contained two words that said Mobile, a greedy quantifier would consumer the first word and match up up to the second. If you only want to match up to the first word of Mobile, then you want to make it lazy.
So Finally - Now how do I retrieve the text in my capturing group (...)?
With VBA, you would use the Matches object. First I would recommend testing to ensure that there is a match - this can be done in a simple If...Then statement. Once this test passes, you can then safely obtain your return value.
With New RegExp
.Pattern = "\d+\s(.*?)\sMobile"
.IngoreCase = True 'If your 'Mobile' word can be any case, switch to false
If .Test(inputString) Then
retVal = .Execute(inputString)(0).SubMatches(0)
End If
End With
inputString would be the string that contains the test values.
retVal would be what is returned from your capturing group.

Regex lookahead to match everything prior to 1st OR 2nd group of digits

Regex in VBA.
I am using the following regex to match the second occurance of a 4-digit group, or the first group if there is only one group:
\b\d{4}\b(?!.+\b\d{4}\b)
Now I need to do kind of the opposite: I need to match everything up until the second occurance of a 4-digit group, or up until the first group if there is only one. If there are no 4-digit groups, capture the entire string.
This would be sufficient.
But there is also a preferable "bonus" route: If there exists a way to match everything up until a 4-digit group that is optionally followed by some random text, but only if there is no other 4-digit group following it. If there exists a second group of 4 digits, capture everything up until that group (including the first group and periods, but not commas). If there are no groups, capture everything. If the line starts with a 4-digit group, capture nothing.
I understand that also this could (should?) be done with a lookahead, but I am not having any luck in figuring out how they work for this purpose.
Examples:
Input: String.String String 4444
Capture: String.String String 4444
Input: String4444 8888 String
Capture: String4444
Input: String String 444 . B, 8888
Capture: String String 444 . B
Bonus case:
Input: 8888 String
Capture:

for up until the second occurrence of a 4-digit group, or up until the first group if there is only one use this pattern
^((?:.*?\d{4})?.*?)(?=\s*\b\d{4}\b)
Demo
per comment below, use this pattern
^((?:.*?\d{4})?.*?(?=\s*\b\d{4}\b)|.*)
Demo

You can use this regex in VBA to capture lines with 4-digit numbers, or those that do not have 4-digit numbers in them:
^((?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4})|(?!.*[0-9]{4}).*)
See demo, it should work the same in VBA.
The regex consists of 2 alternatives: (?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) and (?!.*[0-9]{4}).*.
(?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) matches 0 or more (as few as possible) characters that are preceded by 0 or 1 sequence of characters followed by a 4-digit number, and are followed by optional space(s) and 4 digit number.
(?!.*[0-9]{4}).* matches any number of any characters that do not have a 4-digit number inside.
Note that to only match whole numbers (not part of other words) you need to add \b around the [0-9]{4} patterns (i.e. \b[0-9]{4}\b).

Matches everything except spaces till last occurace of a 4 digit word
You can use the following:
(?:(?! ).)+(?=.*\b\d{4}\b)
See DEMO

For your basic case (marked by you as sufficient), this will work:
((?:(?!\d{4}).)*(?:\d{4})?(?:(?!\d{4}).)*)(?=\d{4})
You can pad every \d{4} internally with \b if you need to.
See a demo here.

If anyone is interested, I cheated to fully solve my problem.
Building on this answer, which solves the vast majority of my data set, I used program logic to catch some rarely seen use-cases. It seemed difficult to get a single regex to cover all the situations, so this seems like a viable alternative.
Problem is illustrated here.
The code isn't bulletproof yet, but this is the gist:
Function cRegEx (str As String) As String
Dim rExp As Object, rMatch As Object, regP As String, strL() As String
regP = "^((?:.*?[0-9]{4})?.*?(?:(?=\s*[0-9]{4})|(?:(?!\d{4}).)*)|(?!.*[0-9]{4}).*)"
' Encountered two use-cases that weren't easily solvable with regex, due to the already complex pattern(s).
' Split str if we encounter a comma and only keep the first part - this way we don't have to solve this case in the regex.
If InStr(str, ",") <> 0 Then
strL = Split(str, ",")
str = strL(0)
End If
' If str starts with a 4-digit group, return an empty string.
If cRegExNum(str) = False Then
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegEx = rMatch(0)
Else
cRegEx = ""
End If
Else
cRegEx = ""
End If
End Function
Function cRegExNum (str As String) As Boolean
' Does the string start with 4 non-whitespaced integers?
' Return true if it does
Dim rExp As Object, rMatch As Object, regP As String
regP = "^\d{4}"
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegExNum = True
Else
cRegExNum = False
End If
End Function

leading space with regex match

A newbie to regex, I'm trying to skip the first set of brackets [word1], and match any remaining text bracketed with the open bracket and closing brace [...}
Text: [word1] This is a [word2]bk{not2} sentence [word3]bk{not3}
Pattern: [^\]]\[.*?\}
So what I want is to match [word2]bk{not2} and [word3]bk{not3}, and it works, kind of, but I'm ending up with a leading space on each of the matches. Been playing with this for a couple of days (and doing a lot of reading), but I'm obviously still missing something.

\[[^} ]*}
Try this.See demo .
https://regex101.com/r/qJ8qW5/1

[^]] in your pattern match leading space. That matches any character without ].
For example, when text is [word1] This is a X[word2]bk{not2},
pattern [^\]]\[.*?\} matches X[word2]bk{not2}.
if any open brackets doesn't appear between [wordN} and {notN}, you can use:
\[[^\[}]*}
Or, you can also use Submatches with capturing groups.
Sub test()
Dim objRE As Object
Dim objMatch As Variant
Dim objMatches As Object
Dim strTest As String
strTest = "[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}"
Set objRE = CreateObject("VBScript.RegExp")
With objRE
.Pattern = "[^\]](\[.*?\})"
.Global = True
End With
Set objMatches = objRE.Execute(strTest)
For Each objMatch In objMatches
Debug.Print objMatch.Submatches(0)
Next
Set objMatch = Nothing
Set objMatches = Nothing
Set objRE = Nothing
End Sub
In this sample code, pattern has Parentheses for grouping.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js