Getting "TRUE" when RegEx doesn't match?

I am trying to use Regular Expressions to do a pattern match in VBA. I've added a reference to the Regular Expression library and am using the following code as a test.
Sub testing()
Dim re
Dim val
Set re = New RegExp
re.Pattern = "[0-9]{5}"
re.IgnoreCase = True
val = Range("A8").Value
MsgBox val
MsgBox re.Test(val)
End Sub
The issue is that when I'm testing a string formatted as:
1234 565 4444543 12 33
I am receiving "True" when I use {5} and "False" when using {6}. Why is this?
Shouldn't both {5} and {6} return "False" in this case?
If the RegEx is matching on the whitespace, how do I prevent this? I want to match exactly 4 numbers followed by exactly one space followed by exactly 3 numbers, etc.
Help!

You need to anchor your regular expression:
re.Pattern = "^[0-9]{5}$"
Otherwise it matches if it finds the pattern anywhere in the input. ^ matches the beginning of the input, $ matches the end of the input.
I'm not sure why [0-9]{6} returns False with your input, though: 4444543 has seven digits, so an unanchored [0-9]{6} should also find a match there.
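As a minimal sketch of the difference anchoring makes, the snippet below tests the sample string from the question with and without anchors. The helper name RegexTest and the final full-format pattern (4-3-7-2-2 digits separated by single spaces, inferred from the sample) are illustrative assumptions, not something given in the question.

```vba
' Hypothetical helper: returns True when pat matches anywhere in s.
' Late binding is used, so no library reference is required.
Function RegexTest(ByVal pat As String, ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = pat
        RegexTest = .Test(s)
    End With
End Function

Sub AnchorDemo()
    Const sample As String = "1234 565 4444543 12 33"

    ' Unanchored: "44445" inside 4444543 is enough for a match
    Debug.Print RegexTest("[0-9]{5}", sample)      ' True

    ' Anchored: the whole input would have to be exactly five digits
    Debug.Print RegexTest("^[0-9]{5}$", sample)    ' False

    ' Assumed full-format pattern: every group and every space is spelled out
    Debug.Print RegexTest("^[0-9]{4} [0-9]{3} [0-9]{7} [0-9]{2} [0-9]{2}$", sample)   ' True
End Sub
```

Spelling out each group between ^ and $ is what enforces "exactly 4 digits, one space, exactly 3 digits, etc."; a single anchored [0-9]{5} only accepts inputs that are five digits and nothing else.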

Related

VBA: check that a string exactly matches a word

I use the code below to check whether a string matches a pattern.
Sub chkPattern(str As String, pattern As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
objRegex.pattern = pattern
MsgBox objRegex.test(str)
End Sub
Specifically, I want to check whether the whole string is exactly "abc" or "cde" or "xy".
For example, if the input is "abccde" or "abcxy" or "abccdexyz", I expect it to return False.
Some patterns that I have already tried, like "abc|cde|xyz" and "\b(abc|cde|xyz)\b)", do not work.
Can this be done in VBA by using Regex?
Yes, it is possible. As I read your question, you want to apply an OR using the pipe character.
Sub Test()
Dim arr As Variant: arr = Array("abc", "cde", "xy")
With CreateObject("VBScript.RegExp")
.Pattern = "^(" & Join(arr, "|") & ")$"
Debug.Print .Test("abcd") 'Will return False
Debug.Print .Test("abc") 'Will return True
End With
End Sub
The key to matching the whole string is the pair of anchors: ^ for the start of the string and $ for the end. If you meant to test for a partial match, you have simply reversed the slashes. Use backslashes instead of forward slashes, i.e. \b(abc|cde|xyz)\b as the pattern.
Remember to set .IgnoreCase = True if the comparison should ignore case.
Alternatively, use the built-in Like operator.
To match a whole word, use:
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+) Capturing Group
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier: matches between one and unlimited times, as many times as possible
\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.
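The two approaches in the answers above behave differently once the input contains more than one word. The following sketch compares them; the helper name RegexTest and the sample inputs are chosen here for illustration only.

```vba
' Hypothetical helper: returns True when pat matches anywhere in s.
Function RegexTest(ByVal pat As String, ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = pat
        RegexTest = .Test(s)
    End With
End Function

Sub CompareBoundaryAndAnchors()
    ' Word boundaries: succeeds when a listed word occurs as a separate word
    Debug.Print RegexTest("\b(abc|cde|xy)\b", "abccde")    ' False: no boundary between "abc" and "cde"
    Debug.Print RegexTest("\b(abc|cde|xy)\b", "abc def")   ' True: "abc" stands alone as a word

    ' Anchors: succeeds only when the entire input is one of the words
    Debug.Print RegexTest("^(abc|cde|xy)$", "abc def")     ' False
    Debug.Print RegexTest("^(abc|cde|xy)$", "xy")          ' True
End Sub
```

If the requirement is that the whole string equals one of the words, the anchored form is the safer choice; the \b form also accepts longer inputs that merely contain one of the words.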

Get the non-matching part of the pattern through a RegEx

In this topic, the idea is to extract the numbers separated by an x using a RegEx: How to extract ad sizes from a string with excel regex
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite, and I have been struggling for some time to get what I actually need.
This is what I have so far:
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
The part that was removed is exactly what I want to extract with the RegEx.
Change the IF block to
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
And your results will return
300x250
34300x25
8x4
This is because InputMatches holds the results of the RegEx execution, i.e. the substrings of the input that matched the pattern.
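Put together, a trimmed-down version of the function might look like the sketch below. The name ExtractSize is hypothetical, and the capturing groups of the original pattern are dropped because only the whole match is used.

```vba
' Sketch: returns the first "<digits>x<digits>" substring of s,
' or s unchanged when no such substring exists (as in the question).
Public Function ExtractSize(ByVal s As String) As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = "[0-9]+x[0-9]+"
        .IgnoreCase = True          ' also accepts an upper-case X
        .Global = False             ' only the first match is needed
        Set matches = .Execute(s)
    End With
    If matches.Count > 0 Then
        ExtractSize = matches(0).Value
    Else
        ExtractSize = s
    End If
End Function

Public Sub TestExtractSize()
    Debug.Print ExtractSize("uni3uios3_300x250_ASDF.html")   ' 300x250
    Debug.Print ExtractSize("uni3uios3_8x4_ASDF.html")       ' 8x4
End Sub
```

Checking matches.Count replaces the separate .Test call, so the pattern is only run once per input.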
As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching every character from the start of the string up to and including a non-digit (\D) character.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if it is followed by the pattern described inside it (\d+x\d+). The lookahead itself doesn't consume any characters, so the text matched by \d+x\d+ is not part of the match and is left in place.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are linked by an OR condition (|) so that the whole regex matches one pattern or another.
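Since this pattern matches the parts to discard rather than the part to keep, it is meant to be used with Replace and Global = True. A minimal sketch, with the hypothetical function name StripAroundSize:

```vba
' Sketch: removes everything before and after the "<digits>x<digits>" part.
Public Function StripAroundSize(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Pattern = "^.*\D(?=\d+x\d+)|\D+$"
        .Global = True              ' both the leading and the trailing part must be replaced
        StripAroundSize = .Replace(s, vbNullString)
    End With
End Function

Public Sub TestStripAroundSize()
    Debug.Print StripAroundSize("uni3uios3_300x250_ASDF.html")   ' 300x250
End Sub
```

One caveat of this approach: for an input without any digits, the \D+$ alternative matches the whole string, so the function returns an empty string rather than the original input.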

check validation of an expression with Regex

I need to validate an expression entered in a textbox, so I thought of using Regex, but I cannot get it to work.
So here is my expression: [3 Numbers]-[1 character Shift].[1 Number].
for example: 007-L.4
I tried this:
Dim MyRegex As Regex = New Regex("^[0-9]{3}-[a-zA-Z].[O-9]$")
but it does not work.
Thank you in advance.
You have two errors in your pattern:
^[0-9]{3}-[a-zA-Z].[O-9]$
                  ^ ^
                  1 2
1. The . is a metacharacter that matches any character. You need to escape it as \. to match a period only.
2. Your range is not valid, since you wrote O (the letter) instead of 0 (the digit). :-)
Here's the corrected pattern:
Dim MyRegex As Regex = New Regex("^[0-9]{3}-[a-zA-Z]\.[0-9]$")
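For readers following along in VBA rather than VB.NET, the corrected pattern can be checked with the VBScript.RegExp object instead of the .NET Regex class used in the question. This is only a sketch; the function name IsValidCode and the second test input are made up for illustration.

```vba
' Sketch: validates input of the form 3 digits, dash, 1 letter, period, 1 digit.
Function IsValidCode(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = "^[0-9]{3}-[a-zA-Z]\.[0-9]$"
        IsValidCode = .Test(s)
    End With
End Function

Sub TestIsValidCode()
    Debug.Print IsValidCode("007-L.4")   ' True
    Debug.Print IsValidCode("007-Lx4")   ' False: "x" is not a literal period
End Sub
```

With an unescaped . the second input would have been accepted, which is why the escape matters even when the pattern otherwise compiles.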

Regex lookahead to match everything prior to 1st OR 2nd group of digits

Regex in VBA.
I am using the following regex to match the second occurrence of a 4-digit group, or the first group if there is only one group:
\b\d{4}\b(?!.+\b\d{4}\b)
Now I need to do kind of the opposite: I need to match everything up until the second occurrence of a 4-digit group, or up until the first group if there is only one. If there are no 4-digit groups, capture the entire string.
This would be sufficient.
But there is also a preferable "bonus" route: match everything up until a 4-digit group that is optionally followed by some random text, but only if no other 4-digit group follows it. If there is a second group of 4 digits, capture everything up until that group (including the first group and periods, but not commas). If there are no groups, capture everything. If the line starts with a 4-digit group, capture nothing.
I understand that this could (should?) also be done with a lookahead, but I am not having any luck figuring out how lookaheads work for this purpose.
Examples:
Input: String.String String 4444
Capture: String.String String 4444
Input: String4444 8888 String
Capture: String4444
Input: String String 444 . B, 8888
Capture: String String 444 . B
Bonus case:
Input: 8888 String
Capture:
For everything up until the second occurrence of a 4-digit group, or up until the first group if there is only one, use this pattern:
^((?:.*?\d{4})?.*?)(?=\s*\b\d{4}\b)
Per the comment below, use this pattern instead:
^((?:.*?\d{4})?.*?(?=\s*\b\d{4}\b)|.*)
You can use this regex in VBA to capture lines with 4-digit numbers, or those that do not have 4-digit numbers in them:
^((?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4})|(?!.*[0-9]{4}).*)
See demo, it should work the same in VBA.
The regex consists of 2 alternatives: (?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) and (?!.*[0-9]{4}).*.
(?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) matches 0 or more (as few as possible) characters that are preceded by 0 or 1 sequence of characters followed by a 4-digit number, and are followed by optional space(s) and 4 digit number.
(?!.*[0-9]{4}).* matches any number of any characters that do not have a 4-digit number inside.
Note that to only match whole numbers (not part of other words) you need to add \b around the [0-9]{4} patterns (i.e. \b[0-9]{4}\b).
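The effect of adding \b is easiest to see on one of the inputs from the question, where the first four digits are glued to a word. The sketch below uses a hypothetical helper, FirstMatch, that returns the first match of a pattern.

```vba
' Hypothetical helper: returns the first match of pat in s, or "" if there is none.
Function FirstMatch(ByVal pat As String, ByVal s As String) As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = pat
        .Global = False
        Set matches = .Execute(s)
    End With
    If matches.Count > 0 Then FirstMatch = matches(0).Value
End Function

Sub BoundaryDemo()
    Const sample As String = "String4444 8888 String"

    ' Without \b the digits attached to "String" count as a 4-digit group
    Debug.Print FirstMatch("[0-9]{4}", sample)      ' 4444

    ' With \b only a standalone group qualifies
    Debug.Print FirstMatch("\b[0-9]{4}\b", sample)  ' 8888
End Sub
```

Whether "String4444" should count as containing a 4-digit group depends on the data, so the choice between the two forms is a requirement question rather than a regex one.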
To match everything except spaces up to the last occurrence of a 4-digit word, you can use the following:
(?:(?! ).)+(?=.*\b\d{4}\b)
For your basic case (marked by you as sufficient), this will work:
((?:(?!\d{4}).)*(?:\d{4})?(?:(?!\d{4}).)*)(?=\d{4})
You can pad every \d{4} internally with \b if you need to.
If anyone is interested, I cheated to fully solve my problem.
Building on this answer, which solves the vast majority of my data set, I used program logic to catch some rarely seen use-cases. It seemed difficult to get a single regex to cover all the situations, so this seems like a viable alternative.
Problem is illustrated here.
The code isn't bulletproof yet, but this is the gist:
Function cRegEx(ByVal str As String) As String
Dim rExp As Object, rMatch As Object, regP As String, strL() As String
regP = "^((?:.*?[0-9]{4})?.*?(?:(?=\s*[0-9]{4})|(?:(?!\d{4}).)*)|(?!.*[0-9]{4}).*)"
' Encountered two use-cases that weren't easily solvable with regex, due to the already complex pattern(s).
' Split str if we encounter a comma and only keep the first part - this way we don't have to solve this case in the regex.
If InStr(str, ",") <> 0 Then
strL = Split(str, ",")
str = strL(0)
End If
' If str starts with a 4-digit group, return an empty string.
If cRegExNum(str) = False Then
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegEx = rMatch(0)
Else
cRegEx = ""
End If
Else
cRegEx = ""
End If
End Function
Function cRegExNum(ByVal str As String) As Boolean
' Does the string start with four consecutive digits?
' Return True if it does
Dim rExp As Object, rMatch As Object, regP As String
regP = "^\d{4}"
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegExNum = True
Else
cRegExNum = False
End If
End Function

Regular Expression writing for known pattern with dashes

I need to write a regular expression for pattern matching for VB.NET. I need to have the Regex to look for a pattern like 12345-1234-12345-123, including the dashes. The numbers can be any variation. The value is stored as a varchar. Not sure how close or far my example is below. Any help/guidance is much appreciated.
Protected Sub Button1_Click(sender As Object, e As System.EventArgs) Handles Button1.Click
Dim testString As String = "12345-1234-12345-123"
Dim testNumberWithDashesRegEx As Regex = New Regex("^\d{5}-d{4}-d{5}-\d{3}$")
Dim regExMatch As Match = testNumberWithDashesRegEx.Match(testString)
If regExMatch.Success Then
Label1.Text = "There is a match."
Else
Label1.Text = "There is no match."
End If
End Sub
Let's break down this regex:
^\d{5}-d{4}-d{5}-\d{3}$
^: Match at start of target string
\d: match character class of digits 0-9
-: match dash (-) character
d: match the letter "d"
{5}: match the preceding class 5 times
$: Match at the end of target string.
Everything looks good to me, except you should change your plain "d" to "\d":
^\d{5}-\d{4}-\d{5}-\d{3}$
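To see what the missing backslashes actually do, the sketch below runs both versions of the pattern. It uses VBA with the VBScript.RegExp object rather than the .NET Regex class from the question, which behaves the same way for these simple tokens; the helper name RegexTest and the second sample string are illustrative assumptions.

```vba
' Hypothetical helper: returns True when pat matches anywhere in s.
Function RegexTest(ByVal pat As String, ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = pat
        RegexTest = .Test(s)
    End With
End Function

Sub DashPatternDemo()
    ' Original pattern: plain d in the two middle groups
    Debug.Print RegexTest("^\d{5}-d{4}-d{5}-\d{3}$", "12345-1234-12345-123")    ' False
    Debug.Print RegexTest("^\d{5}-d{4}-d{5}-\d{3}$", "12345-dddd-ddddd-123")    ' True: literal d characters

    ' Corrected pattern: \d in every group
    Debug.Print RegexTest("^\d{5}-\d{4}-\d{5}-\d{3}$", "12345-1234-12345-123")  ' True
End Sub
```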