How to exclude an amount? - regex

I have two strings with the same amount:
Price $22.00
Price Max=$22.00
Can someone please advise how I can modify this regex pattern to make sure that the price with a "Max" in front of it will be ignored?
(?:MAX=|MAX=\s)[$]?[0-9]{0,2}?[,]?[0-9]{1,3}[.][0-9]{0,2}

You may capture the MAX= into an optional capturing group and check if it matched when all matches are found. Only grab the value if the Group 1 did not match:
Dim strPattern As String: strPattern = "(MAX=\s*)?\$\d[\d.,]*"
Dim regEx As Object
Dim ms As Object, m As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True ' the sample text contains "Max=", not "MAX="
regEx.Pattern = strPattern
Dim t As String
t = "Price $24.00 Price Max=$22.00 "
Set ms = regEx.Execute(t)
For Each m In ms
If Len(m.SubMatches(0)) = 0 Then
Debug.Print m.value
End If
Next
The (MAX=\s*)?\$\d[\d.,]* pattern captures MAX= and 0+ whitespace characters into an optional group (it matches 1 or 0 times). \$\d[\d.,]* then matches a dollar sign, a digit and any 0+ digits, commas and dots. If Len(m.SubMatches(0)) = 0 Then checks whether Group 1 is empty, i.e. whether MAX= was absent, and if it is, the match is valid.

One way to do it could be to match what you don't want and to capture what you do want in a capturing group using an alternation:
Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)
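A minimal sketch of how the alternation trick plays out, written with Python's `re` module rather than VBA purely so it is easy to run; the function name `prices_without_max` is hypothetical. The left branch consumes any price preceded by Max= without capturing it, so only prices matched by the right branch land in Group 1.

```python
import re

# Left branch: a price preceded by "Max=" (matched, but not captured).
# Right branch: any other price, captured in Group 1.
PRICE_RE = re.compile(r"Max=\s*\$[0-9]+\.[0-9]+|(\$[0-9]+\.[0-9]+)")

def prices_without_max(text):
    """Return the prices that are not preceded by 'Max=' (hypothetical helper)."""
    return [m.group(1) for m in PRICE_RE.finditer(text) if m.group(1) is not None]

print(prices_without_max("Price $24.00 Price Max=$22.00"))  # ['$24.00']
```

In VBA the equivalent check would be whether m.SubMatches(0) is non-empty for each match returned by Execute.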

Related

RegEx to extract a word from mail's body

I need to extract a word from incoming mail's body.
I used a Regex after referring to some sites, but it neither gives any result nor throws an error.
Example: Description: sample text
I want only the first word after the colon.
Dim reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim EAI As String
Set reg1 = New RegExp
With reg1
.Pattern = "Description\s*[:]+\s*(\w*)\s*"
.Global = False
End With
If reg1.Test(Item.Body) Then
Set M1 = reg1.Execute(Item.Body)
For Each M In M1
EAI = M.SubMatches(1)
Next
End If
Note that your pattern works well though it is better written as:
Description\s*:+\s*(\w+)
And it will match Description, then 0+ whitespaces, 1+ : symbols, again 0 or more whitespaces and then will capture into Group 1 one or more word characters (as letters, digits or _ symbols).
Now, the Capture Group 1 value is stored in M.SubMatches(0). Besides, you need not run .Test() because if there are no matches, you do not need to iterate over them. You actually want to get a single match.
Thus, just use
Set M1 = reg1.Execute(Item.body)
If M1.Count > 0 Then
EAI = M1(0).SubMatches(0)
End If
Where M1(0) is the first match and .SubMatches(0) is the text residing in the first group.

Regex Visual Basic UDF not executing as expected

I am busy with a regular expression for VB and I can't seem to find where I am going wrong here.
Example:
Pattern:(?<=\d{10,11})(.|[\r\n])*(?=Mobile)
Input: 6578543567 Text I want to retain Mobile Operation
Output: #Name?
The number consists of 10 and 11 digit telephone numbers.
The text I want to retain varies in length.
The text always precedes the word Mobile.
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
IIRC, VBA's regular expression implementation (VBScript RegExp) doesn't support lookbehinds such as (?<=...); only lookaheads are available.
But, this appears to be a relatively easy string to match. You have a group of consecutive numbers, followed by a space, and then you want to match an undisclosed amount of words up to the word "Mobile".
You could use the following pattern to accomplish this:
\d+\s(.*?)\sMobile
Details:
\d any digit
+ (Quantifier) One to unlimited times - greedy
\s a single whitespace character
(...) capturing group to grab the text you want to return
. any character
*? (Quantifier) Zero to unlimited times - lazy
\s a single whitespace character
Mobile literally matches the word Mobile
What's with the greedy vs lazy quantifiers?
The first quantifier, +, is greedy. What makes it greedy? The lack of a ? immediately after it. A greedy quantifier consumes as much as it possibly can, here as many \d digits as possible.
Since we added a \s after it, this won't really change the outcome, because the pattern has to match all the digits anyway to reach that space. However, if you removed the \s (for example, because you wanted to capture the space inside (...)), it would matter: if \d+ were lazy, your .*? would consume all but one of the digits.
So why use a lazy quantifier in .*?? Well, if your input string contained the word Mobile twice, a greedy quantifier would consume the first occurrence and match up to the second. If you only want to match up to the first occurrence of Mobile, make the quantifier lazy.
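The difference can be seen in a small sketch, shown here with Python's `re` module for ease of running. The input is the question's sample with a second Mobile appended, which is an assumption made for illustration, and the helper `capture` is hypothetical.

```python
import re

# Sample input from the question, with a second "Mobile" appended (assumed)
text = "6578543567 Text I want to retain Mobile Operation Mobile"

def capture(pattern, s):
    """Return Group 1 of the first match, or None if there is no match."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

lazy = capture(r"\d+\s(.*?)\sMobile", text)    # stops at the first "Mobile"
greedy = capture(r"\d+\s(.*)\sMobile", text)   # runs on to the last "Mobile"

print(lazy)    # Text I want to retain
print(greedy)  # Text I want to retain Mobile Operation
```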
So Finally - Now how do I retrieve the text in my capturing group (...)?
With VBA, you would use the Matches object. First I would recommend testing to ensure that there is a match - this can be done in a simple If...Then statement. Once this test passes, you can then safely obtain your return value.
With New RegExp
.Pattern = "\d+\s(.*?)\sMobile"
.IgnoreCase = True 'Matches 'Mobile' in any case; switch to False for a case-sensitive match
If .Test(inputString) Then
retVal = .Execute(inputString)(0).SubMatches(0)
End If
End With
inputString would be the string that contains the test values.
retVal would be what is returned from your capturing group.

UDF Regex - yyyy only

I am just learning some regex, and I need help spitting out matches generated by my regex code. I found some very useful resources here to output anything not matched, but I want to output only the parts of a cell that do match. I am looking for dates in cells, that may be a single yyyy date or yyyy-yy, or the like (as shown from the sample data below).
Sample data:
1951/52
1909-13
2005-2014
7 . (1989)-
1 (1933/34)-2 (1935/36)
1979-2012/2013
Current Function Code: (A snippet found from an existing post here, but returns the replacement value instead of what was matched)
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([12][0-9]{3}[/][0-9]{2,4})|([12][0-9]{3}[-][0-9]{2,4})|([12][0-9]{3})"
You may use
\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b
Note that \b might be removed if you are not interested in a whole word search.
Pattern details:
\b - leading word boundary (the preceding char must be either a non-word char or the start of string)
[12][0-9]{3} - 1 or 2 followed by any 3 digits
(?:[,/-][0-9]{2,4})* - zero or more sequences ((?:...)*) of:
[,/-] - a comma, / or - character
[0-9]{2,4} - any 2 to 4 digits
\b - trailing word boundary (there must be a non-word char or the end of string after).
Sample VBA code to grab all those values using RegExp#Execute:
Sub FetchDateLikeStrs()
Dim cellContents As String
Dim reg As RegExp
Dim mc As MatchCollection
Dim m As Match
Set reg = New RegExp
reg.Pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
reg.Global = True
cellContents = "1951/52 1909-13 2005-2014 7 . (1989)- 1 (1933/34)-2 (1935/36) 1979-2012/2013 1951,52"
Set mc = reg.Execute(cellContents)
For Each m In mc
Debug.Print m.Value
Next
End Sub
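For a quick check of what the pattern picks out of the sample data, here is a sketch using Python's `re` module (an assumption made only for ease of running; the pattern itself is unchanged and the names `YEAR_RE` and `matches` are illustrative).

```python
import re

# Same pattern as in the VBA sample above
YEAR_RE = re.compile(r"\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b")

cell_contents = ("1951/52 1909-13 2005-2014 7 . (1989)- 1 (1933/34)-2 "
                 "(1935/36) 1979-2012/2013 1951,52")

# The pattern has no capturing groups, so findall returns the full matches
matches = YEAR_RE.findall(cell_contents)
for value in matches:
    print(value)
```

Standalone digits such as 7, 1 and 2 are skipped because they cannot satisfy the four-digit [12][0-9]{3} part.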

Regex lookahead to match everything prior to 1st OR 2nd group of digits

Regex in VBA.
I am using the following regex to match the second occurrence of a 4-digit group, or the first group if there is only one group:
\b\d{4}\b(?!.+\b\d{4}\b)
Now I need to do kind of the opposite: I need to match everything up until the second occurrence of a 4-digit group, or up until the first group if there is only one. If there are no 4-digit groups, capture the entire string.
This would be sufficient.
But there is also a preferable "bonus" route: If there exists a way to match everything up until a 4-digit group that is optionally followed by some random text, but only if there is no other 4-digit group following it. If there exists a second group of 4 digits, capture everything up until that group (including the first group and periods, but not commas). If there are no groups, capture everything. If the line starts with a 4-digit group, capture nothing.
I understand that also this could (should?) be done with a lookahead, but I am not having any luck in figuring out how they work for this purpose.
Examples:
Input: String.String String 4444
Capture: String.String String 4444
Input: String4444 8888 String
Capture: String4444
Input: String String 444 . B, 8888
Capture: String String 444 . B
Bonus case:
Input: 8888 String
Capture:
To match everything up until the second occurrence of a 4-digit group, or up until the first group if there is only one, use this pattern:
^((?:.*?\d{4})?.*?)(?=\s*\b\d{4}\b)
Per the comment below, use this pattern:
^((?:.*?\d{4})?.*?(?=\s*\b\d{4}\b)|.*)
You can use this regex in VBA to capture lines with 4-digit numbers, or those that do not have 4-digit numbers in them:
^((?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4})|(?!.*[0-9]{4}).*)
It should work the same in VBA.
The regex consists of 2 alternatives: (?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) and (?!.*[0-9]{4}).*.
(?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4}) matches 0 or more (as few as possible) characters that are preceded by 0 or 1 sequence of characters followed by a 4-digit number, and are followed by optional space(s) and 4 digit number.
(?!.*[0-9]{4}).* matches any number of any characters that do not have a 4-digit number inside.
Note that to only match whole numbers (not part of other words) you need to add \b around the [0-9]{4} patterns (i.e. \b[0-9]{4}\b).
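A rough sketch of how the two alternatives behave on inputs like those in the question, using Python's `re` module for ease of running (the function name `leading_part` is hypothetical). The results are what this particular pattern returns, which is not always the capture the question asked for; for instance, a trailing comma is kept.

```python
import re

# Two alternatives: text up to a (second) 4-digit group, or the whole
# string when it contains no 4-digit group at all.
PATTERN = re.compile(
    r"^((?:.*?[0-9]{4})?.*?(?=\s*?[0-9]{4})|(?!.*[0-9]{4}).*)"
)

def leading_part(s):
    """Return Group 1 of the match, or None if nothing matches."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

for sample in ("String4444 8888 String",
               "String String 444 . B, 8888",
               "8888 String",
               "no digits here"):
    print(repr(leading_part(sample)))
# 'String4444'
# 'String String 444 . B,'
# ''
# 'no digits here'
```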
This matches everything except spaces up to the last occurrence of a 4-digit word.
You can use the following:
(?:(?! ).)+(?=.*\b\d{4}\b)
For your basic case (marked by you as sufficient), this will work:
((?:(?!\d{4}).)*(?:\d{4})?(?:(?!\d{4}).)*)(?=\d{4})
You can pad every \d{4} internally with \b if you need to.
If anyone is interested, I cheated to fully solve my problem.
Building on this answer, which solves the vast majority of my data set, I used program logic to catch some rarely seen use-cases. It seemed difficult to get a single regex to cover all the situations, so this seems like a viable alternative.
The code isn't bulletproof yet, but this is the gist:
Function cRegEx (str As String) As String
Dim rExp As Object, rMatch As Object, regP As String, strL() As String
regP = "^((?:.*?[0-9]{4})?.*?(?:(?=\s*[0-9]{4})|(?:(?!\d{4}).)*)|(?!.*[0-9]{4}).*)"
' Encountered two use-cases that weren't easily solvable with regex, due to the already complex pattern(s).
' Split str if we encounter a comma and only keep the first part - this way we don't have to solve this case in the regex.
If InStr(str, ",") <> 0 Then
strL = Split(str, ",")
str = strL(0)
End If
' If str starts with a 4-digit group, return an empty string.
If cRegExNum(str) = False Then
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegEx = rMatch(0)
Else
cRegEx = ""
End If
Else
cRegEx = ""
End If
End Function
Function cRegExNum (str As String) As Boolean
' Does the string start with 4 non-whitespaced integers?
' Return true if it does
Dim rExp As Object, rMatch As Object, regP As String
regP = "^\d{4}"
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
cRegExNum = True
Else
cRegExNum = False
End If
End Function

VBScript - RegEx - modify ObjMatch - Pattern = "(\d{2}) (\d{2}) (\d{2})"

This is something I stumbled across while trying to learn a little about Reg Ex.
Set objRegEx = CreateObject("VBScript.RegExp")
Dim re, targetString, colMatch, objMatch
Set re = New RegExp
With re
.Pattern = "(\d{2}) (\d{2}) (\d{2}) 0500Z"
.Global = True
.IgnoreCase = True
End With
targetString = "02 04 14 0500Z Joe is eating a sandwich"
Set colMatch = re.Execute(targetString)
For each objMatch in colMatch
WScript.echo objMatch
date1 = objRegEx.Replace(objMatch, "(\d{2})(\d{2})(\d{2})")
Wscript.Echo date1
Next
ISSUE: I need to find the date which shows up like this "02 04 14 0500Z" and then assign it to a variable in the form "020414".
When I try to replace the Obj match and reformat the date it doesn't work, instead showing the exact text in brackets.
I referenced:
http://www.mikesdotnetting.com/Article/24/Regular-Expressions-and-VBScript
http://wiki.mcneel.com/developer/scriptsamples/regexpobject
To refer to content captured by capturing group, use $n in the replacement string (where n is a number):
date1 = re.Replace(objMatch, "$1$2$3")
To identify the number of a capturing group, count the opening parentheses ( that belong to capturing groups, up to and including the group you want to refer to:
(\d{2}) (\d{2}) (\d{2}) 0500Z
^       ^       ^
1       2       3
A more complicated example:
((a(?:k)*)(b(c)(?:d)*))
^^        ^ ^
12        3 4
(?:pattern) is a non-capturing group, so it doesn't count.
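The numbering rule can be checked with a short sketch in Python's `re` module (used here only because it is easy to run; the test string "akkbcdd" is an assumption). Note that Python writes group references in the replacement string as \1, \2, ... where VBScript uses $1, $2, ...

```python
import re

# Groups are numbered by the position of their opening parenthesis;
# non-capturing (?:...) groups are skipped.
nested = re.compile(r"((a(?:k)*)(b(c)(?:d)*))")
groups = nested.search("akkbcdd").groups()
print(groups)  # ('akkbcdd', 'akk', 'bcdd', 'c')

# Reformatting the date from the question: keep the three captured
# two-digit parts and drop the spaces and the trailing " 0500Z".
date1 = re.sub(r"(\d{2}) (\d{2}) (\d{2}) 0500Z", r"\1\2\3", "02 04 14 0500Z")
print(date1)  # 020414
```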