Regex return seven digit number match only - regex

I've been trying to build a regular expression to extract a 7-digit number from a string, but I'm having difficulty getting the pattern right.
Example string - WO1519641 WO1528113TB WO1530212 TB
Example return - 1519641, 1528113, 1530212
My code I'm using in Excel is...
Private Sub Extract7Digits()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A300")
For Each c In Myrange
strPattern = "\D(\d{7})\D"
'strPattern = "(?:\D)(\d{7})(?:\D)"
'strPattern = "(\d{7}(\D\d{7}\D))"
strInput = c.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set matches = regEx.Execute(strInput)
For Each Match In matches
s = s & " Word: " & Match.Value & " "
Next
c.Offset(0, 1) = s
Else
s = ""
End If
Next
End Sub
I've tried all 3 patterns in that code but I end up getting a return of O1519641, O1528113T, O1530212 when using "\D(\d{7})\D". As I understand now the () doesn't mean anything because of the way I am storing the matches while I initially thought they meant that the expression would return what was inside the ().
I've been testing things on http://regexr.com/ but I'm still unsure of how to get it to allow the number to be inside the string as WO1528113TB is but only return the numbers. Do I need to run a RegEx on the returned value of the RegEx to exclude the letters the second time around?

I suggest using the following pattern:
strPattern = "(?:^|\D)(\d{7})(?!\d)"
Then, you will be able to access capturing group #1 contents (i.e. the text captured with the (\d{7}) part of the regex) via match.SubMatches(0), and then you may check which value is the largest.
Pattern details:
(?:^|\D) - a non-capturing group (does not create any submatch) matching the start of string (^) or a non-digit (\D)
(\d{7}) - Capturing group 1 matching 7 digits
(?!\d) - a negative lookahead failing the match if there is a digit immediately after the 7 digits.
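For illustration, here is a minimal sketch of how the loop from the question might look with this pattern. It assumes late binding through CreateObject("VBScript.RegExp") and the same A1:A300 range; the Sub name and the comma-separated output format are illustrative choices, not part of the answer above.

```vba
Private Sub Extract7DigitsSketch()
    Dim regEx As Object
    Dim matches As Object
    Dim m As Object
    Dim c As Range
    Dim s As String

    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True                        ' find every match, not just the first
        .Pattern = "(?:^|\D)(\d{7})(?!\d)"
    End With

    For Each c In ActiveSheet.Range("A1:A300")
        s = vbNullString                      ' reset the result for each cell
        ' Assumes the cell holds text or a number, not an error value
        Set matches = regEx.Execute(CStr(c.Value))
        For Each m In matches
            If Len(s) > 0 Then s = s & ", "
            s = s & m.SubMatches(0)           ' group 1 only: the seven digits
        Next m
        c.Offset(0, 1).Value = s
    Next c
End Sub
```

For the example string WO1519641 WO1528113TB WO1530212 TB this writes 1519641, 1528113, 1530212 to the adjacent cell. Reading SubMatches(0) rather than Match.Value is what drops the leading non-digit.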

Related

VBA regex - Value used in formula is of the wrong data type

I can't seem to figure out why this function which includes a regex keeps returning an error of wrong data type? I'm trying to return a match to the identified pattern from a file path string in an excel document. An example of the pattern I'm looking for is "02 Package_2018-1011" from a sample string "H:\H1801100 MLK Middle School Hartford\2-Archive! Issued Bid Packages\01 Package_2018-0905 Demolition and Abatement Bid Set_Drawings - PDF\00 HazMat\HM-1.pdf". Copy of the VBA code is listed below.
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\D{2}\sPackage_\d{4}-\d{4}"
.Global = True
End With
Set textpart = regex.Execute(strInput)
End Function
You need to use \d{2} to match a two-digit chunk, not \D{2} (which matches two non-digits). Besides, you are assigning the whole match collection to the function result, whereas you should extract the first match value and assign that to the function result:
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Dim matches As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\d{2}\sPackage_\d{4}-\d{4}"
End With
Set matches = regex.Execute(strInput)
If matches.Count > 0 Then
textpart = matches(0).Value
End If
End Function
Note that to match it as a whole word you may add word boundaries:
.Pattern = "\b\d{2}\sPackage_\d{4}-\d{4}\b"
            ^^                          ^^
To only match it after \, you may use a capturing group:
.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"
' ...
' and then
' ...
textpart = matches(0).Submatches(0)

Regex - Return Exact Number of Consecutive Digits

I want to return 5 consecutive digits from a string (working in VBA).
Based on this post (Regex) I'm using the pattern [^\d]\d{5}[^\d], but it also picks up the single characters immediately before and after the targeted 5 digits and returns h92345W (from "....South92345West").
How can I modify to return only the 5 consecutive digits: 92345
Sub RegexTest()
Dim strInput As String
Dim strPattern As String
strInput = "9129 Nor22 999123456 South92345West"
'strPattern = "^\d{5}$" 'No match
strPattern = "[^\d]\d{5}[^\d]" 'Returns additional letter before and after digits
'In this case returns: "h92345W"
MsgBox RegxFunc(strInput, strPattern)
End Sub
Function RegxFunc(strInput As String, regexPattern As String) As String
Dim regEx As New RegExp
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = regexPattern
End With
If regEx.test(strInput) Then
Set matches = regEx.Execute(strInput)
RegxFunc = matches(0).Value
Else
RegxFunc = "not matched"
End If
End Function
([^\d])(\d{5})([^\d])
You can use this regex, the matched terms should be in the 2nd group
You need to use a group:
"[^\d](\d{5})[^\d]"
And then the number will be in the first group. Not sure about the VBA syntax for grouping.
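In VBA the groups of a match are exposed through its SubMatches collection, which is zero-based: the first group is SubMatches(0), and with the three-group pattern above the digits would be SubMatches(1). A hedged sketch follows; the function name FirstFiveDigits is hypothetical, and it swaps [^\d] for (?:^|\D) and (?!\d) so that a number at the very start or end of the string is also found.

```vba
Function FirstFiveDigits(strInput As String) As String
    Dim regEx As Object
    Dim matches As Object

    Set regEx = CreateObject("VBScript.RegExp")
    ' Start of string or a non-digit, exactly five digits, then no further digit
    regEx.Pattern = "(?:^|\D)(\d{5})(?!\d)"

    Set matches = regEx.Execute(strInput)
    If matches.Count > 0 Then
        FirstFiveDigits = matches(0).SubMatches(0)   ' contents of group 1
    Else
        FirstFiveDigits = "not matched"
    End If
End Function
```

Called with "9129 Nor22 999123456 South92345West" this returns 92345: the runs 9129 and 999123456 are rejected because they are not exactly five digits long.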

How do I print my extracted pattern in a column using regex.execute and match object in vba?

I'm using VBA to write a sub to extract pin codes from given addresses in a column in an Excel worksheet. I was able to find the regex pattern to extract the pin pattern, but I'm unable to output the extracted pins to a column. As a way to test whether the regex is able to extract the pin pattern from the column (it is), I passed the Match.Value property from the matches object to a MsgBox and was able to get an output for each string in a MsgBox.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "\d{6}"
Dim Match As Object
Dim matches As Object
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("B1:B30")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regex.Test(strInput) Then
Set matches = regex.Execute(strInput)
For Each Match In matches
MsgBox (Match.Value) 'A workaround I found to see if my pattern
'worked but I need to print Match.value
'in a column so this wont do
Next
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
How do I extract the pattern string from the match object and print it into a column (like U1:U30) for each cell in my range B1:B30
TL;DR: Regex Pattern working but how to print extracted pattern in cell
How about collecting the matches comma separated in a string strMatches and write that to a cell?
Add this before For Each cell In Myrange
Dim i As Long, strMatches As String
i = 1 'row number where we start to write
And replace your other For Each with
strMatches = vbNullString
For Each Match In matches
strMatches = strMatches & Match.Value & ", " 'collect all matches comma separated
Next
If Not strMatches = vbNullString Then strMatches = Left(strMatches, Len(strMatches) - 2) 'remove last comma
Worksheets("your-sheet-name").Range("U" & i).Value = strMatches 'write the matches into cell
i = i + 1
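Put together, the whole routine might look like the sketch below. It assumes the results should line up with the source rows, so it writes to column U of the same row through cell.Row instead of keeping a separate counter; the Sub name is hypothetical, and the sheet is taken to be the active one.

```vba
Private Sub ExtractPinsToColumnU()
    Dim regex As Object, matches As Object, m As Object
    Dim cell As Range
    Dim strMatches As String

    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True          ' collect every six-digit run in the cell
    regex.Pattern = "\d{6}"

    For Each cell In ActiveSheet.Range("B1:B30")
        strMatches = vbNullString
        ' Assumes the cell holds text or a number, not an error value
        Set matches = regex.Execute(CStr(cell.Value))
        For Each m In matches
            strMatches = strMatches & m.Value & ", "
        Next m
        ' Drop the trailing ", " if anything was collected
        If Len(strMatches) > 0 Then strMatches = Left$(strMatches, Len(strMatches) - 2)
        ' Same row as the source cell; stays empty when nothing matched
        ActiveSheet.Range("U" & cell.Row).Value = strMatches
    Next cell
End Sub
```

Calling Execute directly and checking whether anything was collected makes the separate Test call unnecessary, since an empty match collection simply leaves the string empty.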

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
Dim strReplace As String: strReplace = ", "
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("P1:P5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.TEST(strInput) Then
Debug.Print (regEx.Replace(strInput, strReplace))
Else
Debug.Print ("No Regex Not matched in " & cell.address)
End If
End If
Next
Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Wayne, Bruce.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A captured group that includes a character class matching just 1 English character
, - Matching a literal , (you actually do not have to escape it as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (without matching or consuming) whether the character immediately following the comma is an English letter.
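As a small self-contained sketch of the same idea, the pattern and replacement can be wrapped in a helper. The function name AddSpaceAfterCommas is hypothetical, and late binding is assumed.

```vba
Function AddSpaceAfterCommas(strInput As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True                         ' fix every comma, not only the first
        .Pattern = "([a-zA-Z]),(?=[a-zA-Z])"   ' letter, comma, then a letter (not consumed)
    End With
    ' $1 restores the captured letter before the comma
    AddSpaceAfterCommas = regEx.Replace(strInput, "$1, ")
End Function
```

Debug.Print AddSpaceAfterCommas("Wayne,Bruce") prints Wayne, Bruce. Because the look-ahead does not consume the following letter, that letter can still serve as the captured letter of the next match, so an input such as a,b,c becomes a, b, c.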
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
Dim r As Range
Set r = Range("P1:P5")
r.Replace What:=",", Replacement:=", "
r.Replace What:=",  ", Replacement:=", "
End Sub
This uses the same RegExp as the solution from stribizhev, but with two optimisations for speed:
Your current code sets the RegExp details for every cell tested; these only need setting once.
Looping through a variant array is much faster than looping through a cell range.
Code:
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String
Dim strReplace As String
Dim regEx As Object
Dim strInput As String
Dim X, X1
Dim lngnct
Set regEx = CreateObject("vbscript.regexp")
strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
strReplace = "$1, "
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
X = ActiveSheet.Range("P1:P5").Value2
For X1 = 1 To UBound(X)
If .TEST(X(X1, 1)) Then
Debug.Print .Replace(X(X1, 1), strReplace)
Else
Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
End If
Next
End With
Set regEx = Nothing
End Sub
Your regex finds the pattern
(any letter),(any letter)
and replaces the whole match with
,_
where _ stands for a space.
So with Wayne,Bruce the pattern matches e,B, and all three characters are replaced. The result is therefore Wayn, ruce.
Instead, capture both letters and put them back in the replacement:
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"

How do you return the 2nd through nth match of a vbscript regex capturing group

I have a regular expression that matches a particular text string and then is supposed to return the matches in cells adjacent to the matched string. I'm using a capturing group so there could be more than one match. I'm able to return the first match without any problems, but I can't figure out how to return the second through nth matches.
My code follows:
Sub splitUpComments()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("G2:G5")
For Each c In Myrange
strPattern = "(\S.+?[.?!])(?=\s+|$)"
If strPattern <> "" Then
strInput = c.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
c.Offset(0, 1) = regEx.Replace(strInput, "$1")
c.Offset(0, 2) = regEx.Replace(strInput, "$2")
c.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
'do nothing
End If
End If
Next
End Sub
Given the following Target String:
"This is a test one. This is a test two. This is a test three."
I was hoping to see one sentence in each of the three adjacent cells:
This is a test one. | This is a test two. | This is a test three.
You can see the regular expression working here at Regex 101: Working Regular Expression
but instead I'm getting:
This is a test one. This is a test two. This is a test three. | $2$2$2 | $3$3$3
(Where the first cell contains the whole target string and the next two columns contain $2$2$2 and $3$3$3, respectively.)
It looks as though (1) the regular expression is not working and (2) $2 and $3 represent the 2nd and 3rd capture groups, not the 2nd and 3rd instances of the first capture group. Can anyone shed any light on this? Thanks.
I am sorry my vba knowledge is limited, but you can do something along the line of the following
i = 1
Set MyMatches = regEx.Execute(strInput) 'Execute returns the collection of matches
For Each match In MyMatches
c.Offset(0, i) = match.Value
i = i + 1
Next
You access matches using the Match object, as alpha bravo indicated in his answer. You can only create a valid Match object by using the Execute method of the RegExp object. Execute was missing from my original code. In the code below, I have replaced test with Execute and have also used the .Value property to access the values in the Match object.
Set MyMatches = regEx.Execute(strInput)
match_count = 0
For Each Match In MyMatches
match_count = match_count + 1
c.Offset(0, match_count) = Match.Value
Next
Hope this helps.