Excel VBA Regex replace loses one character - regex

The below code matches and replaces, but the digit next to the capture group is consumed. Where am I going wrong?
Sub test()
Dim regex As Object 'Regexp object.
Set regex = CreateObject("VBScript.RegExp") 'Regexp object.
Dim strPattern As String: strPattern = "\d(AM|PM)" 'Declare regex pattern.
Dim strReplace As String 'Placeholder string for replace operation.
Dim target As String
target = "1:05PM"
strReplace = " $1"
With regex
.Global = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regex.test(target) Then
Debug.Print regex.Replace(target, strReplace)
End If
End Sub
Output:
1:0 PM

It's because you have an un-captured \d in your regex. Try putting () around the \d i.e. (\d)(AM|PM).
You also need to change strReplace to "$1 $2"

Related

Use Wildcard (*) with RegEx in VBA to match anything

I'm trying to use Regex to match any character (This is just a piece of code from a larger project). I got the below to work, but seems like it is wrong, is there a proper way to search for any character via RegEx?
strPattern = "([!##$%^&*()]?[a-z]?[0-9]?)"
Eg: MCVE
Public Sub RegExSearch()
Dim regexp As Object
Dim rng As Range, rcell As Range
Dim strInput As String, strPattern As String
Set regexp = CreateObject("vbscript.regexp")
Set rng = ActiveSheet.Range("A1:A1")
With regexp
.Global = False
.MultiLine = False
.ignoreCase = True
.Pattern = strPattern
End With
For Each rcell In rng.Cells
strPattern = "([!##$%^&*()]?[a-z]?[0-9]?)" ' This matches everything, but seems improper
If strPattern <> "" Then
strInput = rcell.Value
If regexp.test(strInput) Then
MsgBox rcell & " Matched in Cell" & rcell.Address
End If
End If
Next
End Sub
. "Wildcard." The unescaped period matches any character, except a new line.
strPattern = "."
Or as #RonRosenfeld pointed out, if you need to match everything INCLUDING a "new line" then this would work.
strPattern = "[/S/s]*"
https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/

VBA regex - Value used in formula is of the wrong data type

I can't seem to figure out why this function which includes a regex keeps returning an error of wrong data type? I'm trying to return a match to the identified pattern from a file path string in an excel document. An example of the pattern I'm looking for is "02 Package_2018-1011" from a sample string "H:\H1801100 MLK Middle School Hartford\2-Archive! Issued Bid Packages\01 Package_2018-0905 Demolition and Abatement Bid Set_Drawings - PDF\00 HazMat\HM-1.pdf". Copy of the VBA code is listed below.
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\D{2}\sPackage_\d{4}-\d{4}"
.Global = True
End With
Set textpart = regex.Execute(strInput)
End Function
You need to use \d{2} to match 2-digit chunk, not \D{2}. Besides, you are trying to assign the whole match collection to the function result, while you should extract the first match value and assign that value to the function result:
Function textpart(Myrange As Range) As Variant
Dim strInput As String
Dim regex As Object
Dim matches As Object
Set regex = CreateObject("VBScript.RegExp")
strInput = Myrange.Value
With regex
.Pattern = "\d{2}\sPackage_\d{4}-\d{4}"
End With
Set matches = regex.Execute(strInput)
If matches.Count > 0 Then
textpart = matches(0).Value
End If
End Function
Note that to match it as a whole word you may add word boundaries:
.Pattern = "\b\d{2}\sPackage_\d{4}-\d{4}\b"
^^ ^^
To only match it after \, you may use a capturing group:
.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"
' ...
' and then
' ...
textpart = matches(0).Submatches(0)

Visual Basic Excel Regular Expression {}

I have some trouble with {}. When i get max value like this {1,8} it not work and i don't now why. Min vale is valid well
Private Sub Highlvl_Expression()
Dim strPattern As String: strPattern = "[a-zA-Z0-9_]{1,8}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim Test As Boolean
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
Test = regEx.Test(Highlvl.Value)
If regEx.Test(Highlvl.Value) Then
MsgBox ("Validate")
Else
MsgBox ("Not Validate")
End If
End Sub
You specified the pattern that looks for 1 to 8 alphanumeric characters inside a string. If you run the regex against a 9-character string "ABCDE6789" (regEx.Execute("ABCDE6789")), you will have 2 matches: ABCDE678 and 9.
If you want to validate a string that should have a minimum or a maximum number of characters, you need to use anchors, i.e. start and end of string assertions ^ and $. So, use
Dim strPattern As String: strPattern = "^[a-zA-Z0-9_]{1,8}$"
And
.Global = False
The global flag is not necessary since we are not looking for multiple matches, but for a single true or false result with test.

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
Dim strReplace As String: strReplace = ", "
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("P1:P5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.TEST(strInput) Then
Debug.Print (regEx.Replace(strInput, strReplace))
Else
Debug.Print ("No Regex Not matched in " & cell.address)
End If
End If
Next
Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Bruce, Wayne.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A captured group that includes a character class matching just 1 English character
, - Matching a literal , (you actually do not have to escape it as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (does not match, or consume) if the immediate character following the comma is and English letter.
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
Dim r As Range
Set r = Range("P1:P5")
r.Replace What:=",", Replacement:=", "
r.Replace What:=", ", Replacement:=", "
End Sub
Uses the same RegExp as in the solution from stribizhev but with two optimisations for speed
Your current code sets the RegExp details for every cell tested, these only need setting once.
Looping through a varinat array is much faster than a cell range
code
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String:
Dim strReplace As String:
Dim regEx As Object
Dim strInput As String
Dim X, X1
Dim lngnct
Set regEx = CreateObject("vbscript.regexp")
strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
strReplace = "$1, "
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
X = ActiveSheet.Range("P1:P5").Value2
For X1 = 1 To UBound(X)
If .TEST(X(X1, 1)) Then
Debug.Print .Replace(X(X1, 1), strReplace)
Else
Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
End If
Next
End With
Set regEx = Nothing
End Sub
What you are doing via Regex is to find a pattern
(any Alphabet),(any Alphabet)
and then replace such pattern to
,_
where _ implies a space.
So if you have Wayne,Bruce then the pattern matches where e,B. Therefore the result becomes Wayn, ruce.
Try
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"
.

Excel RegEx Extraction

recently I've been trying to extract some strings from text in excel. I used script from other post here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Since Macro code is working fine I couldn't use in Cell function, it's showing #NAME? error. I've included "Microsoft VBScript Regular Expressions 5.5" but still no luck.
I can use it with macro but script needs some changes. I would like to have some strings in A1:A50, then to B1:B50 extract date in format DD Month YYYY (e.g. 28 July 2014) and to C1:C50 extract account no in format G1234567Y.
For now script is replacing date with "". Regular Expression for date is correct but how to insert date into B column? And then A/c no to C column working on 1:50 range?
This is the code:
Sub simpleRegex()
Dim strPattern As String: strPattern = "[0-9][0-9].*[0-9][0-9][0-9][0-9]"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Dim Out As Range
Set Myrange = ActiveSheet.Range("A1")
Set Out = ActiveSheet.Range("B1")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Out = regEx.Replace(strInput, strReplace)
Else
MsgBox ("Not matched")
End If
End If
End Sub
Thank You kindly for any assistance.
Currently your replacing the matching string with an empty string "" so that's why your getting no result. You need to return the actual match using () to indicate match set and $1 to retrieve it.
Based on your example, I'll assume your text in column A looks like this: 28 July 2014 G1234567Y
Here is a routine that will split apart the text into a date and then the text following the date.
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A50")
For Each C In Myrange
strPattern = "([0-9]{1,2}.*[0-9]{4}) (.*)"
'strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Result:
To use an in-cell function, set it up to extract a single piece such as Date or everything else. The following code will extract the date. Cell B1 would have the following equation: =extractDate(A1)
Function extractDate(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strRaplace As String
Dim strOutput As String
strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
extractDate = regEx.Replace(strInput, "$1")
Else
extractDate = "Not matched"
End If
End If
End Function
To make another function for extracting the rest of the date simply change $1 to $2 and it will return the second defined match in the pattern.