Excel VBA RegEx Replace Function Substituting a Literal $1

I often rely on the blunt ease of the Replace function in VBA for simple string replacements, but I have long been attracted to the magical allure of regular expressions for more sophisticated string manipulation. In my experimenting, though, I am stuck: my replacement value, "$1", is returned as a literal part of the output string instead of the text matched by the RegEx pattern. I assume whatever I am doing wrong is something embarrassingly simple, but I can't see it. Can anyone provide some guidance?
I have included the Microsoft VBScript Regular Expressions 5.5 library as a reference in my VBA project. Here is a simplified snippet of my code:
Dim regEx As RegExp
Dim strInput As String
Dim strPattern As String
Dim strReplace As String ' I've tried type Variant also
strPattern = "/[a-z]" ' Find strings with a forward slash followed by a lowercase letter; this works
strReplace = "$1" ' I've also tried using this value directly in the Replace function without first assigning it to a string value.
With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = strPattern
End With
If regEx.Test(strInput) Then
    strInput = regEx.Replace(strInput, strReplace)
End If
If my input string is something like, "High/low Value", the result will be "High$1ow Value" when what I'm after is "High/Low Value". I'm stumped. Any thoughts?

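Use "$1" only if your pattern contains a capture group, which yours does not. Even with one, "$1" reinserts the captured text unchanged, and the VBScript Replace method cannot change its case.
The code below should work given the info provided: it uppercases each match itself, and it converts more than one instance of the pattern.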
Sub x()
    Dim regEx As RegExp
    Dim matches As MatchCollection
    Dim strInput As String
    Dim strPattern As String
    Dim i As Long, f As Long, s As String

    Set regEx = New RegExp
    strPattern = "/[a-z]" ' A forward slash followed by a lowercase letter
    strInput = "High/low and Low/high"

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
        If .Test(strInput) Then
            Set matches = .Execute(strInput) ' Run the search once and reuse the result
            s = strInput
            For i = 0 To matches.Count - 1
                f = matches(i).FirstIndex ' Zero-based offset of the match
                ' Each match is two characters long: the slash and the letter
                s = Left(s, f) & UCase(matches(i).Value) & Right(s, Len(s) - f - 2)
            Next i
            strInput = s
            MsgBox strInput
        End If
    End With
End Sub
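For comparison, here is a minimal sketch of what "$1" does once the pattern does contain a capture group. The function name ReplaceAfterSlash and the " or " replacement text are made up for illustration, and late binding through CreateObject is an assumption so the snippet runs without the library reference.

```vba
' Hypothetical helper: rewrites "/x" as " or x" using a back-reference.
Function ReplaceAfterSlash(ByVal inputText As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp") ' late binding; no project reference needed
    With re
        .Global = True
        .IgnoreCase = False
        .Pattern = "/([a-z])" ' the parentheses create capture group 1
        ' $1 is replaced by whatever group 1 matched (the lowercase letter)
        ReplaceAfterSlash = .Replace(inputText, " or $1")
    End With
End Function
```

With this sketch, ReplaceAfterSlash("High/low Value") gives "High or low Value": the captured letter comes back exactly as it was matched, which is why changing its case needs a loop over the matches rather than a replacement string.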

How could I use VBA RegExp (or another function) in Excel to escape special characters?

I need to be able to look through several ranges and add a "\" before any of these characters #$%&_{}~\^ that are not already preceded by a "\". Can anybody help? I've been fooling around with this for a couple of hours and I know it should be simple to somebody who is more familiar with regular expressions. If there is another simple way, that's fine too.
Below is my lame attempt at making this work. I think that I need to include back references, but that is confounding me.
Function EC(code As Range)
    Dim strPattern As String: strPattern = "[^\\]+[#$%^&_{}\~]"
    Dim strReplace As String: strReplace = "[\]+[#$%^&_{}\~]"
    Dim myreplace As Long
    Dim strInput As String
    Dim Myrange As Range
    Set RegEx = CreateObject("VBScript.RegExp")
    For Each cell In code
        If strPattern <> "" Then
            strInput = cell.Value
            With RegEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            If RegEx.Test(strInput) Then
                EC = (RegEx.Replace(strInput, strReplace))
            Else: EC = code.Value
            End If
        End If
    Next
    Set RegEx = Nothing
End Function

VBA regex - Value used in formula is of the wrong data type

I can't figure out why this function, which includes a regex, keeps returning a wrong-data-type error. I'm trying to return a match for the identified pattern from a file path string in an Excel document. An example of the pattern I'm looking for is "02 Package_2018-1011" from a sample string "H:\H1801100 MLK Middle School Hartford\2-Archive! Issued Bid Packages\01 Package_2018-0905 Demolition and Abatement Bid Set_Drawings - PDF\00 HazMat\HM-1.pdf". The VBA code is listed below.
Function textpart(Myrange As Range) As Variant
    Dim strInput As String
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    strInput = Myrange.Value
    With regex
        .Pattern = "\D{2}\sPackage_\d{4}-\d{4}"
        .Global = True
    End With
    Set textpart = regex.Execute(strInput)
End Function
You need to use \d{2} to match a 2-digit chunk, not \D{2} (\D matches any non-digit character). Besides, you are trying to assign the whole match collection to the function result, while you should extract the first match value and assign that value to the function result:
Function textpart(Myrange As Range) As Variant
    Dim strInput As String
    Dim regex As Object
    Dim matches As Object
    Set regex = CreateObject("VBScript.RegExp")
    strInput = Myrange.Value
    With regex
        .Pattern = "\d{2}\sPackage_\d{4}-\d{4}"
    End With
    Set matches = regex.Execute(strInput)
    If matches.Count > 0 Then
        textpart = matches(0).Value
    End If
End Function
Note that to match it as a whole word you may add word boundaries (\b) at the start and at the end:
.Pattern = "\b\d{2}\sPackage_\d{4}-\d{4}\b"
To only match it after \, you may use a capturing group:
.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"
' ...
' and then
' ...
textpart = matches(0).Submatches(0)
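Put together, the capturing-group variant might look like the sketch below. The function name ExtractPackage is hypothetical, and it takes the path as a String rather than a Range (an assumption, so it can be called from other VBA code as well as from a cell). It returns an empty string when nothing matches, which is also an assumption.

```vba
' Hypothetical helper: returns the "NN Package_YYYY-MMDD" part that follows a backslash.
Function ExtractPackage(ByVal filePath As String) As String
    Dim re As Object
    Dim matches As Object
    Set re = CreateObject("VBScript.RegExp")
    ' \\ matches a literal backslash; the parentheses capture the part to return
    re.Pattern = "\\(\d{2}\sPackage_\d{4}-\d{4})\b"
    Set matches = re.Execute(filePath)
    If matches.Count > 0 Then
        ' SubMatches(0) is the first capture group, without the leading backslash
        ExtractPackage = matches(0).SubMatches(0)
    Else
        ExtractPackage = vbNullString ' assumption: empty string when nothing matches
    End If
End Function
```

For the sample path in the question this should give "01 Package_2018-0905", since that is the only spot where a backslash is directly followed by two digits and " Package_".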

Excel VBA Regex replace loses one character

The below code matches and replaces, but the digit next to the capture group is consumed. Where am I going wrong?
Sub test()
    Dim regex As Object 'Regexp object.
    Set regex = CreateObject("VBScript.RegExp") 'Regexp object.
    Dim strPattern As String: strPattern = "\d(AM|PM)" 'Declare regex pattern.
    Dim strReplace As String 'Placeholder string for replace operation.
    Dim target As String
    target = "1:05PM"
    strReplace = " $1"
    With regex
        .Global = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With
    If regex.test(target) Then
        Debug.Print regex.Replace(target, strReplace)
    End If
End Sub
Output:
1:0 PM
It's because you have an un-captured \d in your regex: the digit is part of the match but not of the capture group, so Replace discards it. Try putting () around the \d, i.e. (\d)(AM|PM).
You also need to change strReplace to "$1 $2".
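A minimal sketch of the corrected replacement, wrapped in a function so it is easy to try in the Immediate window. The name SpaceBeforeMeridiem is made up for illustration.

```vba
' Hypothetical helper: inserts a space between a digit and a trailing AM/PM.
Function SpaceBeforeMeridiem(ByVal inputText As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True
        .IgnoreCase = False
        .Pattern = "(\d)(AM|PM)" ' group 1 = the digit, group 2 = AM or PM
        ' Both groups are written back, so no character is lost
        SpaceBeforeMeridiem = .Replace(inputText, "$1 $2")
    End With
End Function
```

With this sketch, SpaceBeforeMeridiem("1:05PM") gives "1:05 PM".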

Visual Basic Excel Regular Expression {}

I have some trouble with {}. When I set a maximum, as in {1,8}, it does not work and I don't know why. The minimum works fine.
Private Sub Highlvl_Expression()
    Dim strPattern As String: strPattern = "[a-zA-Z0-9_]{1,8}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim Test As Boolean
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With
    Test = regEx.Test(Highlvl.Value)
    If regEx.Test(Highlvl.Value) Then
        MsgBox ("Validate")
    Else
        MsgBox ("Not Validate")
    End If
End Sub
You specified the pattern that looks for 1 to 8 alphanumeric characters inside a string. If you run the regex against a 9-character string "ABCDE6789" (regEx.Execute("ABCDE6789")), you will have 2 matches: ABCDE678 and 9.
If you want to validate a string that should have a minimum or a maximum number of characters, you need to use anchors, i.e. start and end of string assertions ^ and $. So, use
Dim strPattern As String: strPattern = "^[a-zA-Z0-9_]{1,8}$"
And
.Global = False
The global flag is not necessary since we are not looking for multiple matches, but for a single true or false result with test.
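The anchored check can be sketched as a small function. The name IsValidName is hypothetical. MultiLine is set to False here as an assumption: with MultiLine = True, ^ and $ also match at line breaks, so a multi-line value could pass if any single line fits the pattern.

```vba
' Hypothetical helper: True only if the whole string is 1 to 8 word characters.
Function IsValidName(ByVal candidate As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = False     ' a single yes/no answer is enough
        .MultiLine = False  ' ^ and $ refer to the start and end of the whole string
        .IgnoreCase = False
        .Pattern = "^[a-zA-Z0-9_]{1,8}$"
        IsValidName = .Test(candidate)
    End With
End Function
```

With this sketch, IsValidName("ABCDE678") is True and IsValidName("ABCDE6789") is False, because the anchors no longer let the pattern match just the first eight characters.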

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
    ' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
    Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
    Dim strReplace As String: strReplace = ", "
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    Set Myrange = ActiveSheet.Range("P1:P5")
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            If regEx.TEST(strInput) Then
                Debug.Print (regEx.Replace(strInput, strReplace))
            Else
                Debug.Print ("No Regex Not matched in " & cell.address)
            End If
        End If
    Next
    Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code in the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Wayne, Bruce.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A capturing group containing a character class that matches exactly 1 English letter
, - Matches a literal , (you do not actually have to escape it, as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (does not match or consume) whether the character immediately following the comma is an English letter.
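As a minimal sketch, the pattern and replacement explained above can be wrapped in a function that works on a plain string. The name AddSpaceAfterComma is made up for illustration, and late binding is an assumption.

```vba
' Hypothetical helper: inserts a space after any comma that sits between two letters.
Function AddSpaceAfterComma(ByVal inputText As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True
        .IgnoreCase = False
        ' Group 1 keeps the letter before the comma; the look-ahead only
        ' checks the letter after it, so that letter is never consumed
        .Pattern = "([a-zA-Z]),(?=[a-zA-Z])"
        AddSpaceAfterComma = .Replace(inputText, "$1, ")
    End With
End Function
```

With this sketch, AddSpaceAfterComma("Wayne,Bruce") gives "Wayne, Bruce", and text that already has the space is left unchanged.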
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
    Dim r As Range
    Set r = Range("P1:P5")
    r.Replace What:=",", Replacement:=", "
    r.Replace What:=",  ", Replacement:=", " ' comma + two spaces -> comma + one space
End Sub
This uses the same RegExp as the solution from stribizhev, but with two optimisations for speed:
Your current code sets the RegExp properties for every cell tested; these only need setting once.
Looping through a variant array is much faster than looping through a cell range.
Code:
Private Sub simpleRegexSearch()
    ' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
    Dim strPattern As String
    Dim strReplace As String
    Dim regEx As Object
    Dim X, X1

    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
    strReplace = "$1, "

    With regEx
        ' Set the RegExp properties once, outside the loop
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
        ' Read the whole range into a variant array in one call
        X = ActiveSheet.Range("P1:P5").Value2
        For X1 = 1 To UBound(X)
            If .TEST(X(X1, 1)) Then
                Debug.Print .Replace(X(X1, 1), strReplace)
            Else
                Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
            End If
        Next
    End With
    Set regEx = Nothing
End Sub
Your regex finds the pattern
(any letter),(any letter)
and then replaces the whole match with
,_
where _ stands for a space.
So if you have Wayne,Bruce, the pattern matches e,B, and the result becomes Wayn, ruce.
Try
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"
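One difference between this two-group pattern and a look-ahead version appears when items are a single letter long, because the second group consumes the letter that the next match would need. The sketch below compares the two. The names RegexReplaceAll and CompareCommaPatterns are hypothetical, and the input "a,b,c" is an invented example.

```vba
' Hypothetical helper: global regex replace on a plain string.
Function RegexReplaceAll(ByVal inputText As String, ByVal pattern As String, _
                         ByVal replacement As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = pattern
    RegexReplaceAll = re.Replace(inputText, replacement)
End Function

Sub CompareCommaPatterns()
    ' Two capture groups: "b" is consumed by the first match, so ",c" is skipped
    Debug.Print RegexReplaceAll("a,b,c", "([a-zA-Z]),([a-zA-Z])", "$1, $2")  ' a, b,c
    ' Look-ahead: "b" is only checked, so it can start the next match
    Debug.Print RegexReplaceAll("a,b,c", "([a-zA-Z]),(?=[a-zA-Z])", "$1, ")  ' a, b, c
End Sub
```

For names such as Wayne,Bruce both patterns give the same result, so the difference only matters if the data can contain one-letter items between commas.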