Regular Expression in excel VBA - regex

I'm trying to use regex in excel VBA to match a pattern within all cells in a column range, and remove the matched patterns to a new column range.
E.g.
Happy Day Care Club (1124734)
French Pattiserie (8985D)
The King's Pantry (G6666642742D)
Big Shoe (China) Ltd (ZZ454)
Essentially I want to remove the last bracketed portion of each string and transpose this part (without the brackets) into a different column range.
The regex I have so far is "(([^)]+))\z" (which I don't know if this is actually correct), and embedded within this VBA:
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Sheets("Sheet 1").Activate
Range("FF65536").End(xlUp).Select
LastCell = ActiveCell.Address
Set Myrange = ActiveSheet.Range("FF2:" & LastCell)
For Each C In Myrange
strPattern = "(\(([^\)]+)\)\z)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Range("FF2").Select = regEx.Replace(strInput, "$1")
Range("DX2").Select = regEx.Replace(strInput, "$2")
End If
End If
Next
I'm a newbie so please forgive glaringly obvious mistakes.
Many thanks,

No your regex pattern isn't correct. You should test your pattern separately as regex is its own mini-language. Try this pattern (Regex101):
\((.+)\)$
About the gm options: g means Global, m means Multiline, both of which are set to True in your code.

Here's a non-RegEx method:
Dim Myrange As Range
Sheets("Sheet 1").Activate
Set Myrange = ActiveSheet.Range("FF2:FF" & Cells(Rows.Count, "FF").End(xlUp).Row)
With Myrange
.Offset(, -43).Value = .Worksheet.Evaluate("INDEX(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(" & .Address & _
",""("",REPT("" "",500)),500)),"")"",""""),)")
End With

Personally I would resort to RegEx as a last resort...
Here is a snippet using string functions:
Dim iRow As Long
Dim s As String
For iRow = 1 To UsedRange.Rows.Count
Debug.Print Cells(iRow, 1).Value
s = Cells(iRow, 1).Value
s = Trim(Left(s, InStrRev(s, "(") - 1))
Debug.Print s
Next
The relevant line being Trim(Left(s, InStrRev(s, "(") - 1)). You would need QA check to deal with data w/o proper format.

Related

Write several regex matches to different cells in excel

I'm writing a excel macro to be able to search an excel list and, among other things, write the matches (if any) to different cells.
I got a lot of help from this great explanation but what I can't figure out is how to only write the regex match to the cell. My current code cuts the string after the match and writes this to the cell. But I would like to only write the match, nothing else from the string.
This is my code:
Private Sub simpleRegex()
Dim strPattern As String
Dim strReplace As String
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A5")
For Each cell In Myrange
strPattern = "(storlek|strl|stl|strlk|storleken|storl|size|storleksmärkt|storl|storlk|st)(.{0,2}?)((30|32|34|36|38|40|42|44|46|48|50))(.?)((30|32|34|36|38|40|42|44|46|48|50)?)"
If strPattern <> "" Then
strInput = cell.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
cell(1, 5).Value = 1
cell(1, 2).Value = regEx.Replace(strInput, "$1")
cell(1, 3).Value = regEx.Replace(strInput, "$2")
cell(1, 4).Value = regEx.Replace(strInput, "$3")
Else
cell(1, 6).Value = 1
End If
End If
Next
End Sub
This is the result I get in excel:
So the red text in column A is the full match from the initial string and the red in column B and D is the matches separated. So it's almost as I want it, but I would like to only have the match in the cell B-D not the whole string.
Sorry for the swedish in the example, my dataset is from a swedish site. But I think you get the problem anyway?
You need to use .regEx.Execute(str) and access the SubMatches values:
Dim objMatchs As MatchCollection
' ...
Set objMatches = regEx.Execute(strInput)
If objMatches.Count <> 0 Then
cell(1, 5).Value = 1
cell(1, 2).Value = objMatches(0).SubMatches(0)
cell(1, 3).Value = objMatches(0).SubMatches(1)
cell(1, 4).Value = objMatches(0).SubMatches(2)
End If
The capture group IDs start with the 0 based index.
The objMatches(0).SubMatches(0) means get the first match, the first capturing group value.

Regular expression to match year?

I'm new to regular expressions in excel vba, been looking at a few questions about it on stack overflow, found a great one at the following link "How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops"
There was some very useful code here that I thought I might try to learn and adapt for my purposes, I'm trying to match a 4 digit string representing a year from a cell on a spreadsheet ie. "2016 was a good year" would yield "2016".
I used some slightly altered code from that question posted there and it manages to recognize that a string contains a year, however I'm not sure how to separate and extract the string from the rest of the cell contents, ie. getting 2016 on it's own in an adjacent cell, any changes I should make?
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("D2:D244")
For Each c In Myrange
strPattern = "([0-9]{4})" 'looks for (4 consecutive numbers)
If strPattern <> "" Then
strInput = c.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
c.Offset(0, 5) = regEx.Replace(strInput, "$1") 'puts the string in an adjacent cell
Else
c.Offset(0, 5) = "(Not matched)"
End If
End If
Next
End Sub
You could significantly improve your code as below:
Use variant arrays rather than a range
Move the RegExp out of the loop (you are setting it the same way for each cell)
Your RegExp parameters can be reduced for what you want (minor).
Private Sub splitUpRegexPattern()
Dim regEx As Object
Dim strPattern As String
Dim strInput As String
Dim X
Dim Y
Dim lngCnt As Long
Set regEx = CreateObject("vbscript.regexp")
X = ActiveSheet.Range("D2:D244").Value2
Y = X
strPattern = "\b[0-9]{4}\b" 'looks for (4 consecutive numbers)
With regEx
.MultiLine = True
.Pattern = strPattern
For lngCnt = 1 To UBound(X)
If .Test(X(lngCnt, 1)) Then
Y(lngCnt, 1) = .Execute(X(lngCnt, 1))(0)
Else
Y(lngCnt, 1) = "(Not matched)"
End If
Next
Range("D2:D244").Offset(0, 5).Value2 = Y
End With
End Sub
user1016274, thanks, your comment really helped, had to do some searching on it, but I found the answer
using regEx.Execute(strInput) I managed to return the string matched:
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("D2:D244")
For Each c In Myrange
strPattern = "([0-9]{4})" 'looks for (4 consecutive numbers)
If strPattern <> "" Then
strInput = c.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
c.Offset(0, 5) = regEx.Execute(strInput).Item(0).SubMatches.Item(0) 'this was the part I changed
Else
c.Offset(0, 5) = "(Not matched)"
End If
End If
Next
End Sub

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
Dim strReplace As String: strReplace = ", "
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("P1:P5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.TEST(strInput) Then
Debug.Print (regEx.Replace(strInput, strReplace))
Else
Debug.Print ("No Regex Not matched in " & cell.address)
End If
End If
Next
Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Bruce, Wayne.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A captured group that includes a character class matching just 1 English character
, - Matching a literal , (you actually do not have to escape it as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (does not match, or consume) if the immediate character following the comma is and English letter.
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
Dim r As Range
Set r = Range("P1:P5")
r.Replace What:=",", Replacement:=", "
r.Replace What:=", ", Replacement:=", "
End Sub
Uses the same RegExp as in the solution from stribizhev but with two optimisations for speed
Your current code sets the RegExp details for every cell tested, these only need setting once.
Looping through a varinat array is much faster than a cell range
code
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String:
Dim strReplace As String:
Dim regEx As Object
Dim strInput As String
Dim X, X1
Dim lngnct
Set regEx = CreateObject("vbscript.regexp")
strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
strReplace = "$1, "
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
X = ActiveSheet.Range("P1:P5").Value2
For X1 = 1 To UBound(X)
If .TEST(X(X1, 1)) Then
Debug.Print .Replace(X(X1, 1), strReplace)
Else
Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
End If
Next
End With
Set regEx = Nothing
End Sub
What you are doing via Regex is to find a pattern
(any Alphabet),(any Alphabet)
and then replace such pattern to
,_
where _ implies a space.
So if you have Wayne,Bruce then the pattern matches where e,B. Therefore the result becomes Wayn, ruce.
Try
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"
.

Excel Regular Expression: Add Quote (") to Values in Two Columns

So I have a CSV file with two columns that have items listed out like below:
The goal is to create a Excel VB code that will go through columns H and I, and add a quote (") to the beginning and end of each 6 digit group (e.g., H67100 into "H67100"). Additionally, the comma should be left alone.
I know the code is not complete as of yet, but this is what I have thus far. I think I am fine with the beginning part but after the match is found, I think my logic/syntax is incorrect. A little guidance and feedback is much appreciated:
Private Sub splitUpRegexPattern2()
Dim strPattern As String: strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("H:I")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
End Sub
UPDATED CODE:
Function splitUpRegexPattern2 (Myrange As Range) as String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range
Dim strReplace As String
Dim strOutput As String
strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End FUNCTION
Adding example CSV file. Download Sample CSV File
This answer assumes you can get the values of each cell you are interested in.
There's no need to use RegEx in this case as your values appear to be simple comma-delimited data.
Public Const DOUBLE_QUOTE As String = Chr(34)
'''
'''<summary>This function splits a string into an array on commas, adds quotes around each element in the array, the joins the array back into a string placing a comma between each element.</summary>
'''
Public Function QuotedValues(ByVal input As String) As String
Dim words As String() = input.Split(New Char() {","})
Dim result As String = String.Empty
words = (From w In words Select DOUBLE_QUOTE & w.Trim & DOUBLE_QUOTE).ToArray
result = String.Join(", ", words)
Return result
End Function

Excel RegEx Extraction

recently I've been trying to extract some strings from text in excel. I used script from other post here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Since Macro code is working fine I couldn't use in Cell function, it's showing #NAME? error. I've included "Microsoft VBScript Regular Expressions 5.5" but still no luck.
I can use it with macro but script needs some changes. I would like to have some strings in A1:A50, then to B1:B50 extract date in format DD Month YYYY (e.g. 28 July 2014) and to C1:C50 extract account no in format G1234567Y.
For now script is replacing date with "". Regular Expression for date is correct but how to insert date into B column? And then A/c no to C column working on 1:50 range?
This is the code:
Sub simpleRegex()
Dim strPattern As String: strPattern = "[0-9][0-9].*[0-9][0-9][0-9][0-9]"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Dim Out As Range
Set Myrange = ActiveSheet.Range("A1")
Set Out = ActiveSheet.Range("B1")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Out = regEx.Replace(strInput, strReplace)
Else
MsgBox ("Not matched")
End If
End If
End Sub
Thank You kindly for any assistance.
Currently your replacing the matching string with an empty string "" so that's why your getting no result. You need to return the actual match using () to indicate match set and $1 to retrieve it.
Based on your example, I'll assume your text in column A looks like this: 28 July 2014 G1234567Y
Here is a routine that will split apart the text into a date and then the text following the date.
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A50")
For Each C In Myrange
strPattern = "([0-9]{1,2}.*[0-9]{4}) (.*)"
'strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Result:
To use an in-cell function, set it up to extract a single piece such as Date or everything else. The following code will extract the date. Cell B1 would have the following equation: =extractDate(A1)
Function extractDate(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strRaplace As String
Dim strOutput As String
strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
extractDate = regEx.Replace(strInput, "$1")
Else
extractDate = "Not matched"
End If
End If
End Function
To make another function for extracting the rest of the date simply change $1 to $2 and it will return the second defined match in the pattern.