How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?
In-cell function to return a matched pattern or replaced value in a string.
Sub to loop through a column of data and extract matches to adjacent cells.
What setup is necessary?
What are Excel's special characters for Regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since Excel can use Left, Mid, Right and InStr type commands for similar manipulations.
Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:
Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"
Select "Developer" tab (I don't have this tab what do I do?)
Select "Visual Basic" icon from 'Code' ribbon section
In "Microsoft Visual Basic for Applications" window select "Tools" from the top menu.
Select "References"
Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
Click "OK"
Step 2: Define your pattern
Basic definitions:
- Range.
E.g. a-z matches any lower case letter from a to z
E.g. 0-5 matches any digit from 0 to 5
[] Match exactly one of the objects inside these brackets.
E.g. [a] matches the letter a
E.g. [abc] matches a single letter which can be a, b or c
E.g. [a-z] matches any single lower case letter of the alphabet.
() Groups different matches for return purposes. See examples below.
{} Multiplier for repeated copies of pattern defined before it.
E.g. [a]{2} matches two consecutive lower case letters a: aa
E.g. [a]{1,3} matches at least one and up to three consecutive lower case letters a: a, aa, aaa
+ Match at least one, or more, of the pattern defined before it.
E.g. a+ will match consecutive a's a, aa, aaa, and so on
? Match zero or one of the pattern defined before it.
E.g. Pattern may or may not be present but can only be matched one time.
E.g. [a-z]? matches empty string or any single lower case letter.
* Match zero or more of the pattern defined before it.
E.g. Wildcard for pattern that may or may not be present.
E.g. [a-z]* matches empty string or string of lower case letters.
. Matches any character except newline \n
E.g. a. Matches a two character string starting with a and ending with anything except \n
| OR operator
E.g. a|b means either a or b can be matched.
E.g. red|white|orange matches exactly one of the colors.
^ NOT operator (when it is the first character inside [])
E.g. [^0-9] matches any single character that is not a digit
E.g. [^aA] matches any single character that is not lower case a or upper case A
\ Escapes special character that follows (overrides above behavior)
E.g. \., \\, \(, \?, \$, \^
Anchoring Patterns:
^ Match must occur at start of string
E.g. ^a First character must be lower case letter a
E.g. ^[0-9] First character must be a number.
$ Match must occur at end of string
E.g. a$ Last character must be lower case letter a
Precedence table:
Order Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m, n}?
3 Sequence & Anchors abc ^ $
4 Alternation |
Predefined Character Abbreviations:
abr same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New line
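A quick way to see how these definitions behave is to run a few patterns against sample strings and read the results in the Immediate window (Ctrl+G in the VBA editor). The following is only a minimal sketch: the names tryPatterns and showMatch and the sample strings are made up for illustration, and it uses CreateObject so it does not depend on the reference from Step 1.

```vba
Sub tryPatterns()
    ' Hypothetical test harness: each call prints pattern, input and True/False
    showMatch "^[0-9]{1,2}", "12abc"            ' digits at the start
    showMatch "^[0-9]{1,2}", "abc123"           ' digits not at the start
    showMatch "red|white|orange", "a white car" ' alternation
    showMatch "[^0-9]", "123"                   ' needs at least one non-digit
    showMatch "\d+\.\d+", "price 4.50"          ' escaped dot between digits
End Sub

Private Sub showMatch(ByVal strPattern As String, ByVal strInput As String)
    ' Late-bound RegExp object, created fresh for each check
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = strPattern
    Debug.Print strPattern, strInput, regEx.Test(strInput)
End Sub
```

With these inputs the five checks should print True, False, True, False and True respectively.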
Example 1: Run as macro
The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return "Not Matched" because the digits were not at the start of the string.
Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range

    Set Myrange = ActiveSheet.Range("A1")

    If strPattern <> "" Then
        strInput = Myrange.Value

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub
Example 2: Run as an in-cell function
This example is the same as Example 1 but is set up to run as an in-cell function (note that the pattern used here allows up to three leading digits rather than two). To use, change the code to this:
Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String

    strPattern = "^[0-9]{1,3}"

    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function
Place your strings ("12abc") in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".
Example 3: Loop Through Range
This example is the same as example 1 but loops through a range of cells.
Private Sub simpleRegexLoop()
    ' Renamed from simpleRegex so it can live in the same module as Example 1
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    Dim cell As Range   ' declared so the code compiles under Option Explicit

    Set Myrange = ActiveSheet.Range("A1:A5")

    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value

            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With

            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub
Example 4: Splitting apart different patterns
This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().
Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    Dim C As Range   ' declared so the code compiles under Option Explicit

    Set Myrange = ActiveSheet.Range("A1:A3")

    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"

        If strPattern <> "" Then
            strInput = C.Value

            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With

            If regEx.Test(strInput) Then
                ' $1, $2 and $3 refer to the first, second and third () group
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub
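As an alternative to calling Replace once per group, the groups from Example 4 can also be read from the SubMatches collection of the first match. This is a sketch under assumptions, not the original author's method: the sub name splitUpWithSubMatches is made up, and it is late-bound via CreateObject. One difference to be aware of is that Replace leaves any text outside the match in the result, whereas SubMatches returns only the captured group.

```vba
Private Sub splitUpWithSubMatches()
    ' Sketch only: same pattern and range as Example 4
    Dim regEx As Object
    Dim matches As Object
    Dim c As Range
    Dim i As Long

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = False        ' only the first match per cell is needed
    regEx.IgnoreCase = False
    regEx.Pattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"

    For Each c In ActiveSheet.Range("A1:A3")
        Set matches = regEx.Execute(CStr(c.Value))
        If matches.Count > 0 Then
            ' SubMatches is zero-based: item 0 is the first () group
            For i = 0 To matches(0).SubMatches.Count - 1
                c.Offset(0, i + 1).Value = matches(0).SubMatches(i)
            Next i
        Else
            c.Offset(0, 1).Value = "(Not matched)"
        End If
    Next c
End Sub
```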
Additional Pattern Examples
String Regex Pattern Explanation
a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single digit, three alpha characters
a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May or may not have preceding alpha character
a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3} Single alpha, single digit, 0 to 3 alpha characters
a1aaa [a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha characters
</i8> \<\/[a-zA-Z][0-9]\> Exact non-word character except any single alpha followed by any single digit
To make use of regular expressions directly in Excel formulas, the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an Excel function.
How it works
It takes 2-3 parameters.
A text to use the regular expression on.
A regular expression.
A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.
Some examples
Extracting an email address:
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")
Results in: some@email.com
Extracting several substrings:
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")
Results in: E-Mail: some@email.com, Name: Peter Gordon
To take apart a combined string in a single cell into its components in multiple cells:
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)
Results in: Peter Gordon some@email.com ...
How to use
To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):
In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
Click on Tools -> References
Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
Click OK.
Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors).
In the big text window in the middle insert the following:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer

    ' Matches the user's pattern against the input text
    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    ' Finds the $0, $1, $2, ... tags in the output format string
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    ' Replaces one tag at a time; its pattern is set inside the loop
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        regex = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            ' (?!\d) keeps "$1" from also matching the start of "$10"
            outReplaceRegexObj.Pattern = "\$" & replaceNumber & "(?!\d)"

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    'regex = "A too high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    regex = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        regex = outputPattern
    End If
End Function
Save and close the Microsoft Visual Basic for Applications Editor window.
Expanding on patszim's answer for those in a rush.
Open Excel workbook.
Alt+F11 to open VBA/Macros window.
Add a reference to regex under Tools then References
by selecting Microsoft VBScript Regular Expressions 5.5
Insert a new module (code needs to reside in the module otherwise it doesn't work).
In the newly inserted module,
add the following code:
Function RegxFunc(strInput As String, regexPattern As String) As String
    Dim regEx As New RegExp
    Dim matches As Object   ' declared so the code compiles under Option Explicit

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = regexPattern
    End With

    If regEx.Test(strInput) Then
        Set matches = regEx.Execute(strInput)
        RegxFunc = matches(0).Value   ' return the first match only
    Else
        RegxFunc = "not matched"
    End If
End Function
The regex pattern is placed in one of the cells and referred to with an absolute reference.
The function will be tied to the workbook that it is created in.
If there's a need for it to be used in different workbooks, store the function in Personal.XLSB.
Here is my attempt:
Function RegParse(ByVal pattern As String, ByVal html As String)
    Dim regex As RegExp
    Dim mStr As String   ' declared so the code compiles under Option Explicit

    Set regex = New RegExp

    With regex
        .IgnoreCase = True    'ignore case while the regex engine performs the search.
        .pattern = pattern    'set the regex pattern.
        .Global = False       'restrict the regex to the first match only.

        If .Test(html) Then               'test whether the pattern matches or not
            mStr = .Execute(html)(0)      '.Execute(html)(0) provides the string matched by the regex
            RegParse = .Replace(mStr, "$1")   '.Replace swaps the match for whatever is in the first set of parentheses - $1.
        Else
            RegParse = "#N/A"
        End If
    End With
End Function
This isn't a direct answer, but it may provide a more efficient alternative for your consideration: Google Sheets has several built-in regex functions, which can be very convenient and help circumvent some of the technical procedures in Excel. Obviously there are some advantages to using Excel on your PC, but for the large majority of users Google Sheets will offer an identical experience and may offer some benefits in portability and sharing of documents.
They offer
REGEXEXTRACT: Extracts matching substrings according to a regular expression.
REGEXREPLACE: Replaces part of a text string with a different text string using regular expressions.
SUBSTITUTE: Replaces existing text with new text in a string.
REPLACE: Replaces part of a text string with a different text string.
You can type these directly into a cell. For example, REGEXMATCH, which returns TRUE or FALSE depending on whether the text matches the pattern, can be used like so:
=REGEXMATCH(A2, "[0-9]+")
They also work quite well in combinations with other functions such as IF statements like so:
=IF(REGEXMATCH(E8,"MiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*")/1000,IF(REGEXMATCH(E8,"GiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*"),""))
Hopefully this provides a simple workaround for those users who feel daunted by the VBA component of Excel.
To add to the valuable content, I would like to create this reminder on why sometimes RegEx within VBA is not ideal. Not all expressions are supported, but instead may throw an Error 5017 and may leave the author guessing (which I am a victim of myself).
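One way to take some of the guesswork out of this is to check a pattern before relying on it. The sketch below is an assumption rather than anything taken from the sources mentioned here: patternIsValid is a made-up name, and it relies on the engine raising a run-time error (such as 5017) when an unsupported pattern is set or first used.

```vba
Function patternIsValid(ByVal strPattern As String) As Boolean
    ' Hypothetical helper: True if the VBScript engine accepts the pattern
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")

    On Error Resume Next            ' trap the error instead of halting
    regEx.Pattern = strPattern
    regEx.Test ""                   ' force the engine to use the pattern once
    patternIsValid = (Err.Number = 0)
    On Error GoTo 0                 ' restore normal error handling
End Function
```

It can be called from the Immediate window, e.g. ?patternIsValid("(?<=a)b"), to check a lookbehind pattern before using it in a larger macro.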
Whilst we can find some sources on what is supported, it would be helpful to know which metacharacters etc. are not supported. A more in-depth explanation can be found here. Mentioned in this source:
"Although VBScript’s regular expression ... version 5.5 implements quite a few essential regex features that were missing in previous versions of VBScript. ... JavaScript and VBScript implement Perl-style regular expressions. However, they lack quite a number of advanced features available in Perl and other modern regular expression flavors:"
So, not supported are:
Start of String anchor \A, alternatively use the ^ caret to match position before 1st char in string
End of String anchor \Z, alternatively use the $ dollar sign to match position after last char in string
Positive LookBehind, e.g.: (?<=a)b (whilst positive LookAhead is supported)
Negative LookBehind, e.g.: (?<!a)b (whilst negative LookAhead is supported)
Atomic Grouping
Possessive Quantifiers
Unicode e.g.: \{uFFFF}
Named Capturing Groups. Alternatively use Numbered Capturing Groups
Inline modifiers, e.g.: /i (case sensitivity) or /g (global) etc. Set these through the RegExp object properties > RegExp.Global = True and RegExp.IgnoreCase = True if available.
Conditionals
Regular Expression Comments. Add these with regular ' comments in script
I have already hit a wall more than once using regular expressions within VBA, usually with LookBehind, but sometimes I even forget the modifiers. I have not experienced all of the above-mentioned drawbacks myself, but thought I would try to be extensive by referring to some more in-depth information. Feel free to comment/correct/add. Big shout out to regular-expressions.info for a wealth of information.
P.S. You have mentioned regular VBA methods and functions, and I can confirm they (at least to myself) have been helpful in their own ways where RegEx would fail.
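Since LookBehind is not available, a common workaround is to match the preceding text as well and wrap only the wanted part in a capturing group. A minimal sketch, assuming a made-up function name afterPrefix and a made-up "ID-" prefix purely for illustration:

```vba
Function afterPrefix(ByVal strInput As String) As String
    ' Hypothetical example: return the digits that follow "ID-".
    ' With lookbehind this would be (?<=ID-)\d+, which is not supported here.
    Dim regEx As Object
    Dim matches As Object

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = "ID-(\d+)"      ' match the prefix too, capture only the digits

    Set matches = regEx.Execute(strInput)
    If matches.Count > 0 Then
        afterPrefix = matches(0).SubMatches(0)   ' first () group, without the prefix
    Else
        afterPrefix = ""
    End If
End Function
```

For example, afterPrefix("order ID-4821 shipped") returns 4821.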
I needed to use this as a cell function (like SUM or VLOOKUP) and found that it was easy to:
Make sure you are in a Macro Enabled Excel File (save as xlsm).
Open developer tools Alt + F11
Add Microsoft VBScript Regular Expressions 5.5 as in other answers
Create the following function either in workbook or in its own module:
Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant
    Dim regex As New VBScript_RegExp_55.RegExp
    Dim strInput As String

    strInput = myRange.Value

    With regex
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With

    REGPLACE = regex.Replace(strInput, outputPattern)
End Function
Then you can use in cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (ex: "A 243" to "A243")
Here is a regex_subst() function. Examples:
=regex_subst("watermellon", "[aeiou]", "")
---> wtrmlln
=regex_subst("watermellon", "[^aeiou]", "")
---> aeeo
Here is the simplified code (simpler for me, anyway). I couldn't figure out how to build a suitable output pattern using the above to work like my examples:
Function regex_subst( _
strInput As String _
, matchPattern As String _
, Optional ByVal replacePattern As String = "" _
) As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
regex_subst = inputRegexObj.Replace(strInput, replacePattern)
End Function
I don't want to have to enable a reference library as I need my scripts to be portable. The Dim foo As New VBScript_RegExp_55.RegExp line caused User Defined Type Not Defined errors, but I found a solution that worked for me.
Update RE comments w/ @chrisneilsen:
I was under the impression that enabling a reference library was tied to the local computer's settings, but it is in fact tied directly to the workbook. So, you can enable a reference library, share a macro-enabled workbook, and the end user wouldn't have to enable the library as well. Caveat: The advantage of Late Binding is that the developer does not have to worry about the wrong version of an object library being installed on the user's computer. This likely would not be an issue w/ the VBScript_RegExp_55.RegExp library, but I'm not sold that the "performance" benefit is worth it for me at this time, as we are talking imperceptible milliseconds in my code. I felt this deserved an update to help others understand. If you enable the reference library, you can use "early bind", but if you don't, as far as I can tell, the code will work fine, but you need to "late bind" and lose some performance/debugging features.
Source: https://peltiertech.com/Excel/EarlyLateBinding.html
What you'll want to do is put an example string in cell A1, then test your strPattern. Once that's working, adjust rng as desired.
Public Sub RegExSearch()
    'https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
    'https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/
    'https://www.vitoshacademy.com/vba-regex-in-excel/
    Dim regexp As Object
    'Dim regex As New VBScript_RegExp_55.regexp 'Caused "User Defined Type Not Defined" Error
    Dim rng As Range, rcell As Range
    Dim strInput As String, strPattern As String

    Set regexp = CreateObject("vbscript.regexp")
    Set rng = ActiveSheet.Range("A1:A1")

    strPattern = "([a-z]{2})([0-9]{8})"
    'Search for 2 Letters then 8 Digits Eg: XY12345678 = Matched

    With regexp
        .Global = False
        .MultiLine = False
        .ignoreCase = True
        .Pattern = strPattern
    End With

    For Each rcell In rng.Cells
        If strPattern <> "" Then
            strInput = rcell.Value

            If regexp.test(strInput) Then
                MsgBox rcell & " Matched in Cell " & rcell.Address
            Else
                MsgBox "No Matches!"
            End If
        End If
    Next
End Sub
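The same late-binding approach can also be used for an in-cell function. The following is a sketch only; the name RegexTestLate is made up for this example, and keeping the object in a Static variable is just one possible way to avoid recreating it on every recalculation.

```vba
Function RegexTestLate(ByVal strInput As String, ByVal strPattern As String) As Boolean
    ' Late-bound UDF: needs no reference to the Regular Expressions 5.5 library
    Static regEx As Object      ' created once and reused across calls
    If regEx Is Nothing Then Set regEx = CreateObject("VBScript.RegExp")

    With regEx
        .Global = False
        .MultiLine = False
        .IgnoreCase = True
        .Pattern = strPattern
    End With

    RegexTestLate = regEx.Test(strInput)
End Function
```

In a cell this would be used as =RegexTestLate(A1, "([a-z]{2})([0-9]{8})"), which returns TRUE when A1 contains XY12345678.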
How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?
In-cell function to return a matched pattern or replaced value in a string.
Sub to loop through a column of data and extract matches to adjacent cells.
What setup is necessary?
What are Excel's special characters for Regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left, Mid, Right, Instr type commands for similar manipulations.
Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:
Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"
Select "Developer" tab (I don't have this tab what do I do?)
Select "Visual Basic" icon from 'Code' ribbon section
In "Microsoft Visual Basic for Applications" window select "Tools" from the top menu.
Select "References"
Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
Click "OK"
Step 2: Define your pattern
Basic definitions:
- Range.
E.g. a-z matches an lower case letters from a to z
E.g. 0-5 matches any number from 0 to 5
[] Match exactly one of the objects inside these brackets.
E.g. [a] matches the letter a
E.g. [abc] matches a single letter which can be a, b or c
E.g. [a-z] matches any single lower case letter of the alphabet.
() Groups different matches for return purposes. See examples below.
{} Multiplier for repeated copies of pattern defined before it.
E.g. [a]{2} matches two consecutive lower case letter a: aa
E.g. [a]{1,3} matches at least one and up to three lower case letter a, aa, aaa
+ Match at least one, or more, of the pattern defined before it.
E.g. a+ will match consecutive a's a, aa, aaa, and so on
? Match zero or one of the pattern defined before it.
E.g. Pattern may or may not be present but can only be matched one time.
E.g. [a-z]? matches empty string or any single lower case letter.
* Match zero or more of the pattern defined before it.
E.g. Wildcard for pattern that may or may not be present.
E.g. [a-z]* matches empty string or string of lower case letters.
. Matches any character except newline \n
E.g. a. Matches a two character string starting with a and ending with anything except \n
| OR operator
E.g. a|b means either a or b can be matched.
E.g. red|white|orange matches exactly one of the colors.
^ NOT operator
E.g. [^0-9] character can not contain a number
E.g. [^aA] character can not be lower case a or upper case A
\ Escapes special character that follows (overrides above behavior)
E.g. \., \\, \(, \?, \$, \^
Anchoring Patterns:
^ Match must occur at start of string
E.g. ^a First character must be lower case letter a
E.g. ^[0-9] First character must be a number.
$ Match must occur at end of string
E.g. a$ Last character must be lower case letter a
Precedence table:
Order Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m, n}?
3 Sequence & Anchors abc ^ $
4 Alternation |
Predefined Character Abbreviations:
abr same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New line
Example 1: Run as macro
The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return "Not Matched" because the digits were not at the start of the string.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1")
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
End Sub
Example 2: Run as an in-cell function
This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "^[0-9]{1,3}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
Place your strings ("12abc") in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".
Example 3: Loop Through Range
This example is the same as example 1 but loops through a range of cells.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
Example 4: Splitting apart different patterns
This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A3")
For Each C In Myrange
strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
If strPattern <> "" Then
strInput = C.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
C.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Results:
Additional Pattern Examples
String Regex Pattern Explanation
a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single digit, three alpha characters
a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May or may not have preceding alpha character
a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3} Single alpha, single digit, 0 to 3 alpha characters
a1aaa [a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha characters
</i8> \<\/[a-zA-Z][0-9]\> Exact non-word character except any single alpha followed by any single digit
To make use of regular expressions directly in Excel formulas the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an excel function.
How it works
It takes 2-3 parameters.
A text to use the regular expression on.
A regular expression.
A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.
Some examples
Extracting an email address:
=regex("Peter Gordon: some#email.com, 47", "\w+#\w+\.\w+")
=regex("Peter Gordon: some#email.com, 47", "\w+#\w+\.\w+", "$0")
Results in: some#email.com
Extracting several substrings:
=regex("Peter Gordon: some#email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")
Results in: E-Mail: some#email.com, Name: Peter Gordon
To take apart a combined string in a single cell into its components in multiple cells:
=regex("Peter Gordon: some#email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some#email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)
Results in: Peter Gordon some#email.com ...
How to use
To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):
In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
Click on Tools -> References (please excuse the german screenshot)
Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
Click OK.
Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors).
In the big text window in the middle insert the following:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
Save and close the Microsoft Visual Basic for Applications Editor window.
Expanding on patszim's answer for those in a rush.
Open Excel workbook.
Alt+F11 to open VBA/Macros window.
Add reference to regex under Tools then References
and selecting Microsoft VBScript Regular Expression 5.5
Insert a new module (code needs to reside in the module otherwise it doesn't work).
In the newly inserted module,
add the following code:
Function RegxFunc(strInput As String, regexPattern As String) As String
Dim regEx As New RegExp
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = regexPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
RegxFunc = matches(0).Value
Else
RegxFunc = "not matched"
End If
End Function
The regex pattern is placed in one of the cells and absolute referencing is used on it.
Function will be tied to workbook that its created in.
If there's a need for it to be used in different workbooks, store the function in Personal.XLSB
Here is my attempt:
Function RegParse(ByVal pattern As String, ByVal html As String)
Dim regex As RegExp
Set regex = New RegExp
With regex
.IgnoreCase = True 'ignoring cases while regex engine performs the search.
.pattern = pattern 'declaring regex pattern.
.Global = False 'restricting regex to find only first match.
If .Test(html) Then 'Testing if the pattern matches or not
mStr = .Execute(html)(0) '.Execute(html)(0) will provide the String which matches with Regex
RegParse = .Replace(mStr, "$1") '.Replace function will replace the String with whatever is in the first set of braces - $1.
Else
RegParse = "#N/A"
End If
End With
End Function
This isn't a direct answer, but it may be a more efficient alternative to consider: Google Sheets has several built-in regex functions, which can be very convenient and help you avoid some of the technical setup in Excel. There are obviously advantages to using Excel on your PC, but for most users Google Sheets offers a very similar experience, with some benefits in portability and document sharing.
They offer:
REGEXMATCH: Returns whether a piece of text matches a regular expression.
REGEXEXTRACT: Extracts matching substrings according to a regular expression.
REGEXREPLACE: Replaces part of a text string with a different text string using regular expressions.
SUBSTITUTE: Replaces existing text with new text in a string (plain text, no regex).
REPLACE: Replaces part of a text string with a different text string (plain text, no regex).
You can type these directly into a cell, for example:
=REGEXMATCH(A2, "[0-9]+")
They also combine well with other functions, such as IF:
=IF(REGEXMATCH(E8,"MiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*")/1000,IF(REGEXMATCH(E8,"GiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*"),""))
Hopefully this provides a simple workaround for users who feel daunted by the VBA component of Excel.
To add to the valuable content, I would like to add this reminder of why RegEx within VBA is sometimes not ideal. Not all expressions are supported; unsupported ones may throw an Error 5017 and leave the author guessing (which I have been a victim of myself).
Whilst we can find some sources on what is supported, it would be helpful to know which metacharacters etc. are not supported. A more in-depth explanation can be found at regular-expressions.info. Mentioned in this source:
"VBScript’s regular expression ... version 5.5 implements quite a few essential regex features that were missing in previous versions of VBScript. ... JavaScript and VBScript implement Perl-style regular expressions. However, they lack quite a number of advanced features available in Perl and other modern regular expression flavors:"
So, not supported are:
Start-of-string anchor \A. Alternatively, use the ^ caret to match the position before the first character in the string (with MultiLine = False)
End-of-string anchor \Z. Alternatively, use the $ dollar sign to match the position after the last character in the string (with MultiLine = False)
Positive lookbehind, e.g. (?<=a)b (whilst positive lookahead is supported)
Negative lookbehind, e.g. (?<!a)b (whilst negative lookahead is supported)
Atomic grouping
Possessive quantifiers
Unicode support (e.g. \p{L}), except for matching single characters with \uFFFF
Named capturing groups. Alternatively, use numbered capturing groups
Inline modifiers, e.g. (?i) (case insensitivity), and flags such as /i or /g (global). Set these through the RegExp object properties instead: RegExp.IgnoreCase = True and RegExp.Global = True
Conditionals
Regular expression comments. Add these as regular ' comments in the script instead
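Since lookbehind is the item that tends to bite most often, here is one common way to work around it: match the prefix as ordinary (non-capturing) text and return only a capture group that follows it. This is a minimal sketch under stated assumptions. AfterPrefix and its parameter names are hypothetical, and prefixPattern is assumed to contain no capturing groups of its own.

```vba
' Hypothetical helper: emulates the unsupported (?<=prefix)target by matching
' "(?:prefix)(target)" and returning only the captured target.
' Assumption: prefixPattern contains no capturing groups.
Function AfterPrefix(ByVal source As String, ByVal prefixPattern As String, _
                     ByVal targetPattern As String) As Variant
    Dim regex As Object
    Dim matches As Object

    Set regex = CreateObject("VBScript.RegExp")   ' late binding
    With regex
        .Global = False
        .IgnoreCase = False
        ' (?:...) is a non-capturing group, so the target is SubMatches(0)
        .Pattern = "(?:" & prefixPattern & ")(" & targetPattern & ")"
    End With

    Set matches = regex.Execute(source)
    If matches.Count = 0 Then
        AfterPrefix = CVErr(xlErrNA)              ' no match: #N/A
    Else
        AfterPrefix = matches(0).SubMatches(0)
    End If
End Function
```

For example, =AfterPrefix("Total: 125 EUR", "Total: ", "\d+") returns the text 125. Unlike a true lookbehind, the prefix is consumed by the match, which matters when .Global = True and matches could otherwise overlap.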
I have already hit a wall more than once using regular expressions within VBA, usually with lookbehind, but sometimes I even forget the modifiers. I have not experienced all of the above-mentioned drawbacks myself, but thought I would try to be thorough by referring to more in-depth information. Feel free to comment/correct/add. Big shout-out to regular-expressions.info for a wealth of information.
P.S. You mentioned regular VBA methods and functions, and I can confirm they have (at least for me) been helpful in their own ways where RegEx would fail.
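To avoid guessing when a pattern raises Error 5017 (or a related pattern error), the pattern can be tried once inside an error trap before it is used on real data. This is a minimal sketch, and IsPatternSupported is a hypothetical name. It only reports whether the engine accepted the pattern, not which construct was rejected.

```vba
' Hypothetical helper: returns True when the VBScript regex engine
' accepts the pattern, False when using it raises a run-time error.
Function IsPatternSupported(ByVal pattern As String) As Boolean
    Dim regex As Object
    Dim dummy As Boolean

    Set regex = CreateObject("VBScript.RegExp")   ' late binding

    On Error Resume Next          ' trap errors instead of halting
    regex.Pattern = pattern
    dummy = regex.Test("")        ' using the pattern surfaces any syntax error
    IsPatternSupported = (Err.Number = 0)
    On Error GoTo 0               ' restore normal error handling
End Function
```

Going by the list above, Debug.Print IsPatternSupported("(?<=a)b") should print False and Debug.Print IsPatternSupported("a(?=b)") should print True.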
I needed to use this as a cell function (like SUM or VLOOKUP) and found that it was easy to do:
Make sure you are in a macro-enabled Excel file (save as .xlsm).
Open the VBA editor with Alt + F11.
Add a reference to Microsoft VBScript Regular Expressions 5.5, as in the other answers.
Create the following function in its own module (a standard module, as noted in other answers):
Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant
    Dim regex As New VBScript_RegExp_55.RegExp
    Dim strInput As String
    strInput = myRange.Value
    With regex
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    REGPLACE = regex.Replace(strInput, outputPattern)
End Function
Then you can use it in a cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (e.g. "A 243" becomes "A243").
Here is a regex_subst() function. Examples:
=regex_subst("watermellon", "[aeiou]", "")
---> wtrmlln
=regex_subst("watermellon", "[^aeiou]", "")
---> aeeo
Here is the simplified code (simpler for me, anyway). I couldn't figure out how to build a suitable output pattern using the above to work like my examples:
Function regex_subst( _
      strInput As String _
    , matchPattern As String _
    , Optional ByVal replacePattern As String = "" _
) As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp
    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    regex_subst = inputRegexObj.Replace(strInput, replacePattern)
End Function
I don't want to have to enable a reference library as I need my scripts to be portable. The Dim foo As New VBScript_RegExp_55.RegExp line caused User Defined Type Not Defined errors, but I found a solution that worked for me.
Update RE comments w/ @chrisneilsen:
I was under the impression that enabling a reference library was tied to the local computer's settings, but it is in fact tied directly to the workbook. So you can enable a reference library and share a macro-enabled workbook, and the end user won't have to enable the library as well. Caveat: the advantage of late binding is that the developer does not have to worry about the wrong version of an object library being installed on the user's computer. This likely would not be an issue with the VBScript_RegExp_55.RegExp library, but I'm not sold that the "performance" benefit of early binding is worth it for me at this time, as we are talking about imperceptible milliseconds in my code.
I felt this deserved an update to help others understand. If you enable the reference library, you can use early binding; if you don't, as far as I can tell, the code will work fine, but you need to late bind and lose some performance and debugging features.
Source: https://peltiertech.com/Excel/EarlyLateBinding.html
What you'll want to do is put an example string in cell A1, then test your strPattern. Once that's working, adjust rng as desired.
Public Sub RegExSearch()
    'https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
    'https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/
    'https://www.vitoshacademy.com/vba-regex-in-excel/
    Dim regexp As Object
    'Dim regex As New VBScript_RegExp_55.regexp 'Caused "User Defined Type Not Defined" Error
    Dim rng As Range, rcell As Range
    Dim strInput As String, strPattern As String
    Set regexp = CreateObject("vbscript.regexp")
    Set rng = ActiveSheet.Range("A1:A1")
    strPattern = "([a-z]{2})([0-9]{8})"
    'Search for 2 Letters then 8 Digits Eg: XY12345678 = Matched
    With regexp
        .Global = False
        .MultiLine = False
        .ignoreCase = True
        .Pattern = strPattern
    End With
    For Each rcell In rng.Cells
        If strPattern <> "" Then
            strInput = rcell.Value
            If regexp.test(strInput) Then
                MsgBox rcell & " Matched in Cell " & rcell.Address
            Else
                MsgBox "No Matches!"
            End If
        End If
    Next
End Sub
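The same late-binding approach also works for an in-cell function, which keeps the workbook portable without any library reference. This is a minimal sketch: RegexIsMatch and its parameter names are hypothetical, and keeping the object in a Static variable is just one way to avoid recreating it on every cell recalculation.

```vba
' Hypothetical in-cell function using late binding (no reference required).
Function RegexIsMatch(ByVal source As String, ByVal pattern As String, _
                      Optional ByVal ignoreCase As Boolean = True) As Boolean
    Static regex As Object    ' created once, then reused across calls
    If regex Is Nothing Then Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Global = False
        .MultiLine = False
        .IgnoreCase = ignoreCase
        .Pattern = pattern
    End With

    RegexIsMatch = regex.Test(source)
End Function
```

With XY12345678 in cell A1, =RegexIsMatch(A1, "^[a-z]{2}[0-9]{8}$") returns TRUE, because ignoreCase defaults to True here.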
How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?
In-cell function to return a matched pattern or replaced value in a string.
Sub to loop through a column of data and extract matches to adjacent cells.
What setup is necessary?
What are Excel's special characters for Regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left, Mid, Right, Instr type commands for similar manipulations.
Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:
Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"
Select "Developer" tab (I don't have this tab what do I do?)
Select "Visual Basic" icon from 'Code' ribbon section
In "Microsoft Visual Basic for Applications" window select "Tools" from the top menu.
Select "References"
Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
Click "OK"
Step 2: Define your pattern
Basic definitions:
- Range.
E.g. a-z matches an lower case letters from a to z
E.g. 0-5 matches any number from 0 to 5
[] Match exactly one of the objects inside these brackets.
E.g. [a] matches the letter a
E.g. [abc] matches a single letter which can be a, b or c
E.g. [a-z] matches any single lower case letter of the alphabet.
() Groups different matches for return purposes. See examples below.
{} Multiplier for repeated copies of pattern defined before it.
E.g. [a]{2} matches two consecutive lower case letter a: aa
E.g. [a]{1,3} matches at least one and up to three lower case letter a, aa, aaa
+ Match at least one, or more, of the pattern defined before it.
E.g. a+ will match consecutive a's a, aa, aaa, and so on
? Match zero or one of the pattern defined before it.
E.g. Pattern may or may not be present but can only be matched one time.
E.g. [a-z]? matches empty string or any single lower case letter.
* Match zero or more of the pattern defined before it.
E.g. Wildcard for pattern that may or may not be present.
E.g. [a-z]* matches empty string or string of lower case letters.
. Matches any character except newline \n
E.g. a. Matches a two character string starting with a and ending with anything except \n
| OR operator
E.g. a|b means either a or b can be matched.
E.g. red|white|orange matches exactly one of the colors.
^ NOT operator
E.g. [^0-9] character can not contain a number
E.g. [^aA] character can not be lower case a or upper case A
\ Escapes special character that follows (overrides above behavior)
E.g. \., \\, \(, \?, \$, \^
Anchoring Patterns:
^ Match must occur at start of string
E.g. ^a First character must be lower case letter a
E.g. ^[0-9] First character must be a number.
$ Match must occur at end of string
E.g. a$ Last character must be lower case letter a
Precedence table:
Order Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m, n}?
3 Sequence & Anchors abc ^ $
4 Alternation |
Predefined Character Abbreviations:
abr same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New line
Example 1: Run as macro
The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return "Not Matched" because the digits were not at the start of the string.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1")
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
End Sub
Example 2: Run as an in-cell function
This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "^[0-9]{1,3}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
Place your strings ("12abc") in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".
Example 3: Loop Through Range
This example is the same as example 1 but loops through a range of cells.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
Example 4: Splitting apart different patterns
This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range, C As Range
Set Myrange = ActiveSheet.Range("A1:A3")
For Each C In Myrange
strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
If strPattern <> "" Then
strInput = C.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
C.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Additional Pattern Examples
String | Regex Pattern | Explanation
a1aaa | [a-zA-Z][0-9][a-zA-Z]{3} | Single alpha, single digit, three alpha characters
a1aaa | [a-zA-Z]?[0-9][a-zA-Z]{3} | May or may not have preceding alpha character
a1aaa | [a-zA-Z][0-9][a-zA-Z]{0,3} | Single alpha, single digit, 0 to 3 alpha characters
a1aaa | [a-zA-Z][0-9][a-zA-Z]* | Single alpha, single digit, followed by any number of alpha characters
</i8> | \<\/[a-zA-Z][0-9]\> | Literal </, then any single alpha followed by any single digit, then a literal >
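If you want to check patterns like these yourself before putting them into a macro, something along these lines can be run from the VBA editor. It is only a sketch: the Sub name CheckSamplePatterns is hypothetical, and it uses late binding so no reference has to be set. The output appears in the Immediate window (Ctrl+G).

```vba
' Sketch: print each sample string, its pattern, and whether the pattern matches.
Sub CheckSamplePatterns()
    Dim regEx As Object
    Dim samples As Variant, patterns As Variant
    Dim i As Long

    ' Sample strings and patterns copied from the table above, in the same order
    samples = Array("a1aaa", "a1aaa", "a1aaa", "a1aaa", "</i8>")
    patterns = Array("[a-zA-Z][0-9][a-zA-Z]{3}", _
                     "[a-zA-Z]?[0-9][a-zA-Z]{3}", _
                     "[a-zA-Z][0-9][a-zA-Z]{0,3}", _
                     "[a-zA-Z][0-9][a-zA-Z]*", _
                     "\<\/[a-zA-Z][0-9]\>")

    Set regEx = CreateObject("VBScript.RegExp")
    For i = LBound(patterns) To UBound(patterns)
        regEx.Pattern = patterns(i)
        ' Third column is the result of Test for that row
        Debug.Print samples(i), patterns(i), regEx.Test(samples(i))
    Next i
End Sub
```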
To make use of regular expressions directly in Excel formulas, the following UDF (user-defined function) can be of help. It more or less directly exposes regular expression functionality as an Excel function.
How it works
It takes 2-3 parameters.
A text to use the regular expression on.
A regular expression.
A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.
Some examples
Extracting an email address:
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")
Results in: some@email.com
Extracting several substrings:
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")
Results in: E-Mail: some@email.com, Name: Peter Gordon
To take apart a combined string in a single cell into its components in multiple cells:
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)
Results in: Peter Gordon some@email.com ...
How to use
To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):
In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
Click on Tools -> References
Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
Click OK.
Click on Insert -> Module. If you give your module a different name, make sure the module does not have the same name as the UDF below (e.g. naming the module Regex and the function regex causes #NAME? errors).
In the big text window in the middle insert the following:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A too-high $ tag was found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
Save and close the Microsoft Visual Basic for Applications Editor window.
Expanding on patszim's answer for those in a rush.
Open Excel workbook.
Alt+F11 to open VBA/Macros window.
Add a reference to the regex library under Tools then References by selecting Microsoft VBScript Regular Expressions 5.5
Insert a new module (code needs to reside in the module otherwise it doesn't work).
In the newly inserted module, add the following code:
Function RegxFunc(strInput As String, regexPattern As String) As String
Dim regEx As New RegExp
Dim matches As Object
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = regexPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
RegxFunc = matches(0).Value
Else
RegxFunc = "not matched"
End If
End Function
The regex pattern is placed in one of the cells, and absolute referencing is used to point at it.
The function is tied to the workbook it is created in.
If it needs to be available in other workbooks, store the function in Personal.XLSB
Here is my attempt:
Function RegParse(ByVal pattern As String, ByVal html As String)
Dim regex As RegExp
Set regex = New RegExp
With regex
.IgnoreCase = True 'ignoring cases while regex engine performs the search.
.pattern = pattern 'declaring regex pattern.
.Global = False 'restricting regex to find only first match.
If .Test(html) Then 'Testing if the pattern matches or not
mStr = .Execute(html)(0) '.Execute(html)(0) is the first match; its default property is the matched string.
RegParse = .Replace(mStr, "$1") '.Replace substitutes the match with the contents of the first set of parentheses ($1).
Else
RegParse = "#N/A"
End If
End With
End Function
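As a variation on the same idea, the first capture group can also be read straight from the match via SubMatches rather than going through Replace. This is only a sketch under a few assumptions of mine: the name FirstGroup is hypothetical, late binding is used so no reference is needed, and it returns a real #N/A error value instead of the text "#N/A".

```vba
' Sketch: return the first capture group of the first match.
Function FirstGroup(ByVal pattern As String, ByVal source As String) As Variant
    Dim re As Object
    Dim found As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = False            ' only the first match is needed

    Set found = re.Execute(source)
    If found.Count = 0 Then
        FirstGroup = CVErr(xlErrNA)             ' no match at all
    ElseIf found(0).SubMatches.Count = 0 Then
        FirstGroup = found(0).Value             ' pattern has no group: return the whole match
    Else
        FirstGroup = found(0).SubMatches(0)     ' contents of the first (...)
    End If
End Function
```

In a cell this would be called like =FirstGroup("(\d+)", A1), which keeps the argument order of RegParse (pattern first, text second).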
This isn't a direct answer, but it may offer a more efficient alternative for your consideration: Google Sheets has several built-in regex functions, which can be very convenient and help you avoid some of the technical setup required in Excel. Obviously there are some advantages to using Excel on your PC, but for the large majority of users Google Sheets will offer an identical experience and may offer some benefits in portability and sharing of documents.
They offer
REGEXEXTRACT: Extracts matching substrings according to a regular expression.
REGEXREPLACE: Replaces part of a text string with a different text string using regular expressions.
SUBSTITUTE: Replaces existing text with new text in a string.
REPLACE: Replaces part of a text string with a different text string.
You can type these directly into a cell, for example:
=REGEXMATCH(A2, "[0-9]+")
They also work quite well in combinations with other functions such as IF statements like so:
=IF(REGEXMATCH(E8,"MiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*")/1000,IF(REGEXMATCH(E8,"GiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*"),""))
Hopefully this provides a simple workaround for those users who feel daunted by the VBA component of Excel.
To add to the valuable content, I would like to create this reminder on why sometimes RegEx within VBA is not ideal. Not all expressions are supported, but instead may throw an Error 5017 and may leave the author guessing (which I am a victim of myself).
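One way to take some of the guessing out is to probe a pattern before relying on it. A minimal sketch, assuming late binding; the helper name IsPatternSupported is made up, and it simply treats any run-time error raised while testing (such as 5017) as "not supported":

```vba
' Sketch: returns False if the VBScript engine rejects the pattern.
Function IsPatternSupported(ByVal pattern As String) As Boolean
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern          ' assigning the pattern does not validate it yet

    On Error Resume Next
    re.Test ""                    ' the error, if any, is raised when the pattern is used
    IsPatternSupported = (Err.Number = 0)
    On Error GoTo 0
End Function
```

For example, ?IsPatternSupported("(?<=a)b") in the Immediate window lets you check a lookbehind pattern without having to debug a failing macro.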
Whilst we can find some sources on what is supported, it would be helpful to know which metacharacters etc. are not supported. A more in-depth explanation can be found here. Mentioned in this source:
"Although "VBScript’s regular expression ... version 5.5 implements quite a few essential regex features that were missing in previous versions of VBScript. ... JavaScript and VBScript implement Perl-style regular expressions. However, they lack quite a number of advanced features available in Perl and other modern regular expression flavors:"
So, not supported are:
Start of String anchor \A, alternatively use the ^ caret to match the position before the 1st char in the string
End of String anchor \Z, alternatively use the $ dollar sign to match the position after the last char in the string
Positive LookBehind, e.g.: (?<=a)b (whilst positive LookAhead is supported)
Negative LookBehind, e.g.: (?<!a)b (whilst negative LookAhead is supported)
Atomic Grouping
Possessive Quantifiers
Unicode support, e.g.: \x{FFFF} or Unicode properties such as \p{L} (matching a single character with \uFFFF does work)
Named Capturing Groups. Alternatively use Numbered Capturing Groups
Inline modifiers, e.g.: /i (case insensitivity) or /g (global) etc. Set these through the RegExp object properties instead, e.g. RegExp.Global = True and RegExp.IgnoreCase = True, where available.
Conditionals
Regular Expression Comments. Add these with regular ' comments in script
I have already hit a wall more than once using regular expressions within VBA, usually with LookBehind, though sometimes I even forget the modifiers. I have not experienced all of the above-mentioned drawbacks myself, but thought I would try to be thorough by referring to some more in-depth information. Feel free to comment/correct/add. Big shout out to regular-expressions.info for a wealth of information.
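For the LookBehind case, a common workaround is to match the prefix as well and wrap the part you actually want in a capturing group. A rough sketch, with a hypothetical function name and late binding assumed:

```vba
' Sketch: emulate the unsupported (?<=a)b by capturing what follows the prefix.
Function AfterPrefix(ByVal source As String) As String
    Dim re As Object
    Dim found As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "a(b)"          ' "a" is matched but only "b" is captured
    re.Global = False

    Set found = re.Execute(source)
    If found.Count > 0 Then
        AfterPrefix = found(0).SubMatches(0)   ' the captured part after "a"
    Else
        AfterPrefix = ""                       ' no "ab" found
    End If
End Function
```

The difference from a real lookbehind is that the prefix is consumed by the match, so when replacing you would put it back with $1-style references in the replacement string.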
P.S. You have mentioned regular VBA methods and functions, and I can confirm they (at least to myself) have been helpful in their own ways where RegEx would fail.
I needed to use this as a cell function (like SUM or VLOOKUP) and found that it was easy to:
Make sure you are in a Macro Enabled Excel File (save as xlsm).
Open the VBA editor (Alt + F11)
Add Microsoft VBScript Regular Expressions 5.5 as in other answers
Create the following function either in workbook or in its own module:
Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant
Dim regex As New VBScript_RegExp_55.RegExp
Dim strInput As String
strInput = myRange.Value
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
REGPLACE = regex.Replace(strInput, outputPattern)
End Function
Then you can use in cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (ex: "A 243" to "A243")
Here is a regex_subst() function. Examples:
=regex_subst("watermellon", "[aeiou]", "")
---> wtrmlln
=regex_subst("watermellon", "[^aeiou]", "")
---> aeeo
Here is the simplified code (simpler for me, anyway). I couldn't figure out how to build a suitable output pattern using the above to work like my examples:
Function regex_subst( _
strInput As String _
, matchPattern As String _
, Optional ByVal replacePattern As String = "" _
) As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
regex_subst = inputRegexObj.Replace(strInput, replacePattern)
End Function
I don't want to have to enable a reference library as I need my scripts to be portable. The Dim foo As New VBScript_RegExp_55.RegExp line caused User Defined Type Not Defined errors, but I found a solution that worked for me.
Update re: comments with @chrisneilsen:
I was under the impression that enabling a reference library was tied to the local computer's settings, but it is in fact tied directly to the workbook. So you can enable a reference library and share a macro-enabled workbook, and the end user won't have to enable the library as well. Caveat: the advantage of late binding is that the developer does not have to worry about the wrong version of an object library being installed on the user's computer. This likely would not be an issue with the VBScript_RegExp_55.RegExp library, but I'm not sold that the "performance" benefit is worth it for me at this time, as we are talking imperceptible milliseconds in my code. I felt this deserved an update to help others understand. If you enable the reference library, you can use early binding; if you don't, as far as I can tell, the code will work fine, but you need to late bind and lose some performance/debugging features.
Source: https://peltiertech.com/Excel/EarlyLateBinding.html
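To make the trade-off concrete, here is roughly what a late-bound in-cell function could look like. It is a sketch only: the name RegexReplaceLate is made up, and keeping the object in a Static variable is my own choice to avoid calling CreateObject on every recalculation.

```vba
' Sketch: late-bound replace function that needs no library reference.
Function RegexReplaceLate(ByVal source As String, ByVal matchPattern As String, _
                          Optional ByVal replacement As String = "") As String
    Static re As Object          ' created on first use, then reused across calls

    If re Is Nothing Then Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    re.Pattern = matchPattern
    RegexReplaceLate = re.Replace(source, replacement)
End Function
```

The early-bound equivalent would declare Dim re As New VBScript_RegExp_55.RegExp, which gives IntelliSense and compile-time checking but only compiles when the reference is ticked.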
What you'll want to do is put an example string in cell A1, then test your strPattern. Once that's working, adjust rng as desired.
Public Sub RegExSearch()
'https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
'https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/
'https://www.vitoshacademy.com/vba-regex-in-excel/
Dim regexp As Object
'Dim regex As New VBScript_RegExp_55.regexp 'Caused "User Defined Type Not Defined" Error
Dim rng As Range, rcell As Range
Dim strInput As String, strPattern As String
Set regexp = CreateObject("vbscript.regexp")
Set rng = ActiveSheet.Range("A1:A1")
strPattern = "([a-z]{2})([0-9]{8})"
'Search for 2 Letters then 8 Digits Eg: XY12345678 = Matched
With regexp
.Global = False
.MultiLine = False
.ignoreCase = True
.Pattern = strPattern
End With
For Each rcell In rng.Cells
If strPattern <> "" Then
strInput = rcell.Value
If regexp.test(strInput) Then
MsgBox rcell & " Matched in Cell " & rcell.Address
Else
MsgBox "No Matches!"
End If
End If
Next
End Sub
Does VBA have any good mechanism for checking whether the content of a given Excel cell matches a specific regex?
In my case I want to know if some cell has the format
m
m2
m1234
In fact, there's just one defined letter at the beginning, followed by an unspecified number of digits.
How do I put this into an If-Else construct?
If Doc.Cells(1,1).Value ..... ???
greets, poeschlorn
You can get at the VBScript RegExp objects via Tools->References & adding "Microsoft VBScript Regular Expressions 5.5"
Alternatively, a quick way to do it, if you don't need to check for a subsequent letter as in m1234X1, is:
if Doc.Cells(1,1).Value like "[a-zA-Z]#*" then ...
(This doesn't require a reference to anything)
I don't know VBA, but the regex [a-zA-Z][0-9]* might be able to match what you want.
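Dropped into the If-Else construct from the question, that pattern might look like the sketch below. The anchors ^ and $ are my addition so that the whole cell has to match; the function and Sub names are hypothetical, ActiveSheet stands in for the Doc object from the question, and late binding is used so no reference is required.

```vba
' Sketch: True if the text is one letter followed by zero or more digits.
Function IsLetterThenDigits(ByVal cellText As String) As Boolean
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^[a-zA-Z][0-9]*$"   ' anchored: nothing else is allowed in the cell
    IsLetterThenDigits = re.Test(cellText)
End Function

Sub CheckCell()
    ' ActiveSheet is an assumption; substitute your own worksheet object
    If IsLetterThenDigits(CStr(ActiveSheet.Cells(1, 1).Value)) Then
        Debug.Print "matches"
    Else
        Debug.Print "does not match"
    End If
End Sub
```

If only the letter m is allowed at the start, the pattern would become "^m[0-9]*$".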
Here is my RegexContains function. Pass it the cell and the pattern and it will return TRUE or FALSE if it contains it or not.
Function RegexContains(ByVal find_in As String, _
ByVal find_what As String, _
Optional IgnoreCase As Boolean = False) As Boolean
Application.ScreenUpdating = False
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = find_what
RE.IgnoreCase = IgnoreCase
RE.Global = True
RegexContains = RE.Test(find_in)
Application.ScreenUpdating = True
End Function
Now, I'm not sure exactly what you want to find in your example, but if you want to know whether the cell starts with a single letter followed by one or more digits, then you would use (assuming the cell is A1): =RegexContains(A1, "^[a-zA-Z]\d+")
The ^ marks the start of the string
The [a-zA-Z] marks a single alphabetic character (note that \w would also match digits and the underscore)
The \d+ marks one or more numeric characters [0-9]; use \d* instead if a lone letter such as m should also match
I hope this helps.