VBA RegEx identifiying multiple patterns - Excel - regex

I´m rather new to VBA RegEx, but thanks to this stackoverflow thread,
I am getting to it. I have a problem and hope that somebody can help. In row 1 in Excel I have multiple Strings with different city/country attribution. Example:
A1: "/flights/munich/newyork"
A2: "flights/munich/usa"
A3: "flights/usa/germany"
...
What I wanna have now, is a VBA that goes though those strings with RegEx and prompts a categorisation value if the RegEx is met. Example:
A1: "/flights/munich/new-york" categorises as "city to city"
A2: "flights/munich/usa" categorises as "city to country"
A3: "flights/usa/germany" categorises as "country to country"
Right now, I have a code that will return the "city to city" category to me, but I can´t figure out who to get a code that handles the multiple patterns and returns the corresponding output string.
In short, a logic like this is needed:
If A1 contains RegEx ".*/munich/new-york" then return output string "city to city", if A1 contains RegEx ".*/munich/usa" then return output string "city to country" and so on.
Guess this has something to to with how to handle multiple if statements with multiple patterns in VBA, but I can´t figure it out.
This is how my code looks right now - hope you can help!
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "(munich|berlin|new-york|porto|copenhagen|moscow)/(munich|berlin|new-york|porto|copenhagen|moscow)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = "CITY TO CITY"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "NO MATCH FOUND"
End If
End If
End Function

Like #dbmitch mentions in the comments, you can't do this with a single Regex - you'll need to use 3 of them. I'd personally put the cities and countries into Consts and build the patterns as need. You can then pass them (along with the strReplace) as parameters to simpleCellRegex function:
Const CITIES As String = "(munich|berlin|new-york|porto|copenhagen|moscow)"
Const COUNTRIES As String = "(germany|france|usa|russia|etc)"
Function simpleCellRegex(Myrange As Range, strReplace As String, strPattern As String) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "NO MATCH FOUND"
End If
End If
End Function
Called like this:
foo = simpleCellRegex(someRange, "CITY TO CITY", CITIES & "/" & CITIES)
foo = simpleCellRegex(someRange, "CITY TO COUNTRY", CITIES & "/" & COUNTRIES)
foo = simpleCellRegex(someRange, "COUNTRY TO COUNTRY", COUNTRIES & "/" & COUNTRIES)
Note: If you're doing this in a loop, it would be wildly more efficient to only build each RegExp once, and then pass that as a parameter instead of the pattern.

A little (maybe) "out of the box" solution:
Option Explicit
Sub main()
Const CITIES As String = "MUNICH|BERLIN|NEWYORK|PORTO|COPENHAGEN|MOSCOW"
Const COUNTRIES As String = "USA|GERMANY"
With Worksheets("FLIGHTS")
With .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
With .Offset(, 1)
.value = .Offset(, -1).value
.Replace What:="*flights/", replacement:="", LookAt:=xlPart, MatchCase:=False
.Replace What:="/", replacement:=" to ", LookAt:=xlPart, MatchCase:=False
ReplaceElement .Cells, CITIES, "city"
ReplaceElement .Cells, COUNTRIES, "country"
End With
End With
End With
End Sub
Sub ReplaceElement(rng As Range, whats As String, replacement As String)
Dim elem As Variant
With rng
For Each elem In Split(whats, "|")
.Replace What:=elem, replacement:=replacement, LookAt:=xlPart, MatchCase:=False
Next elem
End With
End Sub
note
replace() methods can be taught to ignore cases but beware to have consistency between names: "newyork" will never match "new-york"

I would do this a bit differently.
I would make the regex pattern the start or end point, and match it against a comma delimited string of cities or countries.
Given what you have presented, the start and end points will always be the last two / separated units.
So something like:
Option Explicit
Sub CategorizeFlights()
Dim rData As Range, vData As Variant
Dim arrCity() As Variant
Dim arrCountry() As Variant
Dim I As Long, J As Long
Dim sCategoryStart As String, sCategoryEnd As String
Dim V As Variant
Dim RE As RegExp
arrCity = Array("munich", "newyork")
arrCountry = Array("usa", "germany")
Set RE = New RegExp
With RE
.Global = False
.ignorecase = True
End With
Set rData = Range("A2", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=2)
vData = rData
For I = 1 To UBound(vData, 1)
V = Split(vData(I, 1), "/")
RE.Pattern = "\b" & V(UBound(V) - 1) & "\b"
If RE.test(Join(arrCity, ",")) = True Then
sCategoryStart = "City to "
ElseIf RE.test(Join(arrCountry, ",")) = True Then
sCategoryStart = "Country to "
Else
sCategoryStart = "Unknown to "
End If
RE.Pattern = "\b" & V(UBound(V)) & "\b"
If RE.test(Join(arrCity, ",")) = True Then
sCategoryEnd = "City"
ElseIf RE.test(Join(arrCountry, ",")) = True Then
sCategoryEnd = "Country"
Else
sCategoryEnd = "Unknown"
End If
vData(I, 2) = sCategoryStart & sCategoryEnd
Next I
With rData
.Value = vData
.EntireColumn.AutoFit
End With
End Sub
As is sometimes the case, a similar algorithm can be used without regular expressions, but I assume this is an exercise in its use.

Related

Regular Expression in excel VBA

I'm trying to use regex in excel VBA to match a pattern within all cells in a column range, and remove the matched patterns to a new column range.
E.g.
Happy Day Care Club (1124734)
French Pattiserie (8985D)
The King's Pantry (G6666642742D)
Big Shoe (China) Ltd (ZZ454)
Essentially I want to remove the last bracketed portion of each string and transpose this part (without the brackets) into a different column range.
The regex I have so far is "(([^)]+))\z" (which I don't know if this is actually correct), and embedded within this VBA:
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Sheets("Sheet 1").Activate
Range("FF65536").End(xlUp).Select
LastCell = ActiveCell.Address
Set Myrange = ActiveSheet.Range("FF2:" & LastCell)
For Each C In Myrange
strPattern = "(\(([^\)]+)\)\z)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Range("FF2").Select = regEx.Replace(strInput, "$1")
Range("DX2").Select = regEx.Replace(strInput, "$2")
End If
End If
Next
I'm a newbie so please forgive glaringly obvious mistakes.
Many thanks,
No your regex pattern isn't correct. You should test your pattern separately as regex is its own mini-language. Try this pattern (Regex101):
\((.+)\)$
About the gm options: g means Global, m means Multiline, both of which are set to True in your code.
Here's a non-RegEx method:
Dim Myrange As Range
Sheets("Sheet 1").Activate
Set Myrange = ActiveSheet.Range("FF2:FF" & Cells(Rows.Count, "FF").End(xlUp).Row)
With Myrange
.Offset(, -43).Value = .Worksheet.Evaluate("INDEX(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(" & .Address & _
",""("",REPT("" "",500)),500)),"")"",""""),)")
End With
Personally I would resort to RegEx as a last resort...
Here is a snippet using string functions:
Dim iRow As Long
Dim s As String
For iRow = 1 To UsedRange.Rows.Count
Debug.Print Cells(iRow, 1).Value
s = Cells(iRow, 1).Value
s = Trim(Left(s, InStrRev(s, "(") - 1))
Debug.Print s
Next
The relevant line being Trim(Left(s, InStrRev(s, "(") - 1)). You would need QA check to deal with data w/o proper format.

Combine RegEx and fill an underminate number of cells

I have a cell in Excel that holds a long string in cell A1:
"ABC12+BED58,YZ001"
I have the following regex to match some specific variables in my string
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Basically, I need to write a macro or a function (I would prefer a function actually) that will fill cell A2, A3, A4 like that:
ABC12
BED58
YZ001
The thing is, there is an undeterminate number of parameters in the string (so for example, it could go all the way through A200).
I'm thinking of a function get_n_variables(str, n) that would return the Nth unique match
Here is my progress so far but the function returns #VALUE!
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As Object
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(0).SubMatches(0)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
From MrExcel Forum:
You can not put a function in a cell to change other cells. Functions do not work this way.
Thus, it should be a sub, like this, e.g. (outputs the matches under the selected cell with our input string):
Sub simpleCellRegex()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As MatchCollection
Dim i As Long, cnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
cnt = 1
If strPattern <> "" Then
strInput = ActiveCell.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set objMatches = regEx.Execute(strInput)
For i = 0 To objMatches.Count - 1
ActiveCell.Offset(cnt).Value = objMatches.Item(i)
cnt = cnt + 1
Next
End If
End If
End Sub
Output:
You can actually still use an function if you use an array
select B1:D1
enter this formula =simpleCellRegex(A1) and press CTRL+SHIFT+ENTER
if you dont know how many matches enter in more cells than there may be matches
code
Function simpleCellRegex(StrIn As String) As Variant
Dim regEx As Object
Dim regMC As Object
Dim X
Dim strPattern As String
Dim lngCnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Set regEx = CreateObject("vbscript.regexp")
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
If .Test(StrIn) Then
Set regMC = .Execute(StrIn)
ReDim X(0 To regMC.Count - 1) As String
For lngCnt = 0 To UBound(X)
X(lngCnt) = regMC(lngCnt)
Next
simpleCellRegex = X
Else
simpleCellRegex = "Not matched"
End If
End With
End Function

RegEx to extract email

I need to extract only the email from a spreadsheet in Excel. I've found some example VBA code here at this StackOverflow link, courtesy of Portland Runner.
I created an Excel module and it seems to be working fine, except it only returns the first uppercase character of the address into the cell and it's ignoring the email.
For example:
Text | Result
----------------------------------------|------------------------------
My email address is address#gmail.com | My email address is
Yes Address#gmail.com | Yes A
Below is the code I'm using:
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
I do not have enough experience with VBA to really diagnose what might be happening here, hopefully someone will be able to spot what I'm doing wrong.
Working Code
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-zA-Z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(0).Value
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
When You return strInput You just get the same string as the input.
You need to return Value that has been found using RegExp.
Try
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(1).Value
Instead of
simpleCellRegex = regEx.Replace(strInput, strReplace)
You can change the line
simpleCellRegex = regEx.Replace(strInput, strReplace)
To
simpleCellRegex = strInput
Because you are not making any replacement
The easiest way to do this is by installing the software called KUtool. After installing, highlight the content you want to extract emails==>Click ku tools at the top middle==>click on text==>extract emails.
You can also use the following code.(ALT+F1==>INSERT MODULE)
Function ExtractEmailFun(extractStr As String) As String
'Update 20130829
Dim CharList As String
On Error Resume Next
CheckStr = "[A-Za-z0-9._-]"
OutStr = ""
Index = 1
Do While True
Index1 = VBA.InStr(Index, extractStr, "#")
getStr = ""
If Index1 > 0 Then
For p = Index1 - 1 To 1 Step -1
If Mid(extractStr, p, 1) Like CheckStr Then
getStr = Mid(extractStr, p, 1) & getStr
Else
Exit For
End If
Next
getStr = getStr & "#"
For p = Index1 + 1 To Len(extractStr)
If Mid(extractStr, p, 1) Like CheckStr Then
getStr = getStr & Mid(extractStr, p, 1)
Else
Exit For
End If
Next
Index = Index1 + 1
If OutStr = "" Then
OutStr = getStr
Else
OutStr = OutStr & Chr(10) & getStr
End If
Else
Exit Do
End Ifenter code here
Loop
ExtractEmailFun = OutStr
End Function
You can also go the code way
Open excell, click on ALT +F1, Click on insert Module and paste this code
Click save and enter the formula(Column=ExtractEmailFun(A1)) in a blank cell. press enter and your emails will be extracted. Hope this will help
Try the below pattern
strPattern = "^([a-zA-Z0-9_\-\.]+)#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"

Excel Regular Expression: Add Quote (") to Values in Two Columns

So I have a CSV file with two columns that have items listed out like below:
The goal is to create a Excel VB code that will go through columns H and I, and add a quote (") to the beginning and end of each 6 digit group (e.g., H67100 into "H67100"). Additionally, the comma should be left alone.
I know the code is not complete as of yet, but this is what I have thus far. I think I am fine with the beginning part but after the match is found, I think my logic/syntax is incorrect. A little guidance and feedback is much appreciated:
Private Sub splitUpRegexPattern2()
Dim strPattern As String: strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("H:I")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
End Sub
UPDATED CODE:
Function splitUpRegexPattern2 (Myrange As Range) as String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range
Dim strReplace As String
Dim strOutput As String
strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End FUNCTION
Adding example CSV file. Download Sample CSV File
This answer assumes you can get the values of each cell you are interested in.
There's no need to use RegEx in this case as your values appear to be simple comma-delimited data.
Public Const DOUBLE_QUOTE As String = Chr(34)
'''
'''<summary>This function splits a string into an array on commas, adds quotes around each element in the array, the joins the array back into a string placing a comma between each element.</summary>
'''
Public Function QuotedValues(ByVal input As String) As String
Dim words As String() = input.Split(New Char() {","})
Dim result As String = String.Empty
words = (From w In words Select DOUBLE_QUOTE & w.Trim & DOUBLE_QUOTE).ToArray
result = String.Join(", ", words)
Return result
End Function

Excel RegEx Extraction

recently I've been trying to extract some strings from text in excel. I used script from other post here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Since Macro code is working fine I couldn't use in Cell function, it's showing #NAME? error. I've included "Microsoft VBScript Regular Expressions 5.5" but still no luck.
I can use it with macro but script needs some changes. I would like to have some strings in A1:A50, then to B1:B50 extract date in format DD Month YYYY (e.g. 28 July 2014) and to C1:C50 extract account no in format G1234567Y.
For now script is replacing date with "". Regular Expression for date is correct but how to insert date into B column? And then A/c no to C column working on 1:50 range?
This is the code:
Sub simpleRegex()
Dim strPattern As String: strPattern = "[0-9][0-9].*[0-9][0-9][0-9][0-9]"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Dim Out As Range
Set Myrange = ActiveSheet.Range("A1")
Set Out = ActiveSheet.Range("B1")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Out = regEx.Replace(strInput, strReplace)
Else
MsgBox ("Not matched")
End If
End If
End Sub
Thank You kindly for any assistance.
Currently your replacing the matching string with an empty string "" so that's why your getting no result. You need to return the actual match using () to indicate match set and $1 to retrieve it.
Based on your example, I'll assume your text in column A looks like this: 28 July 2014 G1234567Y
Here is a routine that will split apart the text into a date and then the text following the date.
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A50")
For Each C In Myrange
strPattern = "([0-9]{1,2}.*[0-9]{4}) (.*)"
'strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Result:
To use an in-cell function, set it up to extract a single piece such as Date or everything else. The following code will extract the date. Cell B1 would have the following equation: =extractDate(A1)
Function extractDate(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strRaplace As String
Dim strOutput As String
strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
extractDate = regEx.Replace(strInput, "$1")
Else
extractDate = "Not matched"
End If
End If
End Function
To make another function for extracting the rest of the date simply change $1 to $2 and it will return the second defined match in the pattern.