RegEx to extract email - regex

I need to extract only the email from a spreadsheet in Excel. I've found some example VBA code here at this StackOverflow link, courtesy of Portland Runner.
I created an Excel module and it seems to be working fine, except it only returns the first uppercase character of the address into the cell and it's ignoring the email.
For example:
Text | Result
----------------------------------------|------------------------------
My email address is address#gmail.com | My email address is
Yes Address#gmail.com | Yes A
Below is the code I'm using:
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
I do not have enough experience with VBA to really diagnose what might be happening here, hopefully someone will be able to spot what I'm doing wrong.
Working Code
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-zA-Z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(0).Value
Else
simpleCellRegex = "Not matched"
End If
End If
End Function

When You return strInput You just get the same string as the input.
You need to return Value that has been found using RegExp.
Try
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(1).Value
Instead of
simpleCellRegex = regEx.Replace(strInput, strReplace)

You can change the line
simpleCellRegex = regEx.Replace(strInput, strReplace)
To
simpleCellRegex = strInput
Because you are not making any replacement

The easiest way to do this is by installing the software called KUtool. After installing, highlight the content you want to extract emails==>Click ku tools at the top middle==>click on text==>extract emails.
You can also use the following code.(ALT+F1==>INSERT MODULE)
Function ExtractEmailFun(extractStr As String) As String
'Update 20130829
Dim CharList As String
On Error Resume Next
CheckStr = "[A-Za-z0-9._-]"
OutStr = ""
Index = 1
Do While True
Index1 = VBA.InStr(Index, extractStr, "#")
getStr = ""
If Index1 > 0 Then
For p = Index1 - 1 To 1 Step -1
If Mid(extractStr, p, 1) Like CheckStr Then
getStr = Mid(extractStr, p, 1) & getStr
Else
Exit For
End If
Next
getStr = getStr & "#"
For p = Index1 + 1 To Len(extractStr)
If Mid(extractStr, p, 1) Like CheckStr Then
getStr = getStr & Mid(extractStr, p, 1)
Else
Exit For
End If
Next
Index = Index1 + 1
If OutStr = "" Then
OutStr = getStr
Else
OutStr = OutStr & Chr(10) & getStr
End If
Else
Exit Do
End Ifenter code here
Loop
ExtractEmailFun = OutStr
End Function
You can also go the code way
Open excell, click on ALT +F1, Click on insert Module and paste this code
Click save and enter the formula(Column=ExtractEmailFun(A1)) in a blank cell. press enter and your emails will be extracted. Hope this will help

Try the below pattern
strPattern = "^([a-zA-Z0-9_\-\.]+)#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"

Related

RegExp in VBA: remove excess numbers if there is a text/special character that appears after a certain length of numbers

Currently my code removes all the text and special characters within the cell value and leaves only the numbers.
the only thing left to do is to remove the numbers after a text/special character appears after a certain length
Example.
412074442 (y) 2367
My code only outputs this as 0412 074 4422367 but it should be 0412 074 442
removing the excess numbers after the "(y)"
Dim strPattern As String: strPattern = "^4(\d\d)(\d\d\d)(\d\d\d)"
Dim strReplace As String: strReplace = "04$1 $2 $3"
Dim strPattern2 As String: strPattern2 = "[^0-9]"
Dim strReplace2 As String: strReplace2 = ""
Set Myrange = ActiveSheet.Range("A2:A8") '***change range to determined text
For Each cell In Myrange
If strPattern2 <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern2
End With
If regEx.test(strInput) Then
'MsgBox (regEx.Replace(strInput, strReplace))
cell.Value = regEx.Replace(strInput, strReplace2)
Else
'MsgBox ("Not matched")
End If
End If
Next
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
'MsgBox (regEx.Replace(strInput, strReplace))
cell.Value = regEx.Replace(strInput, strReplace)
Else
'MsgBox ("Not matched")
End If
End If
Next
For Each cell In Myrange
strInput = cell.Value
If Len([strInput]) <> 12 Then
'MsgBox "Error"
cell.Interior.ColorIndex = 3
Else
End If
If Len([strInput]) = 0 Then
'MsgBox "Error"
cell.Interior.ColorIndex = 0
Else
End If
Next
End With
Need help, Thanks.
You may change a bit the architecture of the solution. Thus, the following:
First look for the (y) and cut anything to the right of it.
Then use the RegEx to remove the non-numeric characters.
This is the code for the removal of the parts right from (y):
Option Explicit
Public Sub TestMe()
Dim location As Long
location = InStr(1, Range("A1"), "(y") - 1
If location > 0 Then
Range("a1") = Left(Range("a1"), location)
End If
End Sub
Left Function MSDN

Regex - how to test for 2 str patterns and make replacements based on which str pattern matches

I currently have two functioning separate subs in Excel VBA. Each sub searches for a different string pattern and then makes a replacement.
Sub 1 searches for a leading 0 in the target string, strips it out, and places the contents in a separate cell.
Sub 2 searches for terminal "99" in the target string, replacing the "99" with Xs, and places the contents in a separate cell.
The way I do this particular operation is to run Sub1 first. Results are placed in column AO. Then I run Sub2 against the results obtained from Sub1 and place those results in the next adjacent column.
I would like to combine the two subs and run just one time getting the desired results.
Here are examples of the target string in column W that I am applying the regex against:
098765-9876-77
333222-7777-G5
9876-078-99
9867x77A
Sub 1
Sub tom_briggs_test_leading_zero()
'This sub searches for a leading zero in the target string and removes it.
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("w2:w73352")
For Each cell In Myrange
strPattern = "^0(.*)"
If strPattern <> "" Then
strInput = cell.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
cell.Offset(0, 18) = regEx.Replace(strInput, strReplace)
Else
cell.Offset(0, 18) = strInput
End If
End If
Next
End Sub
Sub 2
Sub tom_briggs_test_trailing_99()
'This sub searchs for teriminal 99s in the target string and replaces them
'with -XX.
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("AO2:AO73352")
'AO is the column where results from Sub1 have been placed
For Each cell In Myrange
strPattern = "(.*)-99$"
If strPattern <> "" Then
strInput = cell.Value
strReplace = "$1-XX"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
cell.Offset(0, 1) = regEx.Replace(strInput, strReplace)
Else
cell.Offset(0, 1) = strInput
End If
End If
Next
End Sub
Thanks for your consideration.
How about this:
Sub tom_briggs_fix_head_and_tail()
'This sub removes a leading zero in the target string and
'replaces trailing 99s in the target string with -XX.
Dim regExHead As New RegExp
Dim strHeadPattern As String
Dim strHeadReplace As String
Dim regExTail As New RegExp
Dim strTailPattern As String
Dim strTailReplace As String
Dim strInput As String
Dim Myrange As Range
Dim c As Range
Set Myrange = ActiveSheet.Range("w2:w73352")
strHeadPattern = "^0(.*)"
strHeadReplace = "$1"
strTailPattern = "(.*)-99$"
strTailReplace = "$1-XX"
With regExHead
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strHeadPattern
End With
With regExTail
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strTailPattern
End With
For Each c In Myrange
strInput = c.Value
strInput = IIf(regExHead.Test(strInput), _
regExHead.Replace(strInput, strHeadReplace), strInput)
strInput = IIf(regExTail.Test(strInput), _
regExTail.Replace(strInput, strTailReplace), strInput)
c.Offset(0, 19) = strInput
Next
End Sub
Hope that helps
You don't need a regex for that. Just take a hint from the following code:
Sub test()
Set myRange = Sheet1.Range("A1:A2") 'Change this range as per your requirement
For Each cell In myRange
strInput = cell.Value
'Checking if the 1st number is 0 or not
If CInt(Mid(strInput, 1, 1)) = 0 Then
strInput = Mid(strInput, 2)
End If
'Checking if -99 is present in the end or not
If StrComp("-99", Right(strInput, 3), 1) = 0 Then
strInput = Left(strInput, Len(strInput) - 3) & "-XX"
End If
'If there was a leading 0 or a trailing 99, then only write the updated value in another cell
If StrComp(cell.Value, strInput, 1) <> 0 Then
cell.Offset(0, 1).Value = strInput
End If
Next
End Sub

Combine RegEx and fill an underminate number of cells

I have a cell in Excel that holds a long string in cell A1:
"ABC12+BED58,YZ001"
I have the following regex to match some specific variables in my string
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Basically, I need to write a macro or a function (I would prefer a function actually) that will fill cell A2, A3, A4 like that:
ABC12
BED58
YZ001
The thing is, there is an undeterminate number of parameters in the string (so for example, it could go all the way through A200).
I'm thinking of a function get_n_variables(str, n) that would return the Nth unique match
Here is my progress so far but the function returns #VALUE!
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As Object
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(0).SubMatches(0)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
From MrExcel Forum:
You can not put a function in a cell to change other cells. Functions do not work this way.
Thus, it should be a sub, like this, e.g. (outputs the matches under the selected cell with our input string):
Sub simpleCellRegex()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As MatchCollection
Dim i As Long, cnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
cnt = 1
If strPattern <> "" Then
strInput = ActiveCell.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set objMatches = regEx.Execute(strInput)
For i = 0 To objMatches.Count - 1
ActiveCell.Offset(cnt).Value = objMatches.Item(i)
cnt = cnt + 1
Next
End If
End If
End Sub
Output:
You can actually still use an function if you use an array
select B1:D1
enter this formula =simpleCellRegex(A1) and press CTRL+SHIFT+ENTER
if you dont know how many matches enter in more cells than there may be matches
code
Function simpleCellRegex(StrIn As String) As Variant
Dim regEx As Object
Dim regMC As Object
Dim X
Dim strPattern As String
Dim lngCnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Set regEx = CreateObject("vbscript.regexp")
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
If .Test(StrIn) Then
Set regMC = .Execute(StrIn)
ReDim X(0 To regMC.Count - 1) As String
For lngCnt = 0 To UBound(X)
X(lngCnt) = regMC(lngCnt)
Next
simpleCellRegex = X
Else
simpleCellRegex = "Not matched"
End If
End With
End Function

excel VB regexp 5.5 capturing group

I have a problem using regexp in excel macro, by calling regex.execute(string), instead of getting an array of returned capturing groups, I always get single return which is the whole string specified in the pattern.
By using the same pattern in http://www.regexr.com/, I can see the return nicely grouped. What am I missing from this:
Private Sub ParseFileName(strInput As String)
Dim regEx As New RegExp
Dim strPattern As String
Dim strReplace
'Sample string \\Work_DIR\FTP\Results\RevA\FTP_01_01_06_Results\4F\ACC2X2R33371_SASSSD_run1
strPattern = "FTP_(\w+)_Results\\(\w+)\\([\d,\D]+)_(SAS|SATA)(HDD|SSD)_run(\d)"
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set strReplace = regEx.Execute(strInput)
ActiveCell.Offset(0, 1) = strReplace.Count
Else
ActiveCell.Offset(0, 1) = "(Not matched)"
End If
End sub
In the end, strReplace.Count always shows 1, which is the whole string FTP_01_01_06_Results\4F\ACC2X8R133371_SASSSD_run1
Use .SubMatches to get capturing groups values:
Private Sub ParseFileName(strInput As String)
Dim regEx As New RegExp
Dim strPattern As String
Dim strReplace As MatchCollection
Dim i As Long
'Sample string \\Work_DIR\FTP\Results\RevA\FTP_01_01_06_Results\4F\ACC2X2R33371_SASSSD_run1
strPattern = "FTP_(\w+)_Results\\(\w+)\\([\d,\D]+)_(SAS|SATA)(HDD|SSD)_run(\d)"
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set strReplace = regEx.Execute(strInput)
ActiveCell.Offset(0, 1) = strReplace.Count
For i = 0 To 5
ActiveCell.Offset(i + 1, 1) = strReplace(0).SubMatches(i)
Next
Else
ActiveCell.Offset(0, 1) = "(Not matched)"
End If
End Sub

Excel RegEx Extraction

recently I've been trying to extract some strings from text in excel. I used script from other post here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Since Macro code is working fine I couldn't use in Cell function, it's showing #NAME? error. I've included "Microsoft VBScript Regular Expressions 5.5" but still no luck.
I can use it with macro but script needs some changes. I would like to have some strings in A1:A50, then to B1:B50 extract date in format DD Month YYYY (e.g. 28 July 2014) and to C1:C50 extract account no in format G1234567Y.
For now script is replacing date with "". Regular Expression for date is correct but how to insert date into B column? And then A/c no to C column working on 1:50 range?
This is the code:
Sub simpleRegex()
Dim strPattern As String: strPattern = "[0-9][0-9].*[0-9][0-9][0-9][0-9]"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Dim Out As Range
Set Myrange = ActiveSheet.Range("A1")
Set Out = ActiveSheet.Range("B1")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Out = regEx.Replace(strInput, strReplace)
Else
MsgBox ("Not matched")
End If
End If
End Sub
Thank You kindly for any assistance.
Currently your replacing the matching string with an empty string "" so that's why your getting no result. You need to return the actual match using () to indicate match set and $1 to retrieve it.
Based on your example, I'll assume your text in column A looks like this: 28 July 2014 G1234567Y
Here is a routine that will split apart the text into a date and then the text following the date.
Private Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A50")
For Each C In Myrange
strPattern = "([0-9]{1,2}.*[0-9]{4}) (.*)"
'strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = C.Value
strReplace = "$1"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput, "$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
Else
C.Offset(0, 1) = "(Not matched)"
End If
End If
Next
End Sub
Result:
To use an in-cell function, set it up to extract a single piece such as Date or everything else. The following code will extract the date. Cell B1 would have the following equation: =extractDate(A1)
Function extractDate(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strRaplace As String
Dim strOutput As String
strPattern = "(\d{1,2}.*\d{4}) (.*)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
extractDate = regEx.Replace(strInput, "$1")
Else
extractDate = "Not matched"
End If
End If
End Function
To make another function for extracting the rest of the date simply change $1 to $2 and it will return the second defined match in the pattern.