Replace text using RegEx in Excel-VBA - regex

I have data in Excel like follows (one row here - one cell in Excel):
07 July 2015 12:02 – 14 July 2015 17:02
12 August 2015 22:02 – 01 September 2015 11:02
I want to write a macro that will delete all time info (e.g. "12:02") within a user's selection (multiple cells) to look like this:
07 July 2015 – 14 July 2015
12 August 2015 – 01 September 2015
When all "times" where similar ("00:00") this macro worked perfectly:
Sub delete_time()
Selection.Replace What:="00:00", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
End Sub
But then time-info stopped being uniform, so I decided to use RegEx. The problem is I can't find a proper way to do this on VBA. I tried this macro:
Sub delete_time()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
On Error Resume Next
RegEx.Global = True
RegEx.Pattern = "\d\d\:\d\d"
ActiveDocument.Range = _
RegEx.Replace(ActiveDocument.Range, "")
End Sub
But it didn't work. Also tried "[0-9]{2}:[0-9]{2}" and "[0-9][0-9]:[0-9][0-9]" patterns but nothing changed. So the problem must be in my misunderstanding of VBA (I'm new to it).
Can anyone help?

The problem is with your selection.
ActiveDocument.Range = _
RegEx.Replace(ActiveDocument.Range, "")
ActiveDocument doesn't exist in the Excel namespace. We have ActiveWorkbook or ThisWorkbook, but what you need now is the Selection.
Use a for each loop to iterate all the cells in the current selection like this:
Dim myCell As Range
For Each myCell In Selection.Cells
myCell.Value = RegEx.Replace(myCell.Value, "")
Next

A faster approach would be to combine your RegExp with a variant array:
'Press Alt + F11 to open the Visual Basic Editor (VBE)
'From the Menu, choose Insert-Module.
'Paste the code into the right-hand code window.
'Press Alt + F11 to close the VBE
'In Xl2003 Goto Tools … Macro … Macros and double-click KillDate
Sub KillDate()
Dim rng1 As Range
Dim rngArea As Range
Dim lngRow As Long
Dim lngCol As Long
Dim lngCalc As Long
Dim objReg As Object
Dim X()
On Error Resume Next
Set rng1 = Application.InputBox("Select range for the replacement", "User select", Selection.Address, , , , , 8)
If rng1 Is Nothing Then Exit Sub
On Error GoTo 0
'See Patrick Matthews excellent article on using Regular Expressions with VBA
Set objReg = CreateObject("vbscript.regexp")
objReg.Pattern = "\d\d\:\d\d"
objReg.Global = True
'Speed up the code by turning off screenupdating and setting calculation to manual
'Disable any code events that may occur when writing to cells
With Application
lngCalc = .Calculation
.ScreenUpdating = False
.Calculation = xlCalculationManual
.EnableEvents = False
End With
'Test each area in the user selected range
'Non contiguous range areas are common when using SpecialCells to define specific cell types to work on
For Each rngArea In rng1.Areas
'The most common outcome is used for the True outcome to optimise code speed
If rngArea.Cells.Count > 1 Then
'If there is more than once cell then set the variant array to the dimensions of the range area
'Using Value2 provides a useful speed improvement over Value. On my testing it was 2% on blank cells, up to 10% on non-blanks
X = rngArea.Value2
For lngRow = 1 To rngArea.Rows.Count
For lngCol = 1 To rngArea.Columns.Count
'replace the leading zeroes
X(lngRow, lngCol) = objReg.Replace(X(lngRow, lngCol), vbNullString)
Next lngCol
Next lngRow
'Dump the updated array sans leading zeroes back over the initial range
rngArea.Value2 = X
Else
'caters for a single cell range area. No variant array required
rngArea.Value = objReg.Replace(rngArea.Value, vbNullString)
End If
Next rngArea
'cleanup the Application settings
With Application
.ScreenUpdating = True
.Calculation = lngCalc
.EnableEvents = True
End With
Set objReg = Nothing
End Sub

The easiest approach to me seems to be to use LEFT and RIGHT functions to extract the two separate timestamps, then to convert these timestamps to dates using TEXT function. Probably easiest in excel directly, but if you want to go down VBA route then example solution below:
' Taking a random date from Cell A1
DateRange = Range("A1")
' Extracting the first timestamp
FirstTimeStamp = Left(DateRange, Application.Find(" – ", DateRange))
' Converting to required date format
FirstDate = Application.Text(FirstTimeStamp, "dd-mmm-yyyy")
LastTimeStamp = Right(DateRange, Application.Find(" – ", DateRange))
LastDate = Application.Text(LastTimeStamp, "dd-mmm-yyyy")

Function ReplaceRegEx(str As String, pattern As String, newChar As String) As String 'recherche et remplace une expression reguliere par une chaine de char
Dim regEx As Object, found As Object, counter As Integer, F As Object
Set regEx = CreateObject("VBscript.RegExp")
regEx.Global = True
regEx.ignorecase = False
regEx.pattern = pattern
Set found = regEx.Execute(str)
counter = found.Count
If counter <> 0 Then
For Each F In found
str = Replace(str, F, newChar)
Next F
End If
ReplaceRegEx = str
End Function

Related

Using RegEx in vb.net

Here is what I need to do (for clarity)
Take a PDF file (link on the bottom)
Then parse only the information under each header into a DataFridView.
I couldn't think of a way to do this (seeing as there is no native way to handle PDFs)
So my only thought was to convert it to a txt document then (somehow) take the txt from the text document and put it into the datagridview.
So, using Itextsharp I first convert the PDF to a text file; Which keeps "most" of its formatting (see link below)
This is the source for that
Dim mPDF As String = "C:\Users\Innovators World Wid\Documents\test.pdf"
Dim mTXT As String = "C:\Users\Innovators World Wid\Documents\test.txt"
Dim mPDFreader As New iTextSharp.text.pdf.PdfReader(mPDF)
Dim mPageCount As Integer = mPDFreader.NumberOfPages()
Dim parser As PdfReaderContentParser = New PdfReaderContentParser(mPDFreader)
'Create the text file.
Dim fs As FileStream = File.Create(mTXT)
Dim strategy As iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
For i As Integer = 1 To mPageCount
strategy = parser.ProcessContent(i, New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy())
Dim info As Byte() = New UTF8Encoding(True).GetBytes(strategy.GetResultantText())
fs.Write(info, 0, info.Length)
Next
fs.Close()
however I only need the "lines" of information. So everything should look like this
63 FMPC0847535411 OD119523523152105000 Aug 28, 2020 02:18 PM EXPRESS
64 FMPP0532201112 OD119523544975573000 Aug 28, 2020 02:18 PM EXPRESS
65 FMPP0532243104 OD119523557412412000 Aug 28, 2020 02:18 PM EXPRESS
66 FMPC0847516962 OD119523576945605000 Aug 28, 2020 02:18 PM EXPRESS
67 FMPC0847520947 OD119523760191783000 Aug 28, 2020 02:19 PM EXPRESS
In order to do that now I needed to use RegEx to remove everything I didn't want
here is the RegEx I Used
The RegEx is
(\d{2}\s.{14}\s.{20}\s.{3}\s\d{1,2},\s\d{4}\s\d{2}:\d{2}\s.{2}\sEXPRESS,*\s*R*e*p*l*a*c*e*m*e*n*t*\s*o*r*d*e*r*)";
Here is the code I used.
Private Sub Fixtext()
Dim regex As Regex = New Regex("\d{2}\s.{14}\s.{20}\s.{3}\s\d{1,2},\s\d{4}\s\d{2}:\d{2}\s.{2}\sEXPRESS,*\s*R*e*p*l*a*c*e*m*e*n*t*\s*o*r*d*e*r*")
Using reader As StreamReader = New StreamReader("C:\Users\Innovators World Wid\Documents\test.txt")
While (True)
Dim line As String = reader.ReadLine()
If line = Nothing Then
Return
End If
Dim match As Match = regex.Match(line)
If match.Success Then
Dim value As String = match.Groups(1).Value
Console.WriteLine(line)
End If
End While
End Using
End Sub
The results are "close" but not exactly the way I need it. In some cases they are "crammed" together and there are still parts left behind. An example would be
90 FMPC0847531898 OD119522758218348000 Aug 28, 2020 03:20 PM EXPRESS
491 FMPP0532220915 OD119522825195489000 Aug 28, 2020 03:21 PM EXPRESS
Tracking Id Forms Required Order Id RTS done on Notes492 FMPP0532194482 OD119522868525176000 Aug 28, 2020 03:21 PM EXPRESS
493 FMPP0532195684 OD119522871090000000 Aug 28, 2020 03:21 PM EXPRESS494 FMPP0532224318 OD119522895172342000 Aug 28, 2020 03:21 PM EXPRESS
the format I actually need is (again) a format I can use to import the data later into a datagridview
so for each line it needs to be
[number][ID][ID2][Date][Notes]
[number][ID][ID2][Date][Notes]
[number][ID][ID2][Date][Notes]
[number][ID][ID2][Date][Notes]
using this "Concept" This is an example of what I need (though i know this doesn't work, but something along these lines that will work)
Dim regex As Regex = New Regex("\d{2}\s.{14}\s.{20}\s.{3}\s\d{1,2},\s\d{4}\s\d{2}:\d{2}\s.{2}\sEXPRESS,*\s*R*e*p*l*a*c*e*m*e*n*t*\s*o*r*d*e*r*")
Using reader As StreamReader = New StreamReader("C:\Users\Innovators World Wid\Documents\test.txt")
While (True)
Dim line As String = reader.ReadLine()
If line = Nothing Then
Return
End If
Dim match As Match = regex.Match(line)
If match.Success Then
Dim value As String = match.Groups(1).Value
Dim s As String = value
s = s.Replace(" Tracking Id Forms Required Order Id RTS done on Notes", Nothing)
s = s.Replace("EXPRESS ", "EXPRESS")
s = s.Replace("EXPRESS", "EXPRESS" & vbCrLf)
Console.WriteLine(line)
End If
End While
End Using
Here is a "brief" explanation with files included.
Copy of the original PDF (This is the PDF being converted to .txt using itext)
I am only doing this because I can't think of a way (outside of paying for a 3rd party tool to convert a pdf to XLS)
https://drive.google.com/file/d/1iHMM_G4UBUlKaa44-Wb00F_9ZdG-vYpM/view?usp=sharing
using the above "itext method" I mentioned this is the outputted converted file
https://drive.google.com/file/d/10dgJDFW5XlhsB0_0QAWQvtimsDoMllx-/view?usp=sharing
I then use the above Regex (mentioned above) to parse out what I don't need.
however it isn't working.
So my Questions are (for "clarity")
Is this the only or best method to do what I need done? (Convert PDF to text, Remove what I don't need then input that information into a DataGridView; Or is there another , Cleaner , Better method?
(if not 1) How can I make this work? Is something wrong with my RegEx or My Logic? Am I missing something better/cleaner that someone can help me see.
(if 2 ^ Not 1) What is the best way to take the results and place them in the proper DataGridView Column.
Final Statement: It doesn't have to be this method. I will take "ANY" method that will allow me to do what I need to be done, the cleaner the better however I have to do this avoiding 3rd party libraries that are free with limitations; Paid 3rd party libraries. That leaves me with limitations. IE: PDFBox, itext,itextsharp) And this has to be able to lead me from a PDF (like the above sample) to that table information in a Datagridview or even a listview.
I will take any help and I am more then appreciative. Also I did re-Ask this question because a mod closed my original question "Stating it wasn't clear what I needed" I did try in both cases to make the question as "thorough" as possible but I do hope this is "Clearer" so it doesn't get closed abruptly.
I cheated a bit by correcting the text file. It goes a little wonky at page breaks and misses starting a new line. Perhaps you can correct that with Itextsharp or the hard to maintain regex.
I made a class to hold the data. The property names become the column headers in the DataGridView.
I read all the lines in the text file into an array. I checked the first character of the line to see if it was a digit then split the line into another array based on the space. Next I created a new Tracking object, fleshing it out with all its properties with the parameterized constructor.
Finally, I checked it the line contained a comma and added that bit of text to the notes parameter. The completed object is added to the list.
After the loop the lst is bound to the grid.
Public Class Tracking
Public Property Number As Integer
Public Property ID As String
Public Property ID2 As String
Public Property TrackDate As Date
Public Property Notes As String
Public Sub New(TNumber As Integer, TID As String, TID2 As String, TDate As DateTime, TNotes As String)
Number = TNumber
ID = TID
ID2 = TID2
TrackDate = TDate
Notes = TNotes
End Sub
End Class
Private Sub OPCode()
Dim lst As New List(Of Tracking)
Dim lines = File.ReadAllLines("C:\Users\maryo\Desktop\test.txt")
For Each line In lines
If Char.IsDigit(line(0)) Then
Dim parts = line.Split(" "c)
Dim T As New Tracking(CInt(parts(0)), parts(1), parts(2), Date.ParseExact($"{parts(3)} {parts(4)} {parts(5)} {parts(6)} {parts(7)}", "MMM d, yyyy hh:mm tt", CultureInfo.CurrentCulture), parts(8))
If line.Contains(",") Then
T.Notes &= line.Substring(line.IndexOf(","))
End If
lst.Add(T)
End If
Next
DataGridView1.DataSource = lst
End Sub
EDIT
To pinpoint the error let's try...
Private Sub OPCode()
Dim lst As New List(Of Tracking)
Dim lines = File.ReadAllLines("C:\Users\maryo\Desktop\test.txt")
For Each line In lines
If Char.IsDigit(line(0)) Then
Dim parts = line.Split(" "c)
If parts.Length < 9 Then
Debug.Print(line)
MessageBox.Show($"We have a line that does not include all fields.")
Exit Sub
End If
Dim T As New Tracking(CInt(parts(0)), parts(1), parts(2), Date.ParseExact($"{parts(3)} {parts(4)} {parts(5)} {parts(6)} {parts(7)}", "MMM d, yyyy hh:mm tt", CultureInfo.CurrentCulture), parts(8))
If line.Contains(",") Then
T.Notes &= line.Substring(line.IndexOf(","))
End If
lst.Add(T)
End If
Next
DataGridView1.DataSource = lst
End Sub
Try this regex and see if this works according to your requirement:
\b[0-9].*(FMPC|OD).*(EXPRESS|Replacement\sOrder)\b

How to convert rtf files in folder to PDF

Would anyone happen to know how to convert multiple .rtf (Rich Text File) placed in folder to PDF within R or SAS?
I cannoot install any application onto my pc so the solution should be only within the two programs I stated above.
Here is amended code generously provided by #Reeza:
bRecursive = False
sFolder = "C:\PATH"
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oWord = CreateObject("Word.Application")
oWord.Visible = True
Set oFolder = oFSO.GetFolder(sFolder)
ConvertFolder(oFolder)
oWord.Quit
Sub ConvertFolder(oFldr)
For Each oFile In oFldr.Files
If LCase(oFSO.GetExtensionName(oFile.Name)) = "rtf" Then
Set oDoc = oWord.Documents.Open(oFile.path)
Str = left(oFile,instr(1,oFile,".")-1)
oWord.ActiveDocument.SaveAs Str, 17
oDoc.Close
End If
Next
If bRecursive Then
For Each oSubfolder In oFldr.Subfolders
ConvertFolder oSubfolder
Next
End If
End Sub
The above .vbs code works but I have 50 files in folder and after about 10 converted rtf do pdf docs it just keep opening and closing remaining files in circle (it seems). Any clue? Thanks.
I tested this and it works fine on my system. Windows 7 Enterprise.
The VBS script is saved as a .vbs and then SAS can call it using an X command or %SYSEXEC.
The VBS script is:
bRecursive = False
sFolder = "C:\_LOCALDATA\temp\_rtf_test\"
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oWord = CreateObject("Word.Application")
oWord.Visible = True
Set oFolder = oFSO.GetFolder(sFolder)
ConvertFolder(oFolder)
oWord.Quit
Sub ConvertFolder(oFldr)
For Each oFile In oFldr.Files
If LCase(oFSO.GetExtensionName(oFile.Name)) = "rtf" Then
Set oDoc = oWord.Documents.Open(oFile.path)
Str = left(oFile,instr(1,oFile,".")-1)
oWord.ActiveDocument.SaveAs Str & ".pdf", 17
oDoc.Close
End If
Next
If bRecursive Then
For Each oSubfolder In oFldr.Subfolders
ConvertFolder oSubfolder
Next
End If
End Sub
Then in SAS:
%sysexec "C:\_LOCALdata\SAMPLE.VBS";

Excel data import in specific cell

I would like to list up devices and put their prices next to them.
My goal is to check different sites every week and notice trends.
This is a hobby project, I know there are sites that already do this.
For instance:
Device | URL Site 1 | Site 1 | URL Site 2 | Site 2
Device a | http://... | €40,00 | http://... | €45,00
Device b | http://... | €28,00 | http://... | €30,50
Manually, this is a lot of work (checking every week), so I thought a Macro in Excel would help. The thing is, I would like to put the data in a single cell and excel only recognises tables. Solution: view source code, read price, export price to specific cell.
I think this is all possible within Excel, but I can't quiet figure out how to read the price or other given data and how to put it in one specific cell. Can I specify coordinates in the source code, or is there a more effective way of thinking?
First of all you have to find out how does the website works. For the page you asked I have done the following:
Opened http://www.mediamarkt.de page in Chrome.
Typed BOSCH WTW 85230 in the search box, suggestion list appeared.
Pressed F12 to open developer tools and clicked Network tab.
Each time I was typing, the new request appeared (see yellow areas):
Clicked the request to examine general info:
You can see that it uses GET method and some parameters including url-encoded product name.
Clicked the Response tab to examine the data returning from the server:
You can see it is a regular JSON, full content is as follows:
{"suggestions":[{"attributes":{"energyefficiencyclass":"A++","modelnumber":"2004975","availabilityindicator":"10","customerrating":"0.00000","ImageUrl":"http://pics.redblue.de/artikelid/DE/2004975/CHECK","collection":"shop","id":"MediaDEdece2358813","currentprice":"444.00","availabilitytext":"Lieferung in 11-12 Werktagen"},"hitCount":0,"image":"http://pics.redblue.de/artikelid/DE/2004975/CHECK","name":"BOSCH WTW 85230 Kondensationstrockner mit Warmepumpentechnologie (8 kg, A++)","priority":9775,"searchParams":"/Search.ff?query=BOSCH+WTW+85230+Kondensationstrockner+mit+W%C3%A4rmepumpentechnologie+%288+kg%2C+A+%2B+%2B+%29\u0026channel=mmdede","type":"productName"}]}
Here you can find "currentprice":"444.00" property with the price.
Simplified the request by throwing out some optional parameters, it turned out that the same JSON response can be received by the URL http://www.mediamarkt.de/FACT-Finder/Suggest.ff?channel=mmdede&query=BOSCH+WTW+85230
That data was enough to built some code, assuming that first column intended for products:
Option Explicit
Sub TestMediaMarkt()
Dim oRange As Range
Dim aResult() As String
Dim i As Long
Dim sURL As String
Dim sRespText As String
' set source range with product names from column A
Set oRange = ThisWorkbook.Worksheets(1).Range("A1:A3")
' create one column array the same size
ReDim aResult(1 To oRange.Rows.Count, 1 To 1)
' loop rows one by one, make XHR for each product
For i = 1 To oRange.Rows.Count
' build up URL
sURL = "http://www.mediamarkt.de/FACT-Finder/Suggest.ff?channel=mmdede&query=" & EncodeUriComponent(oRange.Cells(i, 1).Value)
' retrieve HTML content
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sURL, False
.Send
sRespText = .responseText
End With
' regular expression for price property
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = """currentprice""\:""([\d.]+)""" ' capture digits after 'currentprice' in submatch
With .Execute(sRespText)
If .Count = 0 Then ' no matches, something going wrong
aResult(i, 1) = "N/A"
Else ' store the price to the array from the submatch
aResult(i, 1) = .Item(0).Submatches(0)
End If
End With
End With
Next
' output resultion array to column B
Output Sheets(1).Range("B1"), aResult
End Sub
Function EncodeUriComponent(strText)
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
End If
EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function
Sub Output(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1 _
)
.NumberFormat = "#"
.Value = aCells
.Columns.AutoFit
End With
End With
End Sub
Filled worksheet with some product names:
Launched the sub and got the result:
It is just the example how to retrieve a data from the website via XHR and parse a response with RegExp, I hope it helps.

VBA RegEx scraping bounced emails in MS Outlook

I'm trying to expand the functionality of some Outlook email scrapping VBA code. I get bounce-back emails on a regular basis and would like to keep track of these (for deletion) by exporting said email address to MS Excel.
The code works, to a point. I can only scrape the first email address within a typical bounce-back notification email using RegEx. The mail servers for the company that I work for aggregate emails from the same domain into one notification email. Consequently, I get multiple notification emails that contain multiple bounced emails.
How do I get RegEx to cycle through the entire notification email to gather all email addresses??? I'm a little stuck right now because — admittedly — I don't know much about RegEx and "adopted" the majority of this code...
Thank you for your help Stackoverflow!!!
Sub Extract_Invalid_To_Excel()
Dim olApp As Outlook.Application
Dim olExp As Outlook.Explorer
Dim olFolder As Outlook.MAPIFolder
Dim obj As Object
Dim stremBody As String
Dim stremSubject As String
Dim i As Long
Dim x As Long
Dim count As Long
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
Dim xlApp As Object 'Excel.Application
Dim xlwkbk As Object 'Excel.Workbook
Dim xlwksht As Object 'Excel.Worksheet
Dim xlRng As Object 'Excel.Range
Set olApp = Outlook.Application
Set olExp = olApp.ActiveExplorer
Set olFolder = olExp.CurrentFolder
'Open Excel
Set xlApp = GetExcelApp
xlApp.Visible = True
If xlApp Is Nothing Then GoTo ExitProc
Set xlwkbk = xlApp.Workbooks.Add
Set xlwksht = xlwkbk.Sheets(1)
Set xlRng = xlwksht.Range("A1")
xlRng.Value = "Bounced email addresses"
'Set count of email objects
count = olFolder.Items.count
'counter for excel sheet
i = 0
'counter for emails
x = 1
For Each obj In olFolder.Items '**Loops through selected Outlook folder**
xlApp.StatusBar = x & " of " & count & " emails completed"
stremBody = obj.Body
stremSubject = obj.Subject
If checkEmail(stremBody) = True Then '**Checks email for keywords in email
'MsgBox ("finding email: " & stremBody)
'**RegEx to find email addresses within message body
With RegEx
.Pattern = "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b"
.IgnoreCase = True
.MultiLine = True
.Global = False
Set olMatches = .Execute(stremBody) 'Executes RegEx function
'Loop through RegEx matches
For Each match In olMatches
xlwksht.Cells(i + 2, 1).Value = match
i = i + 1
Next match
End With
'TODO: move or mark the email that had the address extracted
Else
'**To view the items that aren't being parsed uncomment the following line
'MsgBox (stremBody)
End If
x = x + 1
Next obj
xlApp.ScreenUpdating = True
MsgBox ("Invalid Email addresses are done being extracted")
ExitProc:
Set xlRng = Nothing
Set xlwksht = Nothing
Set xlwkbk = Nothing
Set xlApp = Nothing
Set emItm = Nothing
Set olFolder = Nothing
Set olNS = Nothing
Set olApp = Nothing
End Sub
Function GetExcelApp() As Object
' always create new instance
On Error Resume Next
Set GetExcelApp = CreateObject("Excel.Application")
On Error GoTo 0
End Function
Function checkEmail(ByVal Body As String) As Boolean
Dim keywords(3) As String
keywords(0) = "recipient's e-mail address was not found"
keywords(1) = "error occurred while trying to deliver this message"
keywords(2) = "message wasn't delivered"
'Default value
checkEmail = False
For Each word In keywords
If InStr(1, Body, word, vbTextCompare) > 1 Then
checkEmail = True
Exit For
End If
Next word
End Function
To provide more detail. I would receive hundreds of emails which contain the following text:
Delivery has failed to these recipients or distribution lists:
John.Doe#abc.com
The recipient's e-mail address was not found in the recipient's e-mail system. Microsoft Exchange will not try to redeliver this message for you. Please check the e-mail address and try resending this message, or provide the following diagnostic text to your system administrator.
Morgan.Freedman#abc.com
The recipient's e-mail address was not found in the recipient's e-mail system. Microsoft Exchange will not try to redeliver this message for you. Please check the e-mail address and try resending this message, or provide the following diagnostic text to your system administrator.
Michael.Jordan#abc.com
The recipient's e-mail address was not found in the recipient's e-mail system. Microsoft Exchange will not try to redeliver this message for you. Please check the e-mail address and try resending this message, or provide the following diagnostic text to your system administrator.
The code above is able to grab the first email address in the email body text (i.e. John.Doe#abc.com), but doesn't see the other two email addresses...
The rest of the code works flawlessly. It exports the email addresses that it does find into Excel.
I'm receiving up to 200 bounced email notices which each large email distribution. With Constant Contact it’s easy, because the tool processes all of the bounced addresses and codes into a nice file. With Outlook, I’m on my own, but I prefer it for other reasons.
So I came up with a procedure and VBA macro to accomplish the task. First I place all of the emails I wish to process into one folder and have it selected. Using Outlook 2010, I go to FILE -> OPTIONS -> ADVANCED -> EXPORT. From there I choose EXPORT TO A FILE (Next) and then the last option, TAB SEPARATED VALUES (Windows). You then select the name and folder location to store the one single TXT file that has combined all of the emails in the folder.
Open the file into Msft Word and run the following VBA macro:
Sub Bounced_Email_Harvester()
'
' Bounced-Email Text-Process Macro
'
Dim flag As Boolean
' DocLen is to maintain Document length in characters
Dim DocLen As Long
' Try to speed up Word by suspending unnecessary tasks
ActiveDocument.ActiveWindow.View.Draft = True
Options.Pagination = False
Options.CheckGrammarAsYouType = False
Options.CheckSpellingAsYouType = False
Application.ScreenUpdating = False
' Remove extraneous bracket characters < & >
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ">>>"
.Replacement.Text = "###"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "<<<"
.Replacement.Text = "VVV"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.WholeStory
DocLen = Len(Selection)
Application.DisplayStatusBar = True
Selection.HomeKey Unit:=wdStory
' CORE OF MACRO IS WITHIN THIS LOOP
Do While DocLen > 800
' Selects text until next # sign is reached - locating email addresses
flag = True
While flag = True
Selection.MoveRight Unit:=wdCharacter, Count:=1, Extend:=wdExtend
DocLen = DocLen - 1
If Strings.Right(Selection.Range.Text, 1) = "#" Or DocLen < 2 Then flag = False
Wend
flag = True
Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
DocLen = DocLen + 1
While flag = True
Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
DocLen = DocLen + 1
'Locate the Beginning of email seeking demarkations (brackets, space, tab, paragraph)
If Strings.Right(Selection.Range.Text, 1) = "<" Or Strings.Right(Selection.Range.Text, 1) = "[" Or Strings.Right(Selection.Range.Text, 1) = "(" Or Strings.Right(Selection.Range.Text, 1) = " " _
Or Strings.Right(Selection.Range.Text, 1) = Chr$(9) Or Strings.Right(Selection.Range.Text, 1) = Chr$(13) Or DocLen < 2 Then flag = False
Wend
Selection.TypeParagraph
flag = True
While flag = True
Selection.MoveRight Unit:=wdCharacter, Count:=1, Extend:=wdExtend
DocLen = DocLen - 1
'Locate the End of email seeking demarkations (brackets, space, tab, paragraph)
If Strings.Right(Selection.Range.Text, 1) = ">" Or Strings.Right(Selection.Range.Text, 1) = "]" Or Strings.Right(Selection.Range.Text, 1) = ")" Or Strings.Right(Selection.Range.Text, 1) = " " _
Or Strings.Right(Selection.Range.Text, 1) = Chr$(9) Or Strings.Right(Selection.Range.Text, 1) = Chr$(13) Or DocLen < 2 Then flag = False
Wend
Selection.Collapse Direction:=wdCollapseEnd
Selection.Previous(Unit:=wdCharacter, Count:=1).Select
DocLen = DocLen + 1
Selection.TypeParagraph
Loop
' END OF CORE MACRO LOOP
Selection.Collapse Direction:=wdCollapseEnd
Selection.Previous(Unit:=wdCharacter, Count:=1).Select
Selection.TypeParagraph
' Major work done - now some pesky house cleaning....
Selection.Find.ClearFormatting
With Selection.Find
.Text = "mailto:"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = ":550^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "<^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = ";^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "...^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = ".^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
MsgBox ("Count: " & DocLen)
End Sub
Word will churn away for perhaps 10 or 15 minutes and may seem to be locked up. I click away to another app and sometimes watch Task Manager to confirm its still working based upon the performance monitor. In the morning I'll go make coffee after starting it. It will eventually finish with a msg box providing a meaningless number. Click it to go away. It’s now done and you will have a long column of paragraph-separated email addresses. It seems to reliably extract 100% of the emails and more; for example some mail server respond with derivations of the email domain such as #us.att.com and #att.com or #jpmogan.com and #jpchase.com for the same user.
Copy the whole thing and drop into an Excel column. From here sort the list and delete the obvious chaff, the whole top 20% that are email addresses starting with numbers, the email addresses that start with envelope, header, mailer, postmaster, SMTP, X-Sender and the large repetition of your sending email address. Then run a pivot table upon it to eliminate all of the duplicates. You now have your email list to import into your dB to mark as bounced email addresses. The whole post-processing after the Word macro finishes only takes me 10 to 15 minutes. I probably spend more time than needed, because invalid email addresses will simply be ignored by my dB linkage.
The macro does not extract bounce codes, so you don’t know it it’s soft bounce (mailbox full) or a hard bounce (Recipient not Found). You could try to discern these before you drop them in the folder or you could adopt a policy to require two bounces over time before a permanent delete. Your call.
I should note that I’m NOT a VBA programmer. I learned the Basic language 40 years ago on a Commodore Computer and sometimes dabble with a few function for Msft Access. Most of my experience with VBA for Word is limited to recording macros and then using the auto-generated code to automate a few repetitive tasks. Someone that knows what they are doing can likely vastly improve on my code, but it works for me has has been a big time saver.
While being still new to the RegEx function, I blindly altered the code slightly.
I changed the RegEx.Global boolean to True and this code will work flawlessly.
With RegEx
yadda yadda yadda
.Global = True
End With
Well - thanks in any case. Hope this helps other people!!!
After much hunting I was able to come up with the following function. Some of the body text still contains invalid characters (no idea why) but overall it is about 90% correct. This function parses a passed Outlook Items collection and adds all unique email addresses (found in the body of a ReportItem) to a String List, which is written to the Immediate window at the end.
Private Sub ListEmailAddresses(outlookItems As Outlook.Items)
Dim folder As Outlook.MAPIFolder = Nothing
Try
Dim emailAddresses As New List(Of String)
If TypeOf outlookItems.Parent Is Outlook.MAPIFolder Then
folder = CType(outlookItems.Parent, Outlook.MAPIFolder)
End If
For i = 1 To outlookItems.Count
Dim objItem As Object = outlookItems(i)
Try
If TypeOf objItem Is Outlook.ReportItem Then
Dim rpt As Outlook.ReportItem = TryCast(objItem, Outlook.ReportItem)
Dim temp() As Byte = System.Text.Encoding.Unicode.GetBytes(rpt.Body.ToArray())
Dim sb As New System.Text.StringBuilder
For z As Integer = 0 To temp.Length - 1
sb.Append(Chr(temp(z)))
Next
Dim rptBody As String = sb.ToString
Dim mc As MatchCollection = Regex.Matches(rptBody, _
"([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")
Dim results(mc.Count - 1) As String
For x As Integer = 0 To results.Length - 1
Dim emailAddr As String = ValueIfNull(mc(x).Value, "").ToLower
If Not String.IsNullOrWhiteSpace(emailAddr) Then
If Not emailAddresses.Contains(emailAddr) Then
emailAddresses.Add(emailAddr)
End If
End If
Next
End If
Catch ex As Exception
' Do Something if you care.
Finally
Marshal.ReleaseComObject(objItem)
End Try
Next
emailAddresses.Sort()
Debug.WriteLine(emailAddresses.ToSeparatedString(Environment.NewLine))
Catch ex As Exception
' Do Something if you care.
Finally
If folder IsNot Nothing Then Marshal.ReleaseComObject(folder)
End Try
End Sub

How to consume a webservice using excel? [duplicate]

In a VBA module in excel 2007, is it possible to call a web service? If so, any code snippets? How would I add the web reference?
Yes You Can!
I worked on a project that did that (see comment). Unfortunately no code samples from that one, but googling revealed these:
How you can integrate data from several Web services using Excel and VBA
STEP BY STEP: Consuming Web Services through VBA (Excel or Word)
VBA: Consume Soap Web Services
Here's an overview from MS:
Consuming Web Services in Excel 2007
For an updated answer see this SO question:
calling web service using VBA code in excel 2010
Both threads should be merged though.
In Microsoft Excel Office 2007 try installing "Web Service Reference Tool" plugin. And use the WSDL and add the web-services. And use following code in module to fetch the necessary data from the web-service.
Sub Demo()
Dim XDoc As MSXML2.DOMDocument
Dim xEmpDetails As MSXML2.IXMLDOMNode
Dim xParent As MSXML2.IXMLDOMNode
Dim xChild As MSXML2.IXMLDOMNode
Dim query As String
Dim Col, Row As Integer
Dim objWS As New clsws_GlobalWeather
Set XDoc = New MSXML2.DOMDocument
XDoc.async = False
XDoc.validateOnParse = False
query = objWS.wsm_GetCitiesByCountry("india")
If Not XDoc.LoadXML(query) Then 'strXML is the string with XML'
Err.Raise XDoc.parseError.ErrorCode, , XDoc.parseError.reason
End If
XDoc.LoadXML (query)
Set xEmpDetails = XDoc.DocumentElement
Set xParent = xEmpDetails.FirstChild
Worksheets("Sheet3").Cells(1, 1).Value = "Country"
Worksheets("Sheet3").Cells(1, 1).Interior.Color = RGB(65, 105, 225)
Worksheets("Sheet3").Cells(1, 2).Value = "City"
Worksheets("Sheet3").Cells(1, 2).Interior.Color = RGB(65, 105, 225)
Row = 2
Col = 1
For Each xParent In xEmpDetails.ChildNodes
For Each xChild In xParent.ChildNodes
Worksheets("Sheet3").Cells(Row, Col).Value = xChild.Text
Col = Col + 1
Next xChild
Row = Row + 1
Col = 1
Next xParent
End Sub
Excel 2013 Read Data from a web service and bash the JSON till you can get what you want out of it (given the JSON will always be in the same format).
This code should just work without the need for any plugins.
You will need your own free API key from the currency converter website though.
I used it to load the USD to GBP value into a cell on my sheet.
Option Explicit
Sub Test_LateBinding()
Dim objRequest As Object
Dim strUrl As String
Dim blnAsync As Boolean
Dim strResponse As String
Set objRequest = CreateObject("MSXML2.XMLHTTP")
strUrl = "https://free.currconv.com/api/v7/convert?q=USD_GBP&compact=ultra&apiKey=[MY_API_KEY]"
blnAsync = True
With objRequest
.Open "GET", strUrl, blnAsync
.SetRequestHeader "Content-Type", "application/json"
.Send
While objRequest.readyState <> 4
DoEvents
Wend
strResponse = .responsetext
End With
Dim responseArray() As String
responseArray = Split(strResponse, ":", -1)
Dim value As String
value = responseArray(1)
Dim valueArray() As String
valueArray = Split(value, "}", -1)
Dim finalValue As String
finalValue = valueArray(0)
Sheet2.Cells(22, "C") = finalValue
End Sub