Excel VBA Regex for fetching the values which has specific text - regex

I am new to the vba and trying to solve my situation wherein we recieve multiple mail like below:
we would like to create a database in excel for all the mails which are in my specific folder
Package Summary:
Client: XYZ
Price (USD): 3,000
Time: 1 Week
Project Id: 21312
and some more text......
here we would like to capture the information for Client, Price (USD), Time, Project Id.
Have tried below code which capture the information and stores in excel file.
Sub GetFromOutlook()
Dim OutlookApp As Outlook.Application
Dim OutlookNamespace As Namespace
Dim Folder As MAPIFolder
Dim OutlookMail As Variant
'Dim i As Integer
Set OutlookApp = New Outlook.Application
Set OutlookNamespace = OutlookApp.GetNamespace("MAPI")
Set Folder = OutlookNamespace.GetDefaultFolder(olFolderInbox).Folders("Dummy").Folders("New Dummy")
'i = 1
For Each OutlookMail In Folder.Items
Dim sText As String
sText = OutlookMail.Body
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim vText, vText2, vText3, vText4 As Variant
Dim i As Integer
Set Reg1 = New RegExp
' \s* = invisible spaces
' \d* = match digits
' \w* = match alphanumeric
For i = 1 To 9
With Reg1
Select Case i
Case 1
.Pattern = "(Client[:]([\w-\s]*)\s*)\n"
.Global = False
Case 2
.Pattern = "(([\d]*\,[\d]*))\s*\n"
.Global = False
Case 3
.Pattern = "(Time[:]([\w-\s]*)\s*)\n"
.Global = False
Case 4
.Pattern = "(Project Id[:]([\w-\s]*)\s*)\n"
.Global = False
End Select
End With
If Reg1.Test(sText) Then
Set M1 = Reg1.Execute(sText)
Select Case i
Case 1
For Each M In M1
vText = Trim(M.SubMatches(1))
Next
Case 2
For Each M In M1
vText2 = Trim(M.SubMatches(1))
Next
Case 3
For Each M In M1
vText3 = Trim(M.SubMatches(1))
Next
Case 4
For Each M In M1
vText4 = Trim(M.SubMatches(1))
Next
End Select
End If
Next i
Range("a1000").End(xlUp).Offset(1, 0).Value = vText
Range("b1000").End(xlUp).Offset(1, 0).Value = vText2
Range("c1000").End(xlUp).Offset(1, 0).Value = vText3
Range("d1000").End(xlUp).Offset(1, 0).Value = vText4
Next OutlookMail
Set Folder = Nothing
Set OutlookNamespace = Nothing
Set OutlookApp = Nothing
End Sub
Challenges:
Challenge 1: if the heading Price (USD) changes to Price (GBP) still its storing the value, which should not be. it should only store the value only if the matching text found.
i tried "(Price (USD) [:] ([\d]\,[\d]))\s*\n" however its not working.
Challenge 2: for Project id, value is coming with underscore as well which i am unable to exclude.
Would really appreciate if one can help me solving the above 2 challenge from my code.
or else suggest any better approach for the same.

You may use
Client:\s*(.*)[\r\n][\s\S]*?^Price \(USD\):\s*(.*)[\r\n][\s\S]*?^Time:\s*(.*)[\r\n][\s\S]*?^Project Id:\s*(\w+)
Make sure you set Reg1.Multiline = True.
See the regex demo
The Client details will be in M.SubMatches(0) (Group 1), price info will be in M.SubMatches(1) (Group 2), time details in M.SubMatches(2) (Group 3), and the project ID will be in M.SubMatches(3) (Group 4).
If you need to remove underscores from Group 4, the project ID, just use a post-processing step:
vText4 = Replace(M.SubMatches(3), "_", "")

Related

How to extract phone number and split to column in Google sheet?

I am trying to extract phone numbers in this format (123) 456-7890 and put each number in a separate column with format 1234567890 with no space, dash or parenthesis.
I am able to achieve this in Excel using VBA code below from another StackOverflow question, but not able to get it working on Google sheet
Sub ewqre()
Dim str As String, n As Long, rw As Long
Dim rgx As Object, cmat As Object, ws As Worksheet
Set rgx = CreateObject("VBScript.RegExp")
Set ws = Worksheets("Sheet4")
With rgx
.Global = True
.MultiLine = True
'phone number pattern is: ###-###-####
.Pattern = "/^(\d{3})(\d{3})(\d{4})$/"
For rw = 2 To ws.Cells(Rows.Count, "A").End(xlUp).Row
str = ws.Cells(rw, "A").Value2
If .Test(str) Then
Set cmat = .Execute(str)
'populate the worksheet with the matches
For n = 0 To cmat.Count - 1
ws.Cells(rw, Columns.Count).End(xlToLeft).Offset(0, 1) = cmat.Item(n)
Next n
End If
Next rw
End With
Set rgx = Nothing: Set ws = Nothing
End Sub
=ARRAYFORMULA(IFERROR(REGEXEXTRACT(TO_TEXT(SPLIT(
REGEXREPLACE(A2:A, "\(|\)|\-| ", ""), CHAR(10))), "\d+")))
or:
=ARRAYFORMULA(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", ""))
true array formula would be:
=ARRAYFORMULA(IFERROR(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2:A, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", "")))

Match all dates without an asterisk in front

I'm trying to use negative lookback to match all dates without an asterisk in front but it doesn't seem to be working.
(?<!\\*)(\b(?:0[1-9]|[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)
This is the string I'm trying to match:
02/02/2019 *03/20/2019 AB CART 9000341 FAXED TO INSTITUTION
Here's the full code for what I have. It extracts the most recent date preceding the word faxed. The problem is if there is a date with an asterisk in front of it (such as *03/20/2019) it chooses that instead of the date (02/02/2019)
This is the Function:
Option Explicit
Function lastFaxedDt(s As String) As Date
Dim re As RegExp, MC As MatchCollection
Const sPat As String = "(\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)(?=.*?faxed)"
Set re = New RegExp
With re
.Pattern = sPat
.IgnoreCase = True
.Global = True
If .Test(s) = True Then
Set MC = .Execute(s)
lastFaxedDt = CDate(MC(MC.Count - 1))
End If
End With
End Function
This is the Macro:
Sub ExtractDate()
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next
my_url = objShell.Windows(x).document.Location
my_title = objShell.Windows(x).document.Title
If my_title Like "RFT" & "*" Then
Set IE = objShell.Windows(x)
marker = 1
Exit For
Else
End If
Next
Dim Text As String
Text = Trim$(IE.document.getElementById("ctl00_ContentPlaceHolder1_txtNotes").innerText)
ExtractedDate = lastFaxedDt(Text)
If ExtractedDate = "12:00:00 AM" Then
ExtractedDate = "0"
Else
End If
ExtractedDate = CLng(ExtractedDate)
MaxDate = Application.WorksheetFunction.Max(ExtractedDate)
If MaxDate = "0" Then
MsgBox "No Date Found"
Else
End If
MaxDate = CDate(MaxDate)
Dim ws5 As Worksheet: Set ws5 = ActiveWorkbook.ActiveSheet
ws5.Range("C" & (ActiveCell.Row)).Value = MaxDate
Range("C" & (ActiveCell.Row)).NumberFormat = "[$-409]d-mmm;#"
End Sub
As mentioned in the comments, VBA does not support Lookbehinds. To work around this, you can replace your Lookbehind with the following:
(?:^|[^*])
And then find the date in the capturing group (sub-match) instead of the full match. In this case, your function should look something like this:
Function lastFaxedDt(s As String) As Date
Const sPat As String = _
"(?:^|[^*])" & _
"(\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)" & _
"(?=.*?faxed)"
Dim re As New RegExp, matches As MatchCollection
With re
.Pattern = sPat
.IgnoreCase = True
.Global = True
Set matches = .Execute(s)
If matches.Count > 0 Then
Dim lastMatch As Match: Set lastMatch = matches(matches.Count - 1)
lastFaxedDt = CDate(lastMatch.SubMatches.Item(0))
Else
' TODO: handle the case where no matches are found
End If
End With
End Function
Usage:
Dim s As String
s = "02/02/2019 *03/20/2019 AB CART 9000341 FAXED TO INSTITUTION"
MsgBox lastFaxedDt(s) ' 02/02/2019

How do I filter for a specific word (map) then capture the next text up until the next space?

I am trying to get the text right after - Map in this case example it is "AVE_NMHG_I_214_4010_XML_SAT" and input that into each Map Name row within the column up until the next space character found in could end up being "AVE_I_214_4010" as another example.
this is where I'm trying to make this fit.
Dim tmp As String
Dim RegX2 As Object, Mats1 As Object
tmp = strColS
Set RegX2 = CreateObject("VBScript.RegExp")
With RegX2
.Global = True
.Pattern = "Map\d{7}"
Set Mats1 = .Execute(tmp)
End With
If (RegX2.Test(tmp)) Then
Set Mats1 = RegX2.Execute(tmp)
tmp = Mats1(0)
Else
With RegX2
.Pattern = "MAP\d{9}"
Set Mats1 = .Execute(tmp)
End With
If (RegX.Test(tmp)) Then
tmp = Mats1(0)
End If
End If
Set Mats1 = Nothing
Set RegX2 = Nothing
Note: there isn't always a Map specified and sometimes it is defined as MAP or map.
This code works by selecting the emails in outlook and then running the macro for only that selected emails.
This is an example of a subject that has the
Example Subject
RE: TS001889493 - Translation failure - Inbound - ( VEXP/ HONCE/ Document Type 214 - Map AVE_NMHG_I_214_4010_XML_SAT - Error Conditional Relationship Error in N103 (0066) [ ref:_00D50c9MW._5000z1J3cG8:ref ]
Example of body
Dear Valued Trading Partner,
We received the attached 214 transactions from Sender ID: VEXP/ Receiver ID: HONCE that failed due to Conditional Relationship Error in the N1_03 (0066).
As per the map logic, If either N103 or N104 is present, then the other is required as they are in conditional relationship with each other. But in the input file received, N104 value is missing hence the error.
Transaction Details: #4# Attached
Please correct and resend the data.
Thank you, Simon Huggs | Sass support - Basic
What happens in the #num# is that it gets the sum of all these after making a match of the "TS" ticket ID.
This is the code I have up until now
Option Explicit
Sub WritingTicketNumberAndfailuresnew()
Dim xlApp As Object
Dim xlWB As Object
Dim xlSheet As Object
Dim rCount, STicket, SticketNumber As Long
Dim bXStarted As Boolean
Dim enviro As String
Dim strPath, SSubject As String
Dim currentExplorer As Explorer
Dim Selection As Selection
Dim olItem As Outlook.MailItem
Dim obj As Object
Dim strColS, strColB, sassupport, sMailDateReceived, SFrom As String
Dim Actions1, Actions2, Actions3, Actions4 As Boolean
Dim I, cnt, email_needed As Integer
' Get Excel set up
On Error Resume Next
Set xlApp = GetObject(, "Excel.Application")
If Err <> 0 Then
Application.StatusBar = "Please wait while Excel source is opened ... "
Set xlApp = CreateObject("Excel.Application")
bXStarted = True
End If
On Error GoTo 0
'Open a specific workbook to input the data the path of the workbook under the windows user account
enviro = CStr(Environ("USERPROFILE"))
strPath = enviro & "\Documents\topthreeticket.xlsx"
Set xlWB = xlApp.Workbooks.Open(strPath)
Set xlSheet = xlWB.Sheets("Sheet1")
' Add column names
xlSheet.Range("A1") = "Email Subject"
xlSheet.Range("B1") = "Map Name"
xlSheet.Range("C1") = "Case Number"
xlSheet.Range("D1") = "No. Of Failures"
xlSheet.Range("E1") = "Date"
xlSheet.Range("F1") = "Week Number"
sassupport = "sassuport#sass.com"
On Error Resume Next
'Find the next empty line of the worksheet
rCount = xlSheet.Range("A" & xlSheet.Rows.Count).End(-4162).Row
'needed for Exchange 2016. Remove if causing blank lines.
rCount = rCount + 1
' get the values from outlook
Set currentExplorer = Application.ActiveExplorer
Set Selection = currentExplorer.Selection
For Each obj In Selection
Set olItem = obj
'collect the fields for ticket number and failure count
strColS = olItem.Subject
strColB = olItem.Body
SFrom = olItem.SenderEmailAddress
sMailDateReceived = olItem.ReceivedTime
Dim sFailures, stmp1, stmp2, stmp3 As String
Dim RegX As Object, Mats As Object
Dim Found As Boolean
' Check the number of failures from body
sFailures = "0"
stmp1 = strColB
Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Global = True
.Pattern = "#\d+#"
Set Mats = .Execute(stmp1)
End With
If (RegX.Test(stmp1)) Then
stmp2 = Mats(0)
Found = True
stmp3 = Mid(stmp2, 2, Len(stmp2) - 2)
sFailures = stmp3
Else
With RegX
.Pattern = "#d\d+"
Set Mats = .Execute(stmp1)
End With
If (RegX.Test(stmp1)) Then
stmp2 = Mats(0)
Found = True
stmp3 = Mid(stmp2, 2, Len(stmp2) - 1)
sFailures = stmp3
End If
End If
Set Mats = Nothing
Set RegX = Nothing
Dim tmp As String
Dim RegX2 As Object, Mats1 As Object
tmp = strColS
Set RegX2 = CreateObject("VBScript.RegExp")
With RegX2
.Global = True
.Pattern = "Map\d{7}"
Set Mats1 = .Execute(tmp)
End With
If (RegX2.Test(tmp)) Then
Set Mats1 = RegX2.Execute(tmp)
tmp = Mats1(0)
Else
With RegX2
.Pattern = "MAP\d{9}"
Set Mats1 = .Execute(tmp)
End With
If (RegX.Test(tmp)) Then
tmp = Mats1(0)
End If
End If
Set Mats1 = Nothing
Set RegX2 = Nothing
Dim tempticketnum, tmpdate As String
Dim ticketnumposition As Integer
'write them in the excel sheet
If SFrom = sassupport Then
xlSheet.Range("A" & rCount) = strColS
xlSheet.Range("B" & rCount) = tmp2
xlSheet.Range("C" & rCount) = tmp
xlSheet.Range("D" & rCount) = sFailures ' number of failures
xlSheet.Range("E" & rCount) = sMailDateReceived
rCount = rCount + 1
End If
Next
Set olItem = Nothing
Set obj = Nothing
Set currentExplorer = Nothing
Set xlSheet = Nothing
Set xlWB = Nothing
Set xlApp = Nothing
End Sub
To extract the substring as you specify:
.ignorecase = True
.pattern = "map\s*(\S+)"
or
.pattern = "\bmap\s*(\S+)"
The substring will be in capturing group 1
If there is no map then the .test(..) line will return False
Regex Explained
\bmap\s*(\S+)
Options: Case insensitive; ^$ don’t match at line breaks
Assert position at a word boundary \b
Match the character string “map” literally map
Match a single character that is a “whitespace character” \s*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the regex below and capture its match into backreference number 1 (\S+)
Match a single character that is NOT a “whitespace character” \S+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Created with RegexBuddy

VBA - Modify sheet naming from source file

I received help in the past for an issue regarding grabbing a source file name and naming a newly created worksheet the date from said source file name, i.e. "010117Siemens Hot - Cold Report.xls" and outputting "010117".
However the code only works for file names with this exact format, for example, file named "Siemens Hot - Cold Report 010117.xls", an error occurs because the newly created sheet does not find the date in the source file.
CODE
Application.ScreenUpdating = False
Dim n As Double
Dim wksNew As Excel.Worksheet
Dim src As Workbook
Set src = Workbooks.Open(filePath, False, False)
Dim srcRng As Range
With src.Worksheets("Sheet1")
Set srcRng = .Range(.Range("A1"), .Range("A1").End(xlDown).End(xlToRight))
End With
With ThisWorkbook
Set wksNew = .Worksheets.Add(After:=.Worksheets(.Sheets.Count))
n = .Sheets.Count
.Worksheets(n).Range("A1").Resize(srcRng.Rows.Count, srcRng.Columns.Count).Value = srcRng.Value
End With
' ======= get the digits part from src.Name using a RegEx object =====
' RegEx variables
Dim Reg As Object
Dim RegMatches As Variant
Set Reg = CreateObject("VBScript.RegExp")
With Reg
.Global = True
.IgnoreCase = True
.Pattern = "\d{0,9}" ' Match any set of 0 to 9 digits
End With
Set RegMatches = Reg.Execute(src.Name)
On Error GoTo CloseIt
If RegMatches.Count >= 1 Then ' make sure there is at least 1 match
ThisWorkbook.Worksheets(n).Name = RegMatches(0) ' rename "Sheet2" to the numeric part of the filename
End If
src.Close False
Set src = Nothing
So, my question is, how can I get my code to recognize the string of digits no matter its position in the file name?
Code
^\d{0,9}\B|\b\d{0,9}(?=\.)
Usage
I decided to make a function that can be called inside a cell as such: =GetMyNum(x) where x is a pointer to a cell (i.e. A1).
To get the code below to work:
Open Microsoft Visual Basic for Applications (ALT + F11)
Insert a new module (right click in the Project Pane and select Insert -> Module).
Click Tools -> References and find Microsoft VBScript Regular Expressions 5.5, enable it and click OK
Now copy/paste the following code into the new module:
Option Explicit
Function GetMyNum(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
Dim match As Object
strPattern = "^\d{0,9}\B|\b\d{0,9}(?=\.)"
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set match = regEx.Execute(strInput)
GetMyNum = match.Item(0)
Else
GetMyNum = ""
End If
End If
End Function
Results
Input
A1: Siemens Hot - Cold Report 010117.xls
A2: 010117Siemens Hot - Cold Report.xls
B1: =GetMyNum(A1)
B2: =GetMyNum(A1)
Output
010117 # Contents of B1
010117 # Contents of B2
Explanation
I will explain each regex option separately. You can reorder the options in terms of importance in such a way that the most important option is first and least important is last.
^\d{0,9}\B Match the following
^ Assert position at the start of the line
\d{0,9} Match any digit 0-9 times
\B Ensure position does not match where a word boundary matches (this is used but may be dropped depending on usage - I added it because it seems the number you're trying to get is immediately followed by a word character and not followed by a space - if that's not always the case just remove this token)
\b\d{0,9}(?=\.) Match the following
\b Assert position as a word boundary
\d{0,9} Match any digit 0-9 times
(?=\.) Positive lookahead ensuring a literal dot . follows
Just my alternative solution to RegEx :)
This finds the first occurence of 6 consecutive digits, omitting blanks and periods... although there are probably some more issues with using IsNumeric as I believe a lowercase e is considered acceptable by it...
Sub FindTheNumber()
For i = 1 To Len(Range("A1").Value)
If IsNumeric(Mid(Range("A1").Value, i, 6)) = True And InStr(Mid(Range("A1").Value, i, 6), " ") = 0 And InStr(Mid(Range("A1").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A1").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
For i = 1 To Len(Range("A2").Value)
If IsNumeric(Mid(Range("A2").Value, i, 6)) = True And InStr(Mid(Range("A2").Value, i, 6), " ") = 0 And InStr(Mid(Range("A2").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A2").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
End Sub
Examples:
Immediate window:

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two group of words in a word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time and assign that bullet to an Excel cell. As shown in the code below I have pretty much everything done, except I cant figure out the correct Regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As long : i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
Find bullet list items using keywords before and after list in regexp.
(Group) regexp pattern so that you can extract everything in-between words.
Store listed items group into a string.
Split string by new line character into a new array.
Output each array item to excel.
Loop again since there may be more than one list in document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In rex.Execute(str)
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3