Regular expression to substitute a pattern in VB script - regex

I am trying to write a regular expression in VB script to substitute some patterns.
My string may contain zero or more following patterns -
&USERID
&USERID
&USERID.
&USERID.
&USERID(n)
&USERID(n)
&USERID(n).
&USERID(n).
&USERID(n1, n2)
&USERID(n1, n2)
&USERID(n1, n2).
&USERID(n1, n2).
Sample string -
C:\temp\&USERID_&USERID._&USERID(4)_&USERID(2,2)..txt
If USERID=ABCDEF, then once substituted the resultant string should look like -
C:\temp\ABCDEF_ABCDEF_ABCD_BC.txt
The number in the bracket denotes the number of characters to substitute. Range can be specified using comma separated numbers. In order to achieve this I wrote a regular expression as follows -
"((&USERID\(\d+,\d+\)\.)|(&USERID\(\d+,\d+\)\.)|(&USERID\(\d+,\d+\))|(&USERID\(\d+,\d+\)))|((&USERID\(\d\)\.)|(&USERID\(\d\)\.)|(&USERID\(\d\))|(&USERID\(\d\))|(&USERID\.)|(&USERID\.))"
Using VBScript.RegExp I match the pattern and obtain collection of the matches. Iterating over each match object, I substitute either the complete USERID or part of it based on subscript.
The regular expression works fine. BUT it is very slow compared to string manipulation function.
Can above pattern be optimized?
Update:
I accepted the answer which solves one of my problem. Based on the regular expression, I tried to solve another find and replace problem as follows -
I have following patterns
DATE
DATE(MMDDYYYY)
DATE(DDMMYYYY)
DATE(YYYYMMDD)
DATE(YYYY)
DATE(MM)
DATE(DD)
DATE(DDMONYYYY)
DATE(MON)
DATE(MONTH)
DATE(YYDDD)
DATE(YYYYDDD)
It may have a terminating "." at the end.
Function replaceDate(matchString, label, position, sourceString)
If label = "MMDDYYYY" or label = "" then
replaceDate = "<MMDDYYYY>"
ElseIf label = "DDMMYYYY" then
replaceDate = "<DDMMYYYY>"
ElseIf label = "YYYYMMDD" then
replaceDate = "<YYYYMMDD>"
ElseIf label = "YYYY" then
replaceDate = "<YYYY>"
ElseIf label = "MM" then
replaceDate = "<MM>"
ElseIf label = "DD" then
replaceDate = "<DD>"
ElseIf label = "DDMONYYYY" then
replaceDate = "<DDMONYYYY>"
ElseIf label = "MON" then
replaceDate = "<MON>"
ElseIf label = "MONTH" then
replaceDate = "<MONTH>"
ElseIf label = "YYDD" then
replaceDate = "<YYYYDDD>"
Else
replaceDate = ""
end if
End Function
With new RegExp
.Global = True
.IgnoreCase = False
.Pattern = "(?:&(?:amp;)?)?DATE(?:\((MMDDYYYY|DDMMYYYY|YYYYMMDD|YYYY|MM|DD|DDMONYYYY|MON|MONTH|YYDDD|YYYYDDD)?\))?\.?"
strTempValue = .Replace(strTempValue, GetRef("replaceDate"))
End with

Without more data it is not easy to test, but you can try if this performs better
Dim USERID
USERID = "ABCDEF"
Dim originalString
originalString = "C:\temp\&USERID_&USERID._&USERID(4)_&USERID(2,2)..txt"
Dim convertedString
Function replaceUSERID(matchString, n1, n2, position, sourceString)
n1 = CLng("0" & Trim(n1))
n2 = CLng("0" & Trim(Replace(n2, ",", "")))
If n1 < 1 Then
replaceUSERID = USERID
ElseIf n2 > 0 Then
replaceUSERID = Mid(USERID, n1, n2)
Else
replaceUSERID = Left(USERID, n1)
End If
End Function
With New RegExp
.Pattern = "(?:&(?:amp;)?)?USERID(?:\((\s*\d+\s*)(,\s*\d+\s*)?\))?\.?"
.Global = True
.IgnoreCase = False
convertedString = .Replace(originalString, GetRef("replaceUSERID"))
End With
WScript.Echo originalString
WScript.Echo convertedString
For a multiple "label" replacement
Option Explicit
Dim dicLabels
Set dicLabels = WScript.CreateObject("Scripting.Dictionary")
With dicLabels
.Add "USERID", "ABCDEF"
.Add "LUSER", "ABCDEF"
.Add "ID", "GHIJKL"
End With
Dim originalString
originalString = "C:\temp\&USERID_&USERID._&USERID(4)_&USERID(2,2)_ID(2,3)_&LUSER..txt"
Dim convertedString
Function replaceLabels(matchString, label, n1, n2, position, sourceString)
If Not dicLabels.Exists(label) Then
replaceLabels = matchString
Else
n1 = CLng("0" & Trim(n1))
n2 = CLng("0" & Trim(Replace(n2,",","")))
replaceLabels = dicLabels.Item(label)
If n1 > 0 Then
If n2 > 0 Then
replaceLabels = Mid(dicLabels.Item(label), n1, n2)
Else
replaceLabels = Left(dicLabels.Item(label), n1)
End If
End If
End If
End Function
With New RegExp
.Pattern = "(?:&(?:amp;)?)?("& Join(dicLabels.Keys, "|") &")(?:\((\s*\d+\s*)(,\s*\d+\s*)?\))?\.?"
.Global = True
.IgnoreCase = False
convertedString = .Replace(originalString, GetRef("replaceLabels"))
End With
WScript.Echo originalString
WScript.Echo convertedString

Related

Splitting a string and capitalizing letters based on cases

I have some column names with starting coding convention that I would like to transform, see example:
Original Target
------------- --------------
partID Part ID
completedBy Completed By
I have a function in VBA that splits the original string by capital letters:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
I wrap this function within PROPER, for example, PROPER(SplitCaps(A3)) produces the desired result for the third row but leaves the "D" in ID uncapitalized.
Original Actual
------------- --------------
partID Part Id
completedBy Completed By
Can anyone think of a solution to add cases to this function?
split the word and loop the results and test whether it is all caps before using Proper. then join them back:
Sub kjl()
Dim str As String
str = "partID"
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
str = Join(strArr, " ")
Debug.Print str
End Sub
If you want a formula to do what you are asking then:
=TEXTJOIN(" ",TRUE,IF(EXACT(UPPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)),PROPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)))))
Entered as an array formula by confirming with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Or use the code above as a Function:
Function propSplitCaps(str As String)
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
propSplitCaps = Join(strArr, " ")
End Function
and call it =propSplitCaps(A1)
Instead of using the Proper function, just capitalize the first letter of each word after you have split the string on the transition.
Option Explicit
Function Cap(s As String) As String
Dim RE As RegExp, MC As MatchCollection, M As Match
Const sPatSplit = "([a-z])([A-Z])"
Const sPatFirstLtr As String = "\b(\w)"
Const sSplit As String = "$1 $2"
Set RE = New RegExp
With RE
.Global = True
.Pattern = sPatSplit
.IgnoreCase = False
If .Test(s) = True Then
s = .Replace(s, sSplit)
.Pattern = sPatFirstLtr
Set MC = .Execute(s)
For Each M In MC
s = WorksheetFunction.Replace(s, M.FirstIndex + 1, 1, UCase(M))
Next M
End If
End With
Cap = s
End Function

Match all dates without an asterisk in front

I'm trying to use negative lookback to match all dates without an asterisk in front but it doesn't seem to be working.
(?<!\\*)(\b(?:0[1-9]|[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)
This is the string I'm trying to match:
02/02/2019 *03/20/2019 AB CART 9000341 FAXED TO INSTITUTION
Here's the full code for what I have. It extracts the most recent date preceding the word faxed. The problem is if there is a date with an asterisk in front of it (such as *03/20/2019) it chooses that instead of the date (02/02/2019)
This is the Function:
Option Explicit
Function lastFaxedDt(s As String) As Date
Dim re As RegExp, MC As MatchCollection
Const sPat As String = "(\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)(?=.*?faxed)"
Set re = New RegExp
With re
.Pattern = sPat
.IgnoreCase = True
.Global = True
If .Test(s) = True Then
Set MC = .Execute(s)
lastFaxedDt = CDate(MC(MC.Count - 1))
End If
End With
End Function
This is the Macro:
Sub ExtractDate()
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next
my_url = objShell.Windows(x).document.Location
my_title = objShell.Windows(x).document.Title
If my_title Like "RFT" & "*" Then
Set IE = objShell.Windows(x)
marker = 1
Exit For
Else
End If
Next
Dim Text As String
Text = Trim$(IE.document.getElementById("ctl00_ContentPlaceHolder1_txtNotes").innerText)
ExtractedDate = lastFaxedDt(Text)
If ExtractedDate = "12:00:00 AM" Then
ExtractedDate = "0"
Else
End If
ExtractedDate = CLng(ExtractedDate)
MaxDate = Application.WorksheetFunction.Max(ExtractedDate)
If MaxDate = "0" Then
MsgBox "No Date Found"
Else
End If
MaxDate = CDate(MaxDate)
Dim ws5 As Worksheet: Set ws5 = ActiveWorkbook.ActiveSheet
ws5.Range("C" & (ActiveCell.Row)).Value = MaxDate
Range("C" & (ActiveCell.Row)).NumberFormat = "[$-409]d-mmm;#"
End Sub
As mentioned in the comments, VBA does not support Lookbehinds. To work around this, you can replace your Lookbehind with the following:
(?:^|[^*])
And then find the date in the capturing group (sub-match) instead of the full match. In this case, your function should look something like this:
Function lastFaxedDt(s As String) As Date
Const sPat As String = _
"(?:^|[^*])" & _
"(\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/(?:19\d{2}|[2-9]\d{3})\b)" & _
"(?=.*?faxed)"
Dim re As New RegExp, matches As MatchCollection
With re
.Pattern = sPat
.IgnoreCase = True
.Global = True
Set matches = .Execute(s)
If matches.Count > 0 Then
Dim lastMatch As Match: Set lastMatch = matches(matches.Count - 1)
lastFaxedDt = CDate(lastMatch.SubMatches.Item(0))
Else
' TODO: handle the case where no matches are found
End If
End With
End Function
Usage:
Dim s As String
s = "02/02/2019 *03/20/2019 AB CART 9000341 FAXED TO INSTITUTION"
MsgBox lastFaxedDt(s) ' 02/02/2019

VBA Find a string that has range of value in it with Regular Expression and replace with each value in that range

First of all, sorry for the long title. I just don't know how to put it succinctly. I am trying to do this in VBA as normal Excel will not cut it.
Basically, I have a column. Each cells may contain data in the format of something like
flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;
What I need is to find the string that has "-" in it, and attempt to replace it with anything in between. so the above code will become
Flat 10, Flat 11; Flat 12, Flat 14;Flat 18, Flat 19;Unit 7, Unit 8, Unit 9;Flat A, Flat B, Flat C; ABC;DEF;
With the help of this article on RegExpression, I have managed to work out how to expand the bits of data with number, which I will post the code below. However, I don't know a good way to expand the data with the letter. i.e from Flat A-C to Flat A, Flat B, Flat C
My code is below, please feel free to give any pointers if you think it can be more efficient. I am very much an amateur at this. Thank you in advance.
Sub CallRegEx()
Dim r As Match
Dim mcolResults As MatchCollection
Dim strInput As String, strPattern As String
Dim test As String, StrOutput As String, prefix As String
Dim startno As Long, endno As Long
Dim myrange As Range
strPattern = "(Flat|Unit) [0-9]+-+[0-9]+"
With Worksheets("Sheet1")
lrow = .Cells(Rows.Count, 9).End(xlUp).Row
For Each x In .Range("A2:A" & lrow)
strInput = Range("A" & x.Row).Value
Set mcolResults = RegEx(strInput, strPattern, True, , True)
If Not mcolResults Is Nothing Then
StrOutput = strInput
For Each r In mcolResults
startno = Mid(r, (InStr(r, "-") - 2), 2)
endno = Mid(r, (InStr(r, "-") + 1))
prefix = Mid(r, 1, 4)
test = ""
For i = startno To endno - 1
test = test & prefix & " " & i & ","
Next i
test = test & prefix & " " & endno
'this is because I don't want the comma at the end of the last value
StrOutput = Replace(StrOutput, r, test)
Debug.Print r ' remove in production
Next r
End If
.Range("D" & x.Row).Value = StrOutput
Next x
End With
End Sub
This function below is to support the Sub above
Function RegEx(strInput As String, strPattern As String, _
Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
Optional IgnoreCase As Boolean) As MatchCollection
Dim mcolResults As MatchCollection
Dim objRegEx As New RegExp
If strPattern <> vbNullString Then
With objRegEx
.Global = GlobalSearch
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Pattern = strPattern
End With
If objRegEx.test(strInput) Then
Set mcolResults = objRegEx.Execute(strInput)
Set RegEx = mcolResults
End If
End If
End Function
Letters have character codes that are ordinal (A < B < C ...) & these can be accessed via asc()/chr$() - here is one way to do it:
inputStr = "flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"
Dim re As RegExp: Set re = New RegExp
re.Pattern = "(flat|unit)\s+((\d+)-(\d+)|([A-Z])-([A-Z]))"
re.Global = True
re.IgnoreCase = True
Dim m As MatchCollection
Dim start As Variant, fin As Variant
Dim tokens() As String
Dim i As Long, j As Long
Dim isDigit As Boolean
tokens = Split(inputStr, ";")
For i = 0 To UBound(tokens) '// loop over tokens
Set m = re.Execute(tokens(i))
If (m.Count) Then
With m.Item(0)
start = .SubMatches(2) '// first match number/letter
isDigit = Not IsEmpty(start) '// is letter or number?
If (isDigit) Then '// number
fin = .SubMatches(3)
Else '// letter captured as char code
start = Asc(.SubMatches(4))
fin = Asc(.SubMatches(5))
End If
tokens(i) = ""
'// loop over items
For j = start To fin
tokens(i) = tokens(i) & .SubMatches(0) & " " & IIf(isDigit, j, Chr$(j)) & ";"
Next
End With
ElseIf i <> UBound(tokens) Then tokens(i) = tokens(i) & ";"
End If
Next
Debug.Print Join(tokens, "")
flat 10;flat 11;flat 12;flat 13;flat 14;Flat 18;Flat 19;unit 7;unit 8;unit 9;flat A;flat B;flat C;flat D;ABC;DEF;flat 6;flat T

Extract text from 2 strings from selected Outlook email

I have code to import email body data from Outlook to Excel. I only need Name, ID, code from the email.
I have done everything except to extract the ID from a fixed sentence:
cn=SVCLMCH,OU=Users,OU=CX,DC=dm001,DC=corp,DC=dcsa,DC=com
The id is SVCLMCH in this case, that means I need to extract the text between "cn=" and ",OU=Users".
Sub import_code()
Dim O As Outlook.Application
Set O = New Outlook.Application
Dim ONS As Outlook.Namespace
Set ONS = O.GetNamespace("MAPI")
Dim OMAIL As Outlook.MailItem
Set OMAIL = Nothing
Dim ws As Object
Set ws = ThisWorkbook.Worksheets("Import code from Outlook")
Dim rcount As Long
Dim vText As Variant
Dim sText As String
Dim i As Long
If O.ActiveExplorer.Selection.Count = 0 Then
msgbox "No Items selected!", vbCritical, "Error"
End If
On Error Resume Next
'Process each selected record
rcount = ws.UsedRange.Rows.Count
For Each OMAIL In O.ActiveExplorer.Selection
sText = OMAIL.Body
vText = Split(sText, Chr(13))
'Find the next empty line of the worksheet
rcount = rcount + 1
'Check each line of text in the message body
For i = UBound(vText) To 0 Step -1
If InStr(1, vText(i), "Password Generated and set for:") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("A" & rcount) = Trim(vItem(1))
End If
If InStr(1, vText(i), "cn=") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("b" & rcount) = Trim(vItem(1))
End If
If InStr(1, vText(i), "Password:") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("c" & rcount) = Trim(vItem(1))
End If
Next i
Next OMAIL
End Sub
The trick here is to use the Split() function
Dim Arr() As String
Dim j As Integer
Dim k As Integer
Dim strvar As String
Dim strval As String
Dim strID As String
If InStr(1, vtext(i), "cn=") > 0 Then
' split the whole line in an array - "," beeing the value separator
Arr = Split(vtext(i), ",")
' loop through all array elements
For j = 0 To UBound(r) - 1
' find the position of =
k = InStr(Arr(j), "=")
strvar = Left(Arr(j), k - 1) ' this is the variable name, ie "cn"
strval = Mid(Arr(j), k + 1) ' this is the value, ie "SVCLMCH"
' now do what you want with a specific variable
Select Case strvar
Case "cn"
strID = strval
Case Else
' do nothing
End Select
Next j
End If
you can use a helper function like this:
Function GetID(strng As String)
Dim el As Variant
For Each el In Split(strng, ",")
If InStr(1, el, "cn=") > 0 Then
GetID = Mid(el, InStr(1, el, "cn=") + 3)
Exit Function
End If
Next
End Function
and your main code would exploit it as:
If InStr(1, vText(i), "cn=") > 0 Then ws.Range("b" & rcount) = GetID(CStr(vText(i)))
Use Regular Expression extract the ID from the sentence
Example Pattern = "(?<=cn=)([^.]+)(?=\,OU=Users)"
https://regex101.com/r/67u84s/2
Code Example
Option Explicit
Private Sub Examplea()
Dim Matches As Variant
Dim RegEx As Object
Set RegEx = CreateObject("VbScript.RegExp")
Dim olApp As Object
Set olApp = CreateObject("Outlook.Application")
Dim Item As Outlook.MailItem
Set Item = olApp.ActiveExplorer.Selection.Item(1)
Dim Pattern As String
Pattern = "(?<=cn=)([^.]+)(?=\,OU=Users)"
With RegEx
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Item.Body)
End With
If Matches.Count > 0 Then
Debug.Print Matches(0).SubMatches(0)
With ThisWorkbook.Sheets("Sheet1")
.Range("A1").Value = Trim(Matches(0).SubMatches(0))
End With
End If
End Sub

Replace all characters in a String, unless they are within double quotes

I am sadly unfamiliar with regular expressions since I'm not a programmer, but I would guess this problem is easily solvable using regex (I am definitely open to other suggestions, though)
I want to use the split function to split the value of a cell and spread it out over multiple cells. The delimiter is a comma. The problem though is that some users use commas in comments for example, which the Split function uses to split the string mid-comment.
for example a cell containing the value:
0001,"name","address","likes apples, oranges
and plums"
needs to be split into multiple cells saying 0001 "name" "address" and "likes apples, oranges and plums".
my code splits the comment as well, and I want it to ignore the comment or everything else withing double quotes. here is a sample:
Sub SplittingStrings()
Dim wb As Workbook
Dim ws As Worksheet
Dim strInput As String
Dim counter As Integer
Dim cell As Variant
Dim splitCount As Integer
Dim splitString() As String
Dim category As Variant
Set wb = ActiveWorkbook
Set ws = wb.ActiveSheet
counter = 1
For Each cell In Range("A1", "A2000")
If cell.Value <> "" Then
strInput = cell.Value
splitCount = 2
splitString = Split(strInput, ",")
For Each category In splitString
Cells(counter, splitCount).Value = category
splitCount = splitCount + 1
Next category
End If
counter = counter + 1
Next cell
End Sub
how do I exclude stuff withing the double quotes from being considered by the split function?
Please give this a try and see if you get the desired output.
Tweak the variables if required.
Sub SplittingStringsUsingRegEx()
Dim lr As Long, c As Long
Dim Rng As Range, cell As Range
Dim RE, Match, Matches
Application.ScreenUpdating = False
lr = Cells(Rows.Count, 1).End(xlUp).Row
Set Rng = Range("A1:A" & lr)
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.Pattern = "\d+|"".+?"""
End With
c = 2
For Each cell In Rng
If RE.test(cell.Value) Then
Set Matches = RE.Execute(cell.Value)
For Each Match In Matches
Cells(cell.Row, c) = Replace(Match, """", "")
c = c + 1
Next Match
End If
c = 2
Next cell
Application.ScreenUpdating = True
End Sub
Without Regex:
We need to "protect" commas that are encapsulated with double quotes:
Sub ProtectStuff()
Dim i As Long, N As Long, v As String, v2 As String
Dim ProtectMode As Boolean, DQ As String, rep As String
Dim CH As String, arr
DQ = """"
rep = Chr(1)
N = Cells(Rows.Count, "A").End(xlUp).Row
For i = 1 To N
v = Cells(i, "A").Value
If v <> "" Then
ProtectMode = False
v2 = ""
For j = 1 To Len(v)
CH = Mid(v, j, 1)
If CH = DQ Then ProtectMode = Not ProtectMode
If CH = "," And ProtectMode Then CH = rep
v2 = v2 & CH
Next j
End If
arr = Split(v2, ",")
j = 2
For Each a In arr
Cells(i, j) = Replace(a, rep, ",")
j = j + 1
Next a
Next i
End Sub
Text to Columns will do what you want, differently than the split function.