Find '~XX~' within a string with specific values - regex

I have a classic ASP page written in VBScript. I have a record pulled from SQL Server, and the data is a string. In this string, I need to find text enclosed in tildes, like ~12345~, and replace each character with very specific text. For example, 1 would be replaced with M and 2 would be replaced with A. I then need to display the result on the web page. We don't know how many items will be enclosed in ~.
Example Data:
Group Pref: (To be paid through WIT)
~2.5~ % Quarterly Rebate - Standard Commercial Water Heaters
Display on webpage after:
Group Pref: (To be paid through WIT)
~A.H~ % Quarterly Rebate - Standard Commercial Water Heaters
I tried the following, but there are too many cases and this would be unrealistic to maintain. It does replace the text and display correctly.
dim strSearchThis
strSearchThis =(rsResults("PREF"))
set re = New RegExp
with re
.global = true
.pattern = "~[^>]*~"
strSearchThis = .replace(strSearchThis, "X")
end with
I am also trying this code. I can find the text contained between each ~ ~, but when displayed, the information between the ~ ~ is not changed:
dim strSearchThis
strSearchThis =(rsResults("PREF"))
Set FolioPrefData = New RegExp
FolioPrefData.Pattern = "~[^>]*~"
FolioPrefData.Global = True
FolioPrefData.IgnoreCase = True
'will contain all found instances of ~ ~'
set colmatches = FolioPrefData.Execute(strSearchThis)
Dim itemLength, found
For Each objMatch in colMatches
Select Case found
Case "~"
'ignore - doing nothing'
Case "1"
found = replace(strSearchThis, "M")
End Select
Next
response.write(strSearchThis)

You can do it without using Regular Expressions, just checking the individual characters and writing a function that handles the different cases you have. The following function finds your delimited text and loops through all characters, calling the ReplaceCharacter function defined further down:
Function FixString(p_sSearchString)
Dim iStartIndex
Dim iEndIndex
Dim iIndex
Dim sReplaceString
Dim sReturnString
sReturnString = p_sSearchString
' Locate start ~
iStartIndex = InStr(sReturnString, "~")
Do While iStartIndex > 0
' Look for end ~
iEndIndex = InStr(iStartIndex + 1, sReturnString, "~")
If iEndIndex > 0 Then
sReplaceString = ""
' Loop through all characters
For iIndex = iStartIndex + 1 To iEndIndex - 1
sReplaceString = sReplaceString & ReplaceCharacter(Mid(sReturnString, iIndex, 1))
Next
' Replace string
sReturnString = Left(sReturnString, iStartIndex) & sReplaceString & Mid(sReturnString, iEndIndex)
' Locate next ~
iStartIndex = InStr(iEndIndex + 1, sReturnString, "~")
Else
' End couldn't be found, exit
Exit Do
End If
Loop
FixString = sReturnString
End Function
This is the function where you will enter the different character substitutions you might have:
Function ReplaceCharacter(p_sCharacter)
Select Case p_sCharacter
Case "1"
ReplaceCharacter = "M"
Case "2"
ReplaceCharacter = "A"
Case Else
ReplaceCharacter = p_sCharacter
End Select
End Function
You can use this in your existing code:
response.write(FixString(strSearchThis))
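For comparison, the same character-by-character substitution can be sketched with a regular expression and a replacement callback. The sketch below is in Python rather than VBScript, purely to illustrate the logic. The names fix_string and SUBSTITUTIONS are hypothetical, and the mapping contains only the pairs that appear in the question (1 to M, 2 to A, and 5 to H from the example output).

```python
import re

# Hypothetical mapping: only these three pairs appear in the question.
SUBSTITUTIONS = str.maketrans({"1": "M", "2": "A", "5": "H"})

def fix_string(text: str) -> str:
    """Translate the characters inside each ~...~ pair, keeping the tildes."""
    return re.sub(
        r"~([^~]*)~",
        lambda m: "~" + m.group(1).translate(SUBSTITUTIONS) + "~",
        text,
    )

print(fix_string("~2.5~ % Quarterly Rebate - Standard Commercial Water Heaters"))
```

Characters without an entry in the mapping (such as the dot) pass through unchanged, which mirrors the Case Else branch of ReplaceCharacter.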

You can also use a Split and Join method...
Const SEPARATOR = "~"
Dim deconstructString, myOutputString
Dim arrayPointer
deconstructString = Split(myInputString, SEPARATOR)
For arrayPointer = 0 To UBound(deconstructString)
If IsNumeric(deconstructString(arrayPointer)) Then
'Do whatever you need to with your value...
End If
Next 'arrayPointer
myOutputString = Join(deconstructString, "")
This does rely, obviously, on breaking a string apart and rejoining it, so there is a slight overhead from creating the intermediate strings.

Related

Excel VBA RegEx that extracts numbers from price values in range (has commas, $ and -)

I have a field data extracted from a database which represents a range of values, but it's coming in Excel as a String format $86,000 - $162,000.
I need to extract the minimum value and the maximum value from each cell, so I need to extract the numeric portion of it, and ignore the $, - and the ,.
I've attached an image of the data I have, and the values I want to extract from it.
This is the closest pattern I got with RegEx, but it's not what I'm looking for.
Pattern = (\d+)(?:\.(\d{1,2}))?
Can anyone assist?
Just wondering why Regex?
Function GetParts(priceRange As String) As Double()
Dim arr() As String
Dim parts() As Double
If InStr(1, priceRange, "-") > 0 Then
arr = Split(priceRange, "-")
ReDim parts(0 To UBound(arr))
Dim i As Long
For i = 0 To UBound(arr)
parts(i) = CDbl(Replace$(Replace$(Trim$(arr(i)), "$", ""), ",", ""))
Next i
End If
GetParts = parts
End Function
Sub test()
MsgBox GetParts("$14,000 - $1,234,567")(0) 'Minimum
End Sub
EDIT
Yet you could do this with regex to match the data string into the parts:
Function GetPartsRegEx(priceRange As String) As Variant
Dim arr() As Double
Dim pricePattern As String
pricePattern = "(\$?\d+[\,\.\d]*)"
'START EDIT
Static re As RegExp
If re Is Nothing Then
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = pricePattern & "\s*[\-]\s*" & pricePattern 'look for the pattern first
End If
Static nums As RegExp
If nums Is Nothing Then
Set nums = New RegExp
'to remove all non digits, except decimal point in case you have pennies
nums.Pattern = "[^0-9.]"
nums.Global = True
End If
'END EDIT
If re.test(priceRange) Then
ReDim arr(0 To 1) ' fill return array
arr(0) = CDbl(nums.Replace(re.Replace(priceRange, "$1"), ""))
arr(1) = CDbl(nums.Replace(re.Replace(priceRange, "$2"), ""))
Else
'do some error handling here
Exit Function
End If 'maybe throw error if no +ve test or
GetPartsRegEx = arr
End Function
Sub test()
MsgBox GetPartsRegEx("$1,005.45 - $1,234,567.88")(1)
End Sub
Here is quick Example Demo https://regex101.com/r/RTNlVF/1
Pattern "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
Option Explicit
Private Sub Example()
Dim RegExp As New RegExp
Dim Pattern As String
Dim CelValue As String
Dim rng As Range
Dim Cel As Range
Set rng = ActiveWorkbook.Sheets("Sheet1" _
).Range("A2", Range("A9999" _
).End(xlUp))
For Each Cel In rng
DoEvents
Pattern = "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
If Pattern <> "" Then
With RegExp
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = Pattern
End With
If RegExp.Test(Cel.Value) Then
' Debug.Print Cel.Value
Debug.Print RegExp.Replace(CStr(Cel), "$1")
Debug.Print RegExp.Replace(CStr(Cel), "$2")
End If
End If
Next
End Sub
Without a loop (but still no regex):
Sub Split()
With Columns("B:B")
.Replace What:="$", Replacement:=""
Application.CutCopyMode = False
.TextToColumns Destination:=Range("B1"), DataType:=xlDelimited, Other:=True, OtherChar:="-", FieldInfo:=Array(Array(1, 1), Array(2, 1))
End With
Columns("B:C").Insert Shift:=xlToRight
Columns("D:E").NumberFormat = "0"
Range("D1").FormulaR1C1 = "Min Value"
Range("E1").FormulaR1C1 = "Max Value"
With Range("D1:E1").Interior
.Pattern = xlSolid
.PatternColorIndex = xlAutomatic
.Color = 12611584
End With
With Range("D1:E1").Font
.ThemeColor = xlThemeColorDark1
.TintAndShade = 0
End With
End Sub
I made this function:
Hope it helps.
Code:
Function ExtractNumber(ByVal TextInput As String, _
Optional ByVal Position As Byte = 0, _
Optional ByVal Delimiter As String = "-") As Variant
' You can use this function in a subprocess that
' writes the values in the cells you want, or
' you can use it directly in the output cells
' Variables
Dim RemoveItems(2) As String
Dim Aux As Variant
' The variable RemoveItems is an array
' containing the characters you want to remove
RemoveItems(0) = "."
RemoveItems(1) = ","
RemoveItems(2) = " "
' STEP 1 - The variable Aux will store the text
' given as input
Aux = TextInput
' STEP 2 - Characters stored in the variable
' RemoveItems will be removed from Aux
For i = 0 To UBound(RemoveItems)
Aux = Replace(Aux, RemoveItems(i), "")
Next i
' STEP 3 - Once Aux is "clean", it will be
' transformed into an array containing the
' values separated by the delimiter
' As you can see at the function's header,
' Delimiter default value is "-". You can change
' it depending on the situation
Aux = Split(Aux, Delimiter)
' STEP 4 - The result of this function will be
' a numeric value. So, if the value of the
' selected position in Aux is not numeric it will
' remove the first character assuming it is a
' currency symbol.
' If something fails in the process the function
' will return "ERROR", so you can know you may
' verify the inputs or adjust this code for
' your needs.
On Error GoTo ErrHndl
If Not IsNumeric(Aux(Position)) Then
ExtractNumber = CLng(Mid(Aux(Position), 2))
Else
ExtractNumber = CLng(Aux(Position))
End If
Exit Function
ErrHndl:
ExtractNumber = "ERROR"
End Function
You can even do this with just worksheet formulas. Under certain circumstances, Excel will ignore the $ and ,. The double unary converts the returned string to a numeric value.
First Value: =--LEFT(A1,FIND("-",A1)-1)
Second Value: =--MID(A1,FIND("-",A1)+1,99)
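As a quick cross-check of the regex approach discussed above, here is a minimal Python sketch that pulls the minimum and maximum out of a price range in one pass. The pattern and the name get_parts are assumptions made for illustration and are not taken from any of the answers. It assumes the two amounts are separated by a hyphen and use commas as thousands separators.

```python
import re

# Two amounts separated by a hyphen; "$" is optional, commas are thousands separators.
PRICE_RANGE = re.compile(r"\$?([\d,]+(?:\.\d+)?)\s*-\s*\$?([\d,]+(?:\.\d+)?)")

def get_parts(price_range: str):
    """Return (minimum, maximum) as floats, or None if the text is not a range."""
    match = PRICE_RANGE.search(price_range)
    if match is None:
        return None
    return tuple(float(group.replace(",", "")) for group in match.groups())

print(get_parts("$86,000 - $162,000"))  # (86000.0, 162000.0)
```

Returning None for text that is not a range leaves the error handling to the caller, much like the Else branch in GetPartsRegEx.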

VBA - Modify sheet naming from source file

I received help in the past for an issue regarding grabbing a source file name and naming a newly created worksheet the date from said source file name, i.e. "010117Siemens Hot - Cold Report.xls" and outputting "010117".
However, the code only works for file names with this exact format. For a file named, for example, "Siemens Hot - Cold Report 010117.xls", an error occurs because the code does not find the date in the source file name.
CODE
Application.ScreenUpdating = False
Dim n As Double
Dim wksNew As Excel.Worksheet
Dim src As Workbook
Set src = Workbooks.Open(filePath, False, False)
Dim srcRng As Range
With src.Worksheets("Sheet1")
Set srcRng = .Range(.Range("A1"), .Range("A1").End(xlDown).End(xlToRight))
End With
With ThisWorkbook
Set wksNew = .Worksheets.Add(After:=.Worksheets(.Sheets.Count))
n = .Sheets.Count
.Worksheets(n).Range("A1").Resize(srcRng.Rows.Count, srcRng.Columns.Count).Value = srcRng.Value
End With
' ======= get the digits part from src.Name using a RegEx object =====
' RegEx variables
Dim Reg As Object
Dim RegMatches As Variant
Set Reg = CreateObject("VBScript.RegExp")
With Reg
.Global = True
.IgnoreCase = True
.Pattern = "\d{0,9}" ' Match any set of 0 to 9 digits
End With
Set RegMatches = Reg.Execute(src.Name)
On Error GoTo CloseIt
If RegMatches.Count >= 1 Then ' make sure there is at least 1 match
ThisWorkbook.Worksheets(n).Name = RegMatches(0) ' rename "Sheet2" to the numeric part of the filename
End If
src.Close False
Set src = Nothing
So, my question is, how can I get my code to recognize the string of digits no matter its position in the file name?
Code
^\d{0,9}\B|\b\d{0,9}(?=\.)
Usage
I decided to make a function that can be called inside a cell as such: =GetMyNum(x) where x is a pointer to a cell (i.e. A1).
To get the code below to work:
Open Microsoft Visual Basic for Applications (ALT + F11)
Insert a new module (right click in the Project Pane and select Insert -> Module).
Click Tools -> References and find Microsoft VBScript Regular Expressions 5.5, enable it and click OK
Now copy/paste the following code into the new module:
Option Explicit
Function GetMyNum(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
Dim match As Object
strPattern = "^\d{0,9}\B|\b\d{0,9}(?=\.)"
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set match = regEx.Execute(strInput)
GetMyNum = match.Item(0)
Else
GetMyNum = ""
End If
End If
End Function
Results
Input
A1: Siemens Hot - Cold Report 010117.xls
A2: 010117Siemens Hot - Cold Report.xls
B1: =GetMyNum(A1)
B2: =GetMyNum(A2)
Output
010117 # Contents of B1
010117 # Contents of B2
Explanation
I will explain each regex option separately. You can reorder the options in terms of importance in such a way that the most important option is first and least important is last.
^\d{0,9}\B Match the following
^ Assert position at the start of the line
\d{0,9} Match between 0 and 9 digits
\B Ensure position does not match where a word boundary matches (this is used but may be dropped depending on usage - I added it because it seems the number you're trying to get is immediately followed by a word character and not followed by a space - if that's not always the case just remove this token)
\b\d{0,9}(?=\.) Match the following
\b Assert position as a word boundary
\d{0,9} Match between 0 and 9 digits
(?=\.) Positive lookahead ensuring a literal dot . follows
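The behaviour of the two alternatives can be checked outside Excel. The following is a minimal Python sketch that reuses the pattern as written above (Python's re module accepts the same syntax for these tokens). The name get_my_num is a hypothetical stand-in for the VBA function.

```python
import re

# Pattern copied from the answer above.
DATE_DIGITS = re.compile(r"^\d{0,9}\B|\b\d{0,9}(?=\.)")

def get_my_num(file_name: str) -> str:
    """Return the text of the first match of the pattern, or '' if there is none."""
    match = DATE_DIGITS.search(file_name)
    return match.group(0) if match else ""

for name in ("010117Siemens Hot - Cold Report.xls",
             "Siemens Hot - Cold Report 010117.xls"):
    print(name, "->", get_my_num(name))
```

Because \d{0,9} may match zero digits, a name without any digits before the extension produces an empty match rather than no match, so the result is still an empty string.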
Just my alternative solution to RegEx :)
This finds the first occurrence of 6 consecutive digits, omitting blanks and periods... although there are probably some more issues with using IsNumeric, as I believe a lowercase e is considered acceptable by it...
Sub FindTheNumber()
For i = 1 To Len(Range("A1").Value)
If IsNumeric(Mid(Range("A1").Value, i, 6)) = True And InStr(Mid(Range("A1").Value, i, 6), " ") = 0 And InStr(Mid(Range("A1").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A1").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
For i = 1 To Len(Range("A2").Value)
If IsNumeric(Mid(Range("A2").Value, i, 6)) = True And InStr(Mid(Range("A2").Value, i, 6), " ") = 0 And InStr(Mid(Range("A2").Value, i, 6), ".") = 0 Then
MyNumber = Mid(Range("A2").Value, i, 6)
Debug.Print MyNumber
Exit For
End If
Next i
End Sub

Slight adaptation of a User Defined Function

I would like to extract a combination of text and numbers from a larger string located within a column within excel.
The constraints I have to work with are that each text string will
•either start with an A, C or S, and
•always be 7 characters long
•the position of the string I would like to extract varies
The code I have been using which has been working efficiently is;
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(r.Text, " ")
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
However today I have learnt that sometimes my data may include scenarios like this;
What I would like is to adapt my code so If the 8th character is "Underscore" and the 1st character of the 7 characters is either S, A or C please extract up until the "Underscore"
Secondly I would like to exclude commons words like "Support" & "Collect" from being extracted.
Finally the 7th letter should be a number
Any ideas around this would be much appreciated.
Thanks
try this
ary = Split(Replace(r.Text, "_", " "))
or
ary = Split(Replace(r.Text, "_", " ")," ")
result will be same for both variants
test
update
Do you know how I could leave the result blank if the 7th character returned a letter?
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(Replace(r.Text, "_", " "))
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" And IsNumeric(Mid(a, 7, 1)) Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
test
Add Microsoft VBScript Regular Expressions 5.5 to project references. Use the following code to test matching and extracting with Xtractor:
Public Function Xtractor(ByVal p_val As String) As String
Xtractor = ""
Dim ary As String, v_re As New VBScript_RegExp_55.RegExp, Matches
v_re.Pattern = "^([SAC][^_]{1,6})_?"
Set Matches = v_re.Execute(p_val)
If Matches.Count > 0 Then Xtractor = Matches(0).SubMatches(0) Else Xtractor = ""
End Function
Sub test_Xtractor(p_cur As Range, p_val As String, p_expected As String)
Dim v_cur As Range, v_res As Range
p_cur.Value = p_val
Set v_cur = p_cur.Offset(columnOffset:=1)
v_cur.FormulaR1C1 = "='" & ThisWorkbook.Name & "'!Xtractor(RC[-1])"
Set v_res = v_cur.Offset(columnOffset:=1)
v_res.FormulaR1C1 = "=RC[-1]=""" & p_expected & """"
Debug.Print p_val; "->"; v_cur.Value; ":"; v_res.Value
End Sub
Sub test()
test_Xtractor ActiveCell, "A612002_MDC_308", "A612002"
test_Xtractor ActiveCell.Offset(1), "B612002_MDC_308", ""
test_Xtractor ActiveCell.Offset(2), "SUTP038_MDC_3", "SUTP038"
test_Xtractor ActiveCell.Offset(3), "KUTP038_MDC_3", ""
End Sub
Choose the workbook and cell for writing test fixture, then run test from the VBA Editor.
Output in the Immediate window (Ctrl+G):
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
UPD
Is it possible to amend this code so that it returns blank if the 7th character is a letter?
Replace the line that assigns v_re.Pattern with the following:
v_re.Pattern = "^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?"
v_re.IgnoreCase = True
And add to the test suite:
test_Xtractor ActiveCell.Offset(4), "SUTP03A_MDC_3", ""
Output:
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
SUTP03A_MDC_3->:True
I inserted the negative lookahead subrule (?![^_]{5}[A-Z]_?) to reject SUTP03A_MDC_3. Note that the rejecting rule applies to exactly the 7th character. v_re.IgnoreCase is now set to True; if only uppercase characters are allowed, set it to False. See also Regular Expression Syntax on MSDN.
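To see the lookahead at work without setting up a worksheet, here is a minimal Python sketch that runs the same pattern against the five test strings from this answer. The lowercase name xtractor is a hypothetical stand-in for the VBA function, and re.IGNORECASE plays the role of v_re.IgnoreCase = True.

```python
import re

# Same pattern as v_re.Pattern above, case-insensitive.
XTRACTOR = re.compile(r"^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?", re.IGNORECASE)

def xtractor(value: str) -> str:
    """Return the leading code if it starts with S, A or C and its 7th character is not a letter."""
    match = XTRACTOR.match(value)
    return match.group(1) if match else ""

for sample in ("A612002_MDC_308", "B612002_MDC_308", "SUTP038_MDC_3",
               "KUTP038_MDC_3", "SUTP03A_MDC_3"):
    print(sample, "->", xtractor(sample))
```

Note that the pattern is anchored at the start of the string, so unlike the Split-based version it does not search for a code in the middle of the text.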

VBA and RegEx matching arbitrary strings in Excel 2010

I need to extract the address and potentially the zip code as separate entities from the same line. The address line may or may not contain a zip code, and may or may not contain other unwanted strings. This is due to a bug in a web form, which is fixed, but the damage is already done to a set of elements.
Possible forms and results:
Address: Some address 251, 99302 Something Telephone: 555 6798 8473 -- Return "some address 251" and "99302 something" in separate strings. Comma may or may not be trailed by whitespace.
Address: Some address 251 -- Return "some address 251"
Address: Some address 251, 99302 -- Return "some address 251" and "99302". Again, comma may or may not be trailed by whitespace.
I have a basic understanding of how this could be done programmatically in VBA by iterating over the string and checking individual characters and substrings, but I feel like it will be time-consuming and not very robust afterwards. Or if it's robust, it would end up being huge because of all the possible variations.
I am struggling the most with how to form the regular expression(s) and possibly the conditionals to get the desired results.
This is part of a larger project, so I won't paste all the various code, but I am pulling mailitems from Outlook to analyze and dump relevant info into an Excel sheet. I have both the Outlook and Excel code working, but the logic that extracts information is a bit flawed.
Here are the new snippets I've been working on:
Function regexp(str As String, regP As String)
Dim rExp As Object, rMatch As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
regexp = rMatch(0)
Else
regexp = vbNullString
Debug.Print "No match found!"
End If
End Function
Sub regexpAddress(str As String)
Dim result As String
Dim pattern As String
If InStr(str, "Telephone:") Then pattern = "/.+?(?=Telephone:)/"
result = regexp(str, pattern)
End Sub
I'm not sure how to form the regexps here. The one outlined should pull the right information (in 1 string instead of 2, but that's still an improvement), but only when the line contains the string "Telephone:", and I have a lot of cases where it won't contain that.
This is the current and somewhat flawed logic, which for some reason doesn't always yield the results I want:
For Each objMail In olFolder.Items
name = ""
address = ""
telephone = ""
email = ""
vIterations = vIterations + 1
arrBody = Split(objMail.body, Chr(10)) ' Split mail body when linebreak is encountered, throwing each line into its own array position
For i = 0 To UBound(arrBody)
arrLine = Split(arrBody(i), ": ") ' For each element (line), make new array, and if text search matches then write the 2nd half of the element to variable
If InStr(arrBody(i), "Name:") > 0 Then ' L2
name = arrLine(1) ' Reference 2nd column in array after the split
ElseIf InStr(arrBody(i), "Address:") > 0 Then
address = arrLine(1)
ElseIf InStr(arrBody(i), "Telephone:") > 0 Then
telephone = CLng(arrLine(1))
ElseIf InStr(arrBody(i), "Email:") > 0 Then
email = arrLine(1)
End If ' L2
Next
Next ' Next/end-for
This logic accepts and formats input of the following type:
Name: Joe
Address: Road
Telephone: 55555555555555
Email: joe#road.com
and returns joe, road, 55555 and joe#road.com to some defined Excel cells. This works fine when the mailitems are ordered as expected.
Problem: A bug led to my web form not inserting a line break after the address in some cases. The script still worked for the most part, but the mail item contents sometimes ended up looking like this:
Name: Joe
Address: Road Telephone: 55555555555555
Email: joe#road.com
The address field was contaminated when it reached Excel ("Road Telephone" instead of just "Road"), but there was no loss of information. That was acceptable, as it's easy to remove the surplus string.
But in the following case (no email is entered), the phone number is not only lost but is actually replaced by a phone number from some other, arbitrary mailitem and I can't FOR THE LIFE OF ME figure out (1) why it won't get the correct number, (2) why it jumps to a new mail item to find the phone number or (3) how it selects this other mailitem:
Name: Joe
Address: Road Telephone: 5555555555555
Email:
In Excel:
Name: Joe
Address: Road Telephone
Telephone: 8877445511
Email:
So, TL;DR: my selection logic is flawed, and being that it is so hastily hacked together, not to mention how it yields false information and I am unable to figure out how and why, I would like to do a better operation using some other solution (like regexp?) instead for a more robust code.
Not so long ago I had a similar problem.
Code may not be very professional, but it can be helpful :)
Could you check if this code works correctly for you?
Function regexp(str As String, regP As String)
Dim rExp As Object, rMatch As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = False
.MultiLine = False
.IgnoreCase = True
.pattern = regP
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
regexp = rMatch(0)
Else
regexp = vbNullString
Debug.Print "No match found!"
End If
End Function
Function for_vsoraas()
For Each objMail In olFolder.Items
vIterations = vIterations + 1
objMail_ = Replace(objMail.body, Chr(10), " ")
Dim StringToSearch(3) As String
StringToSearch(0) = "Name:"
StringToSearch(1) = "Address:"
StringToSearch(2) = "Telephone:"
StringToSearch(3) = "Email:"
Dim ArrResults(4) As String 'name,address,telephone,email, zipcode
For i = 0 To UBound(StringToSearch)
ResultString = ""
StartString = InStr(objMail_, StringToSearch(i))
If StartString > 0 Then
If i = UBound(StringToSearch) Then 'last string to search, dont search EndString
ResultString = Mid(objMail_, StartString + Len(StringToSearch(i)) + 1) 'take everything after the last label
Else
EndString = 0
j = i
While (EndString = 0) 'prevent case no existing EndString
EndString = InStr(objMail_, StringToSearch(j + 1))
j = j + 1
If j = UBound(StringToSearch) And EndString = 0 Then
EndString = Len(objMail_) + 1
End If
Wend
ResultString = Mid(objMail_, StartString + Len(StringToSearch(i)) + 1, EndString - 1 - StartString - Len(StringToSearch(i)))
End If
ArrResults(i) = ResultString
End If
Next i
'search zipcode and address
ArrResults(4) = regexp(ArrResults(1), "\b(\d{5})\b")
ArrResults(1) = regexp(ArrResults(1), "([a-z ]{2,}\s{0,1}\d{0,3})")
'your variables
Name = ArrResults(0)
Address = ArrResults(1)
Telephone = ArrResults(2)
Email = ArrResults(3)
ZipCode = ArrResults(4)
Next ' Next/end-for
End Function
I don't know if it was dumb luck or if I actually managed to learn some regex, but these patterns turn out to do exactly what I need.
' regex patterns - use flag /i
adrPattern = "([a-z ]{2,}\s{0,1}\d{0,3})" ' Select from a-z or space, case insensitive and at least 2 characters long, followed by optional space, ending with 0-3 digits
adrZipcode = "\b(\d{4})\b" ' Exactly 4 digits delimited by word boundaries, e.g. a space, a comma or the start/end of the string on either side
Edit: "Fixed" the telephone problem too. After spending 2 hours trying to write it in regex, and failing miserably, it dawned on me that solving the problem as a matter of faulty creation of the array had to be so much easier than treating it as a computational problem. And it was:
mailHolder = Replace(objMail.body, "Telephone:", Chr(10) + "Telephone:")
arrBody = Split(mailHolder, Chr(10))
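An alternative to repairing the line breaks is to let each field run from its label to the next label, so a missing line break no longer matters. The Python sketch below illustrates that idea under stated assumptions: the names parse_body and split_address, both patterns, and the choice of a 4 to 5 digit zip code are hypothetical and are not part of the code above.

```python
import re

LABELS = "Name|Address|Telephone|Email"

# A field runs from its label up to the next label or the end of the text,
# so a missing line break between two fields no longer matters.
FIELD = re.compile(rf"({LABELS}):\s*(.*?)\s*(?=(?:{LABELS}):|$)", re.DOTALL)

# Street part, then optionally a comma and a part starting with a 4-5 digit zip code.
ADDRESS = re.compile(r"^(.*?)(?:,\s*(\d{4,5}\b.*))?$", re.DOTALL)

def parse_body(body: str) -> dict:
    """Map each label found in the mail body to its (possibly empty) value."""
    return dict(FIELD.findall(body))

def split_address(address: str):
    """Split 'street, zip rest' into (street, zip part); zip part is '' if absent."""
    street, zip_part = ADDRESS.match(address).groups()
    return street, zip_part or ""

fields = parse_body("Name: Joe\nAddress: Road Telephone: 5555555555555\nEmail:\n")
print(fields)
print(split_address("Some address 251, 99302 Something"))
```

The telephone number stays a string here, which sidesteps any numeric conversion of long phone numbers.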

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product code works perfectly, but trying to get the information after it just returns "", which is strange because when I execute the pattern in Notepad++ it works perfectly fine.
So it looks for the letters MIP and one of the 3 letter codes (any of them)
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
Check if MIP is in the string
If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
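To illustrate why the grouping in threes matters, here is a minimal Python sketch of the same two-step check/remove logic, with the patterns copied from the code above. The name cleanup_string and the second sample string are hypothetical. In that second string the letters DOM appear, but they straddle two three-letter groups, so they must not be removed.

```python
import re

CODES = ("DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|"
         "EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX")

RE_CHECK = re.compile(r"^(?:...)*?MIP")                   # MIP on a 3-letter boundary
RE_CODES = re.compile(r"^((?:...)*?)(?:" + CODES + ")")   # first code on a boundary

def cleanup_string(text: str) -> str:
    """If MIP is present, repeatedly drop the first listed code that sits on a boundary."""
    if RE_CHECK.search(text):
        while RE_CODES.search(text):
            text = RE_CODES.sub(r"\1", text, count=1)
    return text

print(cleanup_string("MADHUBESOMIPTDTLTRCOYORGLEJ"))  # MADHUBESOMIPLTRCOYORGLEJ
print(cleanup_string("MIPXDOMXY"))                    # MIPXDOMXY
```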
Since the VBScript regular expression engine does support look-aheads (unlike look-behinds such as (?<=...), which it does not support), you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If InStr(StringToCheck, "MIP") = 0 Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
' Listed codes are dropped; "MIP" itself is kept
Case "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP"
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function