How to use multiple patterns within one regex object?

I've written a VBA script that uses regular expressions to parse a company name, phone and fax number from a webpage. When I run the script, I get that information flawlessly. However, I've used three different expressions and, to make them work, I created three different regex objects: rxp, rxp1, and rxp2.
My question: how can I create a single regex object that handles all three patterns, unlike what I've done below?
This is the (working) script:
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp
    With New XMLHTTP60
        .Open "GET", Url, False
        .send
        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"
        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub
References to add to the project to run the script above:
Microsoft XML, v6.0
Microsoft VBScript Regular Expressions 5.5

You may build a regex with alternatives, enable global matching with rxp.Global = True, and capture the known strings into Group 1 and those unknown parts into Group 2. Then, you will be able to assign the right values to your variables by checking the value of Group 1:
Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
Dim rxp As New RegExp
Dim ms As MatchCollection
Dim m As Match
Dim cname As String, phone As String, fax As String
With New XMLHTTP60
    .Open "GET", Url, False
    .send
    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True
    Set ms = rxp.Execute(.responseText)
    For Each m In ms
        If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
        If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
        If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
    Next
    Debug.Print cname, phone, fax
End With
Output:
Vaucraft Braford Stud +61 7 4942 4859 +61 7 4942 0618
See the regex demo.
Pattern details:
(Phone|Company Name|Fax) - Capturing group 1: any of the three alternatives
:\s* - a colon and then 0+ whitespaces
(\+?[\w\s]*\w) - Capturing group 2:
\+? - an optional +
[\w\s]* - 0 or more letters, digits, _ or whitespaces
\w - a single letter, digit or _.
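As a variation on the loop above, the three If checks could be replaced by a lookup keyed on Group 1. This is only a sketch under assumptions: ParseLabelledFields is a hypothetical name, the sample string is invented to resemble the page markup, and late binding through CreateObject is used so that no references are required.

```vba
' Hypothetical helper: maps each label (Group 1) to its value (Group 2).
Function ParseLabelledFields(ByVal src As String) As Object
    Dim rx As Object, m As Object, fields As Object

    Set rx = CreateObject("VBScript.RegExp")
    Set fields = CreateObject("Scripting.Dictionary")

    rx.Global = True    ' collect every label, not only the first one
    rx.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"

    For Each m In rx.Execute(src)
        ' A later duplicate label overwrites an earlier one
        fields(m.SubMatches(0)) = m.SubMatches(1)
    Next m

    Set ParseLabelledFields = fields
End Function

Sub DemoParseLabelledFields()
    Dim fields As Object
    ' Assumed sample text; the real input would be .responseText
    Set fields = ParseLabelledFields("<td>Company Name: Vaucraft Braford Stud</td>" & _
                                     "<td>Phone: +61 7 4942 4859</td>")
    Debug.Print fields("Company Name"), fields("Phone")
    Debug.Print fields.Exists("Fax")    ' a field that never matched is simply absent
End Sub
```

Checking fields.Exists before reading a key avoids silently creating an empty entry for a label that was not on the page.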

Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n? will capture it into three capture groups. You can see how it works here.
Group 1 is your company name, group 2 is your phone number, and group 3 is your fax.

You can do it, but I'm not sure it is a good idea. Merging the regexes makes the result more prone to errors.
If you match all three fields at once, all of them must be present or the regex will fail. Or, even worse, it will fetch the wrong data. What happens if the fax is an optional field? See here for examples.
Also, if the page template changes, things break more easily. Say the template changes and the fax is rendered before the telephone: the whole regex will fail, because searching for the three fields at once implies a fixed order.
Unless the fields you are searching for are related or depend on each other, I wouldn't go down that route.
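The fragility described above can be seen in a small sketch. It assumes a sequential pattern of the form Company Name … Phone … Fax and invented sample text with line feeds between the fields; MatchesAllThree is a hypothetical name.

```vba
' Hypothetical check: True only if all three labels appear in the expected order.
Function MatchesAllThree(ByVal src As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)"
        MatchesAllThree = .Test(src)
    End With
End Function

Sub DemoMatchesAllThree()
    Dim complete As String, noFax As String, reordered As String

    ' Assumed sample inputs
    complete = "Company Name: Acme" & vbLf & "Phone: +61 1" & vbLf & "Fax: +61 2"
    noFax = "Company Name: Acme" & vbLf & "Phone: +61 1"
    reordered = "Company Name: Acme" & vbLf & "Fax: +61 2" & vbLf & "Phone: +61 1"

    Debug.Print MatchesAllThree(complete)    ' all fields present, in order
    Debug.Print MatchesAllThree(noFax)       ' no "Fax:" label in the text
    Debug.Print MatchesAllThree(reordered)   ' "Fax:" comes before "Phone:"
End Sub
```

Only the first input should match. The other two lose every field, including the ones that are present, which is the risk of tying all three fields to a single pattern.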

I think the following does the same while declaring rxp only once:
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp
    With Http
        .Open "GET", Url, False
        .send
    End With
    With rxp
        .Pattern = "Company Name:(\s[\w\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [A1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Phone:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [B1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Fax:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [C1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
    End With
End Sub
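Since each pattern in the code above is executed twice (once for .Count, once for the value), a small helper could run it once and return an empty string when nothing matches. This is a sketch, not part of the original script: FirstSubMatch and the sample markup are assumptions, and each pattern passed in is assumed to contain at least one capture group.

```vba
' Hypothetical helper: sets the pattern on a shared RegExp object, runs it once,
' and returns the first capture group of the first match ("" if there is none).
Function FirstSubMatch(ByVal rx As Object, ByVal pat As String, ByVal src As String) As String
    Dim ms As Object
    rx.Pattern = pat
    Set ms = rx.Execute(src)
    If ms.Count > 0 Then FirstSubMatch = Trim$(ms(0).SubMatches(0))
End Function

Sub DemoFirstSubMatch()
    Dim rx As Object, html As String
    Set rx = CreateObject("VBScript.RegExp")

    ' Assumed sample markup; the real input would be Http.responseText
    html = "<td>Company Name: Vaucraft Braford Stud</td><td>Phone: +61 7 4942 4859</td>"

    Debug.Print FirstSubMatch(rx, "Company Name:(\s[\w\s]+)", html)
    Debug.Print FirstSubMatch(rx, "Phone:(\s\+[\d\s]+)", html)
    Debug.Print Len(FirstSubMatch(rx, "Fax:(\s\+[\d\s]+)", html))   ' 0 when the field is absent
End Sub
```

The single RegExp object is reused across all three calls; only its Pattern property changes between them.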

Related

RegEx pattern selects the proper substring but throws error when running macro vba

I am trying to use a regular expression to remove, in both strings, the question mark ? and everything from the comma , that directly follows the closing bracket ] onward.
I have input like:
Together, Let's End Their War: Promoting a Culture of Health among Veterans on the Gulf - How strongly do you value your relationship with [Field-2]?
and
Together, Let's End Their War: Promoting a Culture of Health among Veterans on the Gulf - During the Clinical Scholars Program, with [Field-2], have you partnered new project(s) other than your team's Wicked Problem Impact Project?
So I want to remove the ? in the first string and the following in the second string
, have you partnered new project(s) other than your team's Wicked Problem Impact Project?
I want to end up with
Together, Let's End Their War: Promoting a Culture of Health among Veterans on the Gulf - How strongly do you value your relationship with [Field-2]
and
Together, Let's End Their War: Promoting a Culture of Health among Veterans on the Gulf - During the Clinical Scholars Program, with [Field-2]
I have
(?<=]),\s*([^,])+|\?
The pattern seems to capture what I want,
but when I run my macro I get: Method 'Replace' of object 'IRegExp2' failed
https://regex101.com/r/c9lDYD/1
I have run many other regex patterns with my macro without issue, so I'm not sure what the problem is.
Sub findReplace()
    Dim outArray As Variant
    Dim regEx As New RegExp
    Dim ws As Worksheet
    Dim i As Long
    Dim strPattern As String: strPattern = "(?<=]),\s*([^,])+|\?"
    Dim strReplace As String: strReplace = ""
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With
    Set ws = ThisWorkbook.Sheets("Helper_1Filted")
    With ws.Range("K1:K50")
        outArray = .value
        For i = 1 To UBound(outArray, 1)
            outArray(i, 1) = regEx.Replace(outArray(i, 1), strReplace)
        Next i
        .value = outArray
    End With
End Sub
Lookbehind is not supported by the VBScript regex engine that VBA uses, but if the question mark must come after a comma and a part between square brackets, you can use a capture group without the alternation |.
With the alternation |, a question mark is matched anywhere in the string.
You might use a capture group and a negated character class [^
In the replacement, use group 1: $1
(,[^\]\[,]*\[[^\]\[]*])[^?]*\?
( Capture group 1
, Match a comma
[^\]\[,]* Match 0+ times any char except a comma or [ or ]
\[[^\]\[]*] Match from [...]
) Close group 1
[^?]* Match 0+ times any char except a question mark
\? Match the question mark
Regex demo
Or a shorter version with an optional capture group:
(\])(?:,\s*[^,]+)?\?
Regex demo
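The shorter pattern can be tried in VBA with a few lines. This is a minimal sketch, assuming late binding and shortened sample sentences; TrimAfterBracket is a hypothetical name.

```vba
' Hypothetical helper: removes the question mark (and any ", ..." clause before it)
' that follows a closing bracket, putting the bracket back via $1.
Function TrimAfterBracket(ByVal src As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "(\])(?:,\s*[^,]+)?\?"
        TrimAfterBracket = .Replace(src, "$1")
    End With
End Function

Sub DemoTrimAfterBracket()
    ' Assumed, shortened versions of the two input strings
    Debug.Print TrimAfterBracket("How strongly do you value your relationship with [Field-2]?")
    Debug.Print TrimAfterBracket("During the program, with [Field-2], have you partnered on new projects?")
End Sub
```

Because the bracket is captured rather than asserted with a lookbehind, the replacement has to write it back with $1; an empty replacement string would remove the ] as well.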
After checking, it seems the current VBA regex library (5.5) does not support the lookbehind in your pattern; I get a warning on this part when testing:
?<=]
After amending it a bit, it works for replacing the question mark in your example, although I am not 100% sure about other sentences. I only changed the beginning part, FYI.
(^]),\s*([^,])+|\?

Regex to extract inconsistent postal codes from string

Using the solution posted here, I'm looking to extract postal codes from a list of irregular data in Excel.
Below is a sample of what my data looks like:
Brampton L6P 2G9 ON Canada
M5B2R3 Toronto ON
Toronto M5J 0A6 ON Canada
M1H1T7 Canada
Toronto M4P1T8 ON Canada
MISSISUAGABRAMPTON L5M6S6 ON Canada
333 Sea Ray Inisfil l4e2y6 ON Canada
To call the function, I'm using the following formula
=RegexExtract(A1,"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$")
However the function is not working for me. I think I need to tweak my regex expression in some way but I don't know what I'm missing.
google-spreadsheet
Try,
=REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d")
'alternate
=left(REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d"), 3)&" "&right(REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d"), 3)
You have 2 issues.
First, the expression - if you need to extract the postal code, you can't anchor your regex with ^ and $. The first means "match must occur at the start of the string" and the second means "match must end at the end of the string". This is only useful if you are validating a postal code, but it obviously can't be used to extract one from your examples because they all contain things other than the postal code.
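The other problem with the regex is the negative look-ahead assertion (?!.*[DFIOQU]), which means "none of the letters D, F, I, O, Q, or U may appear anywhere from this point to the end of the line". VBScript regex does support look-aheads (it lacks only look-behinds), but this one only suits validation: when extracting, it rejects any row where one of those letters follows the postal code, such as the O in "ON". Spelling out the permitted letters in each position avoids the look-ahead altogether.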
That gives you the slightly more pedantic expression:
[ABCEGHJKLMNPRSTVX]\d[ABCEGHJKLMNPRSTVWXYZ][ -]?\d[ABCEGHJKLMNPRSTVWXYZ]\d
I took the liberty of optionally allowing a - between the FSA and LDU because I see that a lot, particularly from non-Canadians.
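The difference between validating and extracting can be sketched in a few lines. This assumes late binding and one row from the sample data; the procedure name and the POSTAL constant are hypothetical.

```vba
Sub DemoAnchoredVersusUnanchored()
    ' The expression above, kept in a constant so it can be reused with and without anchors
    Const POSTAL As String = "[ABCEGHJKLMNPRSTVX]\d[ABCEGHJKLMNPRSTVWXYZ][ -]?\d[ABCEGHJKLMNPRSTVWXYZ]\d"
    Dim rx As Object
    Set rx = CreateObject("VBScript.RegExp")
    rx.IgnoreCase = True

    ' Validation: the whole string must be a postal code
    rx.Pattern = "^" & POSTAL & "$"
    Debug.Print rx.Test("Toronto M5J 0A6 ON Canada")    ' False: other text surrounds the code
    Debug.Print rx.Test("M5J 0A6")                      ' True

    ' Extraction: the code may appear anywhere in the string
    rx.Pattern = POSTAL
    Debug.Print rx.Execute("Toronto M5J 0A6 ON Canada")(0).Value   ' M5J 0A6
End Sub
```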
Second, the function that you're calling (copied below from the linked answer):
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional separator As String = ", ") As String
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    Dim i As Long, j As Long
    Dim result As String
    RE.pattern = extract_what
    RE.Global = True
    Set allMatches = RE.Execute(text)
    For i = 0 To allMatches.count - 1
        For j = 0 To allMatches.Item(i).submatches.count - 1
            result = result & (separator & allMatches.Item(i).submatches.Item(j))
        Next
    Next
    If Len(result) <> 0 Then
        result = Right$(result, Len(result) - Len(separator))
    End If
    RegexExtract = result
End Function
The first problem is that it is case sensitive. It is also tailored to extracting submatches, which you don't care about - your examples are looking for a single match.
I'd go with this much simpler option that also correctly formats the output:
Public Function ExtractCanadianPostalCode(inputText As String) As String
    With CreateObject("vbscript.regexp")
        .Pattern = "[ABCEGHJKLMNPRSTVX]\d[ABCEGHJKLMNPRSTVWXYZ][ -]?\d[ABCEGHJKLMNPRSTVWXYZ]\d"
        .IgnoreCase = True
        If .Test(inputText) Then
            Dim matches As Object
            Set matches = .Execute(inputText)
            ExtractCanadianPostalCode = UCase$(Left$(matches(0), 3) & " " & Right$(matches(0), 3))
        End If
    End With
End Function

Using Server Side VB to parse textbox contents from date range

I have a date range being entered into a textbox by a jQuery UI date range selector. On postback, I need to get the start date and the end date. These values are in the textbox, but I don't know how to separate them with server-side VB code. Can anyone show me how to use VB to separate the start and end dates? The textbox contents are exactly as follows:
{"start":"2017-04-12","end":"2017-05-17"}
I tried using the following code, but it does not work
Dim strDateStart as String
Dim strDateEnd as String
strDateStart = txtSearchDateRange.Text
strDateStart = Replace(strDateStart, "end*", "")
strDateEnd = txtSearchDateRange.Text
strDateEnd = Replace(strDateEnd, "start*", "")
Thanks to @Mederic, the following code works:
Dim value As String = txtSearchDateRange.Text
Dim strStartDate As String = ""
Dim strEndDate As String = ""
Dim i As Integer = 0
' Call Regex.Matches method.
Dim matches As MatchCollection = Regex.Matches(value, "\d{4}-\d{2}-\d{2}")
' Loop over matches.
For Each m As Match In matches
    ' Loop over captures.
    For Each c As Capture In m.Captures
        i = i + 1
        ' Display.
        Console.WriteLine("Index={0}, Value={1}", c.Index, c.Value)
        If i = 1 Then strStartDate = c.Value
        If i = 2 Then strEndDate = c.Value
    Next
Next
Response.Write("<BR><BR><BR><BR><BR><BR>Start Date:" & strStartDate & "<BR><BR>End Date:" & strEndDate)
Regex Approach:
A cleaner approach to the Regex using groups
First:
Imports System.Text.RegularExpressions
Then:
'Our regex
Dim regex As Regex = New Regex("(?<start>\d{4}-\d{2}-\d{2}).*(?<end>\d{4}-\d{2}-\d{2})")
'Match from textbox content
Dim match As Match = regex.Match(TextBox1.Text)
'If the match succeeded
If match.Success Then
    'Print start group
    Console.WriteLine(match.Groups("start").Value)
    'Print end group
    Console.WriteLine(match.Groups("end").Value)
End If
Explanation of the regex:
(?<start>REGEX) = Captures a group named start
(?<end>REGEX) = Captures a group named end
\d = Matches a digit
{X} = Matches exactly X occurrences, e.g. \d{4} matches 4 digits
.* = Matches zero or more of any character, consuming the text between the two dates so that each group captures a different date
JSON approach:
A JSON approach is possible but, I think, a bit more complex to implement, because your JSON string contains a name that is a reserved keyword in VB: end
But if you wanted to use JSON, you could import Newtonsoft.Json
And have a class such as:
Public Class Rootobject
    Public Property start As String
    Public Property _end As String
End Class
And then deserialize like this:
Dim obj = JsonConvert.DeserializeObject(Of Rootobject)(TextBox1.Text)
However, you would need to apply DataContract to the class and DataMember to the _end property
so that it maps to the JSON name end
DataContract MSDN

VBA regex and group

Applying the regex below to the email body below:
(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)
email body:
pls18244a.lam64*fra-pth (PI000581)
pls18856a.10ge*fra-pth (PI0005AW)
pls25040a.10ge*fra-pth (IIE0004WK)
pls27477a.10ge*fra-pth (WL050814)
pls22099a.stm4*par-pth (PI0005TE)
returns 5 matches, each with two groups. What VBA code would get the groups of each match, using an incrementing variable to copy each match's groups into an Excel row?
Without making any changes to your regular expression pattern, you can iterate through the groups of each match as follows:
Dim str As String, i As Long, a As Long
Dim objReg As RegExp, objMatches As MatchCollection, m As Match
str = "pls18244a.lam64*fra-pth (PI000581)pls18856a.10ge*fra-pth (PI0005AW)pls25040a.10ge*fra-pth (IIE0004WK)pls27477a.10ge*fra-pth (WL050814)pls22099a.stm4*par-pth (PI0005TE)"
Set objReg = New RegExp
objReg.IgnoreCase = False
objReg.Global = True
objReg.Pattern = "(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)"
Set objMatches = objReg.Execute(str)
For Each m In objMatches 'The variable m will contain the full match
    a = m.SubMatches.Count 'total number of groups in the full match
    For i = 0 To a - 1
        MsgBox m.SubMatches.Item(i) 'display each group
    Next
Next
Set objReg = Nothing
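To copy the groups into worksheet rows instead of showing message boxes, the same iteration can fill a 2D array that is then written to a range in one step. This is a sketch under assumptions: MatchGroupsToArray and WriteGroupsToSheet are hypothetical names, the output is assumed to start at A1 of the active sheet, and only two lines of the email body are used as sample input.

```vba
' Hypothetical helper: returns a 1-based 2D array with one row per match
' and one column per capture group (Empty if nothing matches).
Function MatchGroupsToArray(ByVal src As String, ByVal pat As String) As Variant
    Dim rx As Object, ms As Object
    Dim r As Long, c As Long
    Dim out() As Variant

    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = True
    rx.Pattern = pat
    Set ms = rx.Execute(src)

    If ms.Count = 0 Then Exit Function
    If ms(0).SubMatches.Count = 0 Then Exit Function

    ReDim out(1 To ms.Count, 1 To ms(0).SubMatches.Count)
    For r = 0 To ms.Count - 1
        For c = 0 To ms(r).SubMatches.Count - 1
            out(r + 1, c + 1) = ms(r).SubMatches(c)
        Next c
    Next r
    MatchGroupsToArray = out
End Function

Sub WriteGroupsToSheet()
    Dim body As String, result As Variant

    ' Assumed sample input: two lines of the email body
    body = "pls18244a.lam64*fra-pth (PI000581)" & vbLf & "pls18856a.10ge*fra-pth (PI0005AW)"
    result = MatchGroupsToArray(body, "(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)")
    If IsEmpty(result) Then Exit Sub

    ' Assumption: output starts at A1 of the active sheet, one match per row
    ActiveSheet.Range("A1").Resize(UBound(result, 1), UBound(result, 2)).Value = result
End Sub
```

Writing the whole array at once avoids one cell write per group, and the row index takes the place of the incrementing variable the question asks about.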