VBA regex and group - regex

Applying the regex below to the email body below:
(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)
Email body:
pls18244a.lam64*fra-pth (PI000581)
pls18856a.10ge*fra-pth (PI0005AW)
pls25040a.10ge*fra-pth (IIE0004WK)
pls27477a.10ge*fra-pth (WL050814)
pls22099a.stm4*par-pth (PI0005TE)
returns 5 matches, each with two groups. What is the VBA script to get the groups in each match, using an incrementing variable to copy each match's groups into an Excel row?

Without making any changes to your regular expression pattern, you can iterate through the groups of each match in the following way:
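'Requires a reference to Microsoft VBScript Regular Expressions 5.5
Dim objReg As RegExp, objMatches As MatchCollection, objMatch As Match
Dim strBody As String, groupCount As Long, i As Long
'Named strBody rather than str, which would clash with VBA's built-in Str function
strBody = "pls18244a.lam64*fra-pth (PI000581)pls18856a.10ge*fra-pth (PI0005AW)pls25040a.10ge*fra-pth (IIE0004WK)pls27477a.10ge*fra-pth (WL050814)pls22099a.stm4*par-pth (PI0005TE)"
Set objReg = New RegExp
objReg.IgnoreCase = False
objReg.Global = True
objReg.Pattern = "(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)"
Set objMatches = objReg.Execute(strBody)
For Each objMatch In objMatches 'objMatch holds one full match
    groupCount = objMatch.SubMatches.Count 'total number of groups in the full match
    For i = 0 To groupCount - 1
        MsgBox objMatch.SubMatches.Item(i) 'display each group
    Next i
Next objMatch
Set objReg = Nothing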
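For the part of the question about copying each match's groups into an Excel row, a minimal sketch might look like the following. It assumes the output goes to the active sheet starting at row 1, with group 1 in column A and group 2 in column B. The procedure name CopyGroupsToRows and the shortened two-line sample body are illustrative only, and late binding (CreateObject) is used so that no library reference is needed.

```vba
Sub CopyGroupsToRows()
    Dim re As Object, matches As Object, m As Object
    Dim body As String, rowNum As Long, g As Long

    ' Shortened sample of the email body (assumption: one entry per line)
    body = "pls18244a.lam64*fra-pth (PI000581)" & vbLf & _
           "pls18856a.10ge*fra-pth (PI0005AW)"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "(pls[a-zA-Z0-9 .*-]*) \(([A-Z 0-9]*)\)"

    Set matches = re.Execute(body)
    rowNum = 1                              ' incrementing row counter
    For Each m In matches
        For g = 0 To m.SubMatches.Count - 1
            ' Group 1 goes to column A, group 2 to column B
            ActiveSheet.Cells(rowNum, g + 1).Value = m.SubMatches(g)
        Next g
        rowNum = rowNum + 1                 ' next match goes on the next row
    Next m
End Sub
```

With the sample body, rows 1 and 2 would each receive the circuit name in column A and the code in column B.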

Related

how to match multi parts by regular expression in VBA? [duplicate]

I've written a script in VBA, in combination with regular expressions, to parse the company name, phone and fax from a webpage. When I run my script I get that information flawlessly. However, I've used three different expressions, and to make them work I created three different regex objects: rxp, rxp1 and rxp2.
My question: how can I create one regex object with which I can use all three patterns, unlike what I've done below?
This is the script (a working one):
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp
    With New XMLHTTP60
        .Open "GET", Url, False
        .send
        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"
        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub
References to add to the project to run the above script:
Microsoft XML, v6.0
Microsoft VBScript Regular Expressions
You may build a regex with alternatives, enable global matching with rxp.Global = True, and capture the known strings into Group 1 and those unknown parts into Group 2. Then, you will be able to assign the right values to your variables by checking the value of Group 1:
Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
Dim rxp As New RegExp
Dim ms As MatchCollection
Dim m As Match
Dim cname As String, phone As String, fax As String
With New XMLHTTP60
    .Open "GET", Url, False
    .send
    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True
    Set ms = rxp.Execute(.responseText)
    For Each m In ms
        If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
        If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
        If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
    Next
    Debug.Print cname, phone, fax
End With
Output:
Vaucraft Braford Stud +61 7 4942 4859 +61 7 4942 0618
See the regex demo.
Pattern details:
(Phone|Company Name|Fax) - Capturing group 1: any of the three alternatives
:\s* - a colon and then 0+ whitespaces
(\+?[\w\s]*\w) - Capturing group 2:
\+? - an optional +
[\w\s]* - 0 or more letters, digits, _ or whitespaces
\w - a single letter, digit or _.
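To try the alternation pattern without a network request, here is a small sketch run against a hard-coded string. The sample text is an assumption built from the output shown above, with "<br>" standing in for whatever markup separates the fields on the real page. That separator matters: because [\w\s]* also matches line breaks, fields separated only by a newline would let Group 2 run on into the next label. The procedure name DemoAlternation is illustrative, and late binding is used so no reference is required.

```vba
Sub DemoAlternation()
    Dim re As Object, m As Object
    Dim page As String
    Dim cname As String, phone As String, fax As String

    ' Assumed sample text; "<br>" is a stand-in for the page's real markup
    page = "Company Name: Vaucraft Braford Stud<br>" & _
           "Phone: +61 7 4942 4859<br>" & _
           "Fax: +61 7 4942 0618"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"

    For Each m In re.Execute(page)
        Select Case m.SubMatches(0)         ' Group 1 holds the label
            Case "Company Name": cname = m.SubMatches(1)
            Case "Phone": phone = m.SubMatches(1)
            Case "Fax": fax = m.SubMatches(1)
        End Select
    Next m

    Debug.Print cname & " | " & phone & " | " & fax
End Sub
```

Select Case is used here instead of three If statements purely for readability; the behaviour is the same.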
Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n? will capture it into three capture groups. You can see how it works here.
Group 1 is your company name, group 2 is your phone number, and group 3 is your fax.
You can do it, but I'm not sure it is a good idea. Merging the regexps will make the result more prone to problems and errors.
If you match all three pieces of data at the same time, all of them must be present or the regexp will fail. Or even worse, it will fetch the wrong data. What happens if the fax is an optional field? See here for examples.
Also, if the template of the web page changes, things will break more easily. Let's say the template changes and the fax is rendered before the telephone: the whole regexp will fail, because searching for the three pieces of data at once implies a fixed order.
Unless the pieces of data you are searching for are related to or depend on each other, I wouldn't go down that route.
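The missing-field concern can be checked with a short sketch. It applies the single three-group pattern to two made-up strings (the company name and numbers are placeholders, and the fields are assumed to be separated by line feeds): one containing all three labels, one without the Fax line.

```vba
Sub CombinedPatternNeedsAllFields()
    Dim re As Object
    Dim complete As String, noFax As String

    ' Placeholder data, one field per line
    complete = "Company Name: Acme" & vbLf & "Phone: +61 1" & vbLf & "Fax: +61 2"
    noFax = "Company Name: Acme" & vbLf & "Phone: +61 1"

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?"

    Debug.Print re.Test(complete)   ' all three labels present and in order: matches
    Debug.Print re.Test(noFax)      ' Fax label missing: the whole pattern cannot match
End Sub
```

In the second case nothing is returned at all, so the company name and phone are lost along with the fax.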
I think the following does the same while declaring rxp only once:
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp
    With Http
        .Open "GET", Url, False
        .send
    End With
    With rxp
        .Pattern = "Company Name:(\s[\w\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [A1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Phone:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [B1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Fax:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [C1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
    End With
End Sub
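One possible refinement of the reuse-one-object idea is to wrap the "set pattern, execute, read group 1" steps in a helper, so each pattern runs Execute only once instead of twice. This is a sketch only: FirstGroup and DemoFirstGroup are hypothetical names, the sample text is assumed (with "<br>" standing in for the page's markup), and the result is trimmed because the original patterns capture the leading space.

```vba
' Hypothetical helper: returns capture group 1 of the first match, or "" if none.
Function FirstGroup(ByVal re As Object, ByVal pat As String, ByVal source As String) As String
    Dim matches As Object
    re.Pattern = pat
    Set matches = re.Execute(source)        ' run the search once and reuse the result
    If matches.Count > 0 Then FirstGroup = Trim$(matches(0).SubMatches(0))
End Function

Sub DemoFirstGroup()
    Dim re As Object, page As String
    Set re = CreateObject("VBScript.RegExp") ' one object, three patterns

    ' Assumed sample text in place of Http.responseText
    page = "Company Name: Vaucraft Braford Stud<br>" & _
           "Phone: +61 7 4942 4859<br>" & _
           "Fax: +61 7 4942 0618"

    Debug.Print FirstGroup(re, "Company Name:(\s[\w\s]+)", page)
    Debug.Print FirstGroup(re, "Phone:(\s\+[\d\s]+)", page)
    Debug.Print FirstGroup(re, "Fax:(\s\+[\d\s]+)", page)
End Sub
```

Each field is still looked up independently, so a missing fax simply yields an empty string without affecting the other two.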

Regular expression to match page number groups

I need a regular expression to match page numbers as found in common programs.
These usually take the form 1-5,3,5,1-9 for example.
I have a regular expression (\d+-\d+)?,(\d+-\d+?)* which I need help to refine.
As can be seen here on regex101, I am matching commas and missing numbers entirely.
What I need is to match 1-5 as group 1, 3 as group 2, 5 as group 3 and 1-9 as group 4 without matching any commas.
Any help is appreciated. I will be using this in VBA.
This worked for me - am I missing something?
Sub Pages()
    Dim re As Object, allMatches, m, rv, sep, c As Range, i As Long
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "(\d+(-\d+)?)"
    re.ignorecase = True
    re.MultiLine = True
    re.Global = True
    For Each c In Range("B5:B20").Cells 'for example
        c.Offset(0, 1).Resize(1, 10).ClearContents 'clear output cells
        i = 0
        If re.test(c.Value) Then
            Set allMatches = re.Execute(c.Value)
            For Each m In allMatches
                i = i + 1
                c.Offset(0, i).Value = m
            Next m
        End If
    Next c
End Sub
If I recall correctly, capturing a dynamic number of groups will not work. You can pre-specify the format / number of groups to be matched, or you can catch the repeated groups as one and split them afterwards.
If you know the format, just do
(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)(?:,)(\d+(?:-\d+)?)
which of course is not very neat.
If you want the flexible structure, match the first group and all the rest as a second and then split the latter by the delimiter ',' in whichever language.
(\d+(?:-\d+)?)((?:(?:,)(\d+(?:-\d+)?))*)
You need to make the -\d+ part optional, since you don't always have ranges. And the comma between each range should be part of the second group with the * quantifier, so you can match a single range with no comma after it.
\d+(-\d+)?(,\d+(-\d+)?)*
This will match the string that contains all the ranges. To get an array of individual ranges without the commas, do a second match in this string:
\d+(-\d+)?
Use the VBA function for getting an array of all matches of a regexp (sorry, I don't know VBA, so can't provide the specific syntax).
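A possible VBA rendering of this two-step idea is sketched below. The function name PageRanges is hypothetical, and the ^ and $ anchors in the first pattern are an addition (not part of the pattern above) so that the check covers the whole string rather than a fragment of it.

```vba
' Hypothetical helper: returns the individual ranges as a string array,
' or an empty array if the input is not a comma-separated list of ranges.
Function PageRanges(ByVal spec As String) As Variant
    Dim re As Object, matches As Object
    Dim result() As String, i As Long

    Set re = CreateObject("VBScript.RegExp")

    ' Step 1: validate the whole string (anchors added for this sketch)
    re.Pattern = "^\d+(-\d+)?(,\d+(-\d+)?)*$"
    If Not re.Test(spec) Then
        PageRanges = Array()
        Exit Function
    End If

    ' Step 2: collect every individual range, commas excluded
    re.Global = True
    re.Pattern = "\d+(-\d+)?"
    Set matches = re.Execute(spec)

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = matches(i).Value
    Next i
    PageRanges = result
End Function

Sub DemoPageRanges()
    Debug.Print Join(PageRanges("1-5,3,5,1-9"), " | ")   ' 1-5 | 3 | 5 | 1-9
End Sub
```

Each element of the returned array corresponds to one of the "groups" the question asks for, in order.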

Extract number and a character from string using regex

I am trying to extract the number along with 'x' from strings such as:
1. "KAWAN (FRZ) LACHA FLACKEY PARATHA 8X25X80 GM" or
2. "G.G. HOT SEV 20X285GM"
using the function below, but it returns only the last number with "x". Expected output is 2X25X or 20X. Also, is it possible to store the string without the extracted value using the same function?
Public Function getNumber(strInput As String) As Variant
    Dim regex As New RegExp
    Dim matches As Object
    regex.Pattern = "(\d??[x|X])"
    regex.Global = False
    Set matches = regex.Execute(strInput)
    If matches.Count = 0 Then
        getNumber = CVErr(xlErrNA)
    Else
        getNumber = matches(0).Value
    End If
End Function
Try the following pattern for your regular expression...
regex.Pattern = "((\d{1,2}[xX])+)"
By the way, since you're using early binding, you can declare matches as MatchCollection instead of Object.
Dim matches As MatchCollection
Try changing your regular expression to (\d*)[xX]. This will capture zero or more digits followed by an X and put the digits in a group. You can test this regex on this website; you'll see that applying it to your first example, KAWAN (FRZ) LACHA FLACKEY PARATHA 8X25X80 GM, captures 8X and 25X, and each match has 8 and 25 as its group, respectively. In your function, regex.Global must be set to True to get both matches.
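For the second part of the question (keeping the string without the extracted value), one option is to hand the remainder back through a ByRef argument. The sketch below uses the first suggested pattern, ((\d{1,2}[xX])+); the names SplitPackSize, remainder and DemoSplitPackSize are hypothetical, late binding is used so no reference is needed, and xlErrNA assumes the code runs inside Excel, as in the original function.

```vba
' Hypothetical helper: returns the "NNxNNx" part and passes back the rest of the text.
Function SplitPackSize(ByVal strInput As String, ByRef remainder As String) As Variant
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "((\d{1,2}[xX])+)"
    re.Global = False                       ' only the first run of "digits + x" is wanted

    Set matches = re.Execute(strInput)
    If matches.Count = 0 Then
        remainder = strInput                ' nothing extracted, text is unchanged
        SplitPackSize = CVErr(xlErrNA)
    Else
        remainder = re.Replace(strInput, "") ' same pattern, match removed
        SplitPackSize = matches(0).Value
    End If
End Function

Sub DemoSplitPackSize()
    Dim rest As String
    Debug.Print SplitPackSize("G.G. HOT SEV 20X285GM", rest)   ' 20X
    Debug.Print rest                                           ' G.G. HOT SEV 285GM
End Sub
```

A ByRef output argument is only one design choice; returning a two-element array would work as well if the function has to be called from a worksheet cell.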

Using Server Side VB to parse textbox contents from date range

I have a date range being entered into a textbox from a jQuery UI date range selector. I need to get the start date and end date values on postback. These values are provided in the textbox, but I don't know how to separate them on postback with VB server-side code. Can anyone show me how I can use VBScript to separate the start and end dates? The textbox contents are exactly as follows:
{"start":"2017-04-12","end":"2017-05-17"}
I tried using the following code, but it does not work
Dim strDateStart as String
Dim strDateEnd as String
strDateStart = txtSearchDateRange.Text
strDateStart = Replace(strDateStart, "end*", "")
strDateEnd = txtSearchDateRange.Text
strDateEnd = Replace(strDateEnd, "start*", "")
Thanks to #Mederic, the following code works:
Dim value As String = txtSearchDateRange.Text
Dim strStartDate As String = ""
Dim strEndDate As String = ""
Dim i As Integer = 0
' Call Regex.Matches method.
Dim matches As MatchCollection = Regex.Matches(value, "\d{4}-\d{2}-\d{2}")
' Loop over matches.
For Each m As Match In matches
    ' Loop over captures.
    For Each c As Capture In m.Captures
        i = i + 1
        ' Display.
        Console.WriteLine("Index={0}, Value={1}", c.Index, c.Value)
        If i = 1 Then strStartDate = c.Value
        If i = 2 Then strEndDate = c.Value
    Next
Next
Response.Write("<BR><BR><BR><BR><BR><BR>Start Date:" & strStartDate & "<BR><BR>End Date:" & strEndDate)
Regex Approach:
A cleaner approach to the Regex using groups
First:
Imports System.Text.RegularExpressions
Then:
'Our regex
Dim regex As Regex = New Regex("(?<start>\d{4}-\d{2}-\d{2}).*(?<end>\d{4}-\d{2}-\d{2})")
'Match from textbox content
Dim match As Match = regex.Match(TextBox1.Text)
'If match is success
If match.Success Then
    'Print start group
    Console.WriteLine(match.Groups("start").Value)
    'Print end group
    Console.WriteLine(match.Groups("end").Value)
End If
Explanation of the Regex:
(?<start>REGEX) = Captures a group named start
(?<end>REGEX) = Captures a group named end
\d = Matches a digit
{X} = Matches exactly X occurrences
.* = Matches any characters (zero or more) between the two dates, so the first date goes to start and the last one to end
Example:
\d{4} = Matches 4 digits
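Since the question mentions VBScript: the VBScript regex engine (also used from VBA) does not support named groups, so a comparable sketch there would rely on numbered groups instead. The following is an illustration under that assumption, not part of the answer above; the procedure name ParseDateRange is hypothetical, and the input is the sample textbox content from the question.

```vba
Sub ParseDateRange()
    Dim re As Object, matches As Object
    Dim json As String

    ' Sample textbox content; quotes are doubled inside a VBA string literal
    json = "{""start"":""2017-04-12"",""end"":""2017-05-17""}"

    Set re = CreateObject("VBScript.RegExp")
    ' Same idea as the named-group pattern, with groups 1 and 2 instead of start and end
    re.Pattern = "(\d{4}-\d{2}-\d{2}).*(\d{4}-\d{2}-\d{2})"

    Set matches = re.Execute(json)
    If matches.Count > 0 Then
        Debug.Print "Start: " & matches(0).SubMatches(0)   ' 2017-04-12
        Debug.Print "End:   " & matches(0).SubMatches(1)   ' 2017-05-17
    End If
End Sub
```

SubMatches is zero-based, so the first date is SubMatches(0) and the second is SubMatches(1).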
Json Approach
A JSON approach would be possible, but I think a bit more complex to implement, because one of the keys in your JSON string, end, is a reserved word in VB.
But if you wanted to use JSON, you could import Newtonsoft.Json
and have a class such as:
Public Class Rootobject
    Public Property start As String
    Public Property _end As String
End Class
And then deserialize like this:
Dim obj = JsonConvert.DeserializeObject(Of Rootobject)(TextBox1.Text)
However, you would need to apply the DataContract and DataMember attributes
to handle the word end, so that the _end property maps to the "end" key.
DataContract MSDN