I would like to reply to a web form submission, using the email address extracted from the form.
The web form arrives as a table, so the email address sits in the column next to the label and the ParseTextLinePair() function returns a blank.
How can I extract the email address from the web form?
Sub ReplywithTemplatev2()
Dim Item As Outlook.MailItem
Dim oRespond As Outlook.MailItem
'Get Email
Dim intLocAddress As Integer
Dim intLocCRLF As Integer
Dim strAddress As String
Set Item = GetCurrentItem()
If Item.Class = olMail Then
' find the requestor address
strAddress = ParseTextLinePair(Item.Body, "Email-Adresse des Ansprechpartners *")
' This sends a response back using a template
Set oRespond = Application.CreateItemFromTemplate("C:\Users\Reply.oft")
With oRespond
.Recipients.Add Item.SenderEmailAddress
.Subject = "Your Subject Goes Here"
.HTMLBody = oRespond.HTMLBody & vbCrLf & _
"---- original message below ---" & vbCrLf & _
Item.HTMLBody & vbCrLf
' includes the original message as an attachment
' .Attachments.Add Item
oRespond.To = strAddress
' use this for testing, change to .send once you have it working as desired
.Display
End With
End If
Set oRespond = Nothing
End Sub
Function GetCurrentItem() As Object
Dim objApp As Outlook.Application
Set objApp = Application
On Error Resume Next
Select Case TypeName(objApp.ActiveWindow)
Case "Explorer"
Set GetCurrentItem = objApp.ActiveExplorer.Selection.Item(1)
Case "Inspector"
Set GetCurrentItem = objApp.ActiveInspector.CurrentItem
End Select
Set objApp = Nothing
End Function
Function ParseTextLinePair(strSource As String, strLabel As String)
Dim intLocLabel As Integer
Dim intLocCRLF As Integer
Dim intLenLabel As Integer
Dim strText As String
' locate the label in the source text
intLocLabel = InStr(strSource, strLabel)
intLenLabel = Len(strLabel)
If intLocLabel > 0 Then
intLocCRLF = InStr(intLocLabel, strSource, vbCrLf)
If intLocCRLF > 0 Then
intLocLabel = intLocLabel + intLenLabel
strText = Mid(strSource, _
intLocLabel, _
intLocCRLF - intLocLabel)
Else
strText = Mid(strSource, intLocLabel + intLenLabel)
End If
End If
ParseTextLinePair = Trim(strText)
End Function
A picture of the table to clarify.
Have you looked into regular expressions in VBA? I haven't worked with them in a while, but here is an example.
Option Explicit
Sub Example()
Dim Item As MailItem
Dim RegExp As Object
Dim Search_Email As String
Dim Pattern As String
Dim Matches As Variant
Set RegExp = CreateObject("VbScript.RegExp")
Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
For Each Item In ActiveExplorer.Selection
Search_Email = Item.body
With RegExp
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Search_Email)
End With
If Matches.Count > 0 Then
Debug.Print Matches(0)
Else
Debug.Print "Not Found "
End If
Next
Set RegExp = Nothing
End Sub
Or Pattern = "(\S*@\w+\.\w+)" Or "(\w+(?:\W+\w+)*@\w+\.\w+)"
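A minimal sketch of how the first pattern above could be wrapped for the reply macro in the question. The function name ExtractFirstEmail is hypothetical, and the sketch assumes the first address-like token in the message body is the requester's address, which may not hold if the form lists several addresses.

```vba
' Hypothetical helper: returns the first e-mail-like token found in bodyText,
' or an empty string when nothing matches.
Public Function ExtractFirstEmail(ByVal bodyText As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False          ' stop after the first match
    re.IgnoreCase = True       ' let [A-Z] cover lower-case letters too
    re.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"

    Set matches = re.Execute(bodyText)
    If matches.Count > 0 Then
        ExtractFirstEmail = matches(0).Value
    End If
End Function
```

In the question's macro, strAddress = ExtractFirstEmail(Item.Body) could then stand in for the ParseTextLinePair call. Because the search runs over the whole body, it does not depend on the address being on the same line as its label.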
Regular-expressions.info/tutorial
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b Simple pattern that describes an email address.
A series of letters, digits, dots, underscores, percentage signs and hyphens, followed by an at sign, followed by another series of letters, digits and hyphens, finally followed by a single dot and two or more letters.
[A-Z0-9._%+-]+ Matches one or more characters from the list below
A-Z A single character in the range between A and Z (case sensitive)
0-9 A single character in the range between 0 and 9
._%+- A single character in the list
@ Matches the character @ literally
Quantifiers
Udemy.com/vba-regex/
+---------+---------------------------------------------+-------------------------------------------------------------+
| Pattern | Meaning                                     | Example                                                     |
+---------+---------------------------------------------+-------------------------------------------------------------+
| -       | Stands for a range                          | a-z means all the letters a to z                            |
| []      | Stands for any one of the characters quoted | [abc] means either a, b or c; [A-Z] means either A, B, …, Z |
| ()      | Used for grouping purposes                  |                                                             |
| |       | Means 'or'                                  | X|Y means X or Y                                            |
| +       | Matches the character one or more times     | zo+ matches 'zoo', but not 'z'                              |
| *       | Matches the character zero or more times    | "lo*" matches either "l" or "loo"                           |
| ?       | Matches the character zero or one time      | "b?ve?" matches the "ve" in "never"                         |
+---------+---------------------------------------------+-------------------------------------------------------------+
Wikibooks.org/wiki/Visual_Basic/Regular_Expressions
https://regex101.com/r/oP2yR0/1
In sentences like:
"[x] Alpha
[33] Beta"
I extract an array of bracketed data as ([x], [33])
using VBA regex Pattern:
"(\[x\])|(\[\d*\])"
I cannot extract directly the array of un-bracketed data as (x, 33)
using web resources advice for pattern
"(?<=\[)(.*?)(?=\])"
Is this a VBA specific problem (i.e. limits on its implementation of Regex)
or did I misunderstand 'looking forward and backward' patterns?
Public Function Regx( _
ByVal SourceString As String, _
ByVal Pattern As String, _
Optional ByVal IgnoreCase As Boolean = True, _
Optional ByVal MultiLine As Boolean = True, _
Optional ByVal MatchGlobal As Boolean = True) _
As Variant
Dim oMatch As Match
Dim arrMatches
Dim lngCount As Long
' Initialize to an empty array
arrMatches = Array()
With New RegExp
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Global = MatchGlobal
.Pattern = Pattern
For Each oMatch In .Execute(SourceString)
ReDim Preserve arrMatches(lngCount)
arrMatches(lngCount) = oMatch.Value
lngCount = lngCount + 1
Next
End With
' Return the collected matches
Regx = arrMatches
End Function
Sub testabove()
Call Regx("[x] Alpha" & Chr(13) & _
"[33] Beta", "(\[x\])|(\[\d*\])")
End Sub
The VBScript regex engine used from VBA does not support lookbehind such as (?<=\[), so instead use capturing groups around the subpatterns that fetch the value you need.
Use
"\[(x)\]|\[(\d*)\]"
(or \d+ if you need to match at least 1 digit, as * means zero or more occurrences, and + means one or more occurrences).
Or, use the generic pattern to extract anything inside the square brackets without the brackets:
"\[([^\][]+)]"
Then, access the right SubMatches index by checking the submatch length (since you have an alternation, one of the two submatches will be empty), and there you go. Just replace your For loop with
For Each oMatch In .Execute(SourceString)
ReDim Preserve arrMatches(lngCount)
If Len(oMatch.SubMatches(0)) > 0 Then
arrMatches(lngCount) = oMatch.SubMatches(0)
Else
arrMatches(lngCount) = oMatch.SubMatches(1)
End If
' Debug.Print arrMatches(lngCount) ' - This outputs x and 33 with your data
lngCount = lngCount + 1
Next
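For the generic pattern, only one capturing group is involved, so no length check is needed. A minimal sketch under that assumption follows; the function name BracketContents is hypothetical, and the pattern is written with both brackets escaped inside the character class, which is meant to be equivalent to the generic pattern above.

```vba
' Hypothetical helper: returns a zero-based array holding the text found
' inside each [...] pair of sourceText, without the brackets.
Public Function BracketContents(ByVal sourceText As String) As Variant
    Dim re As Object
    Dim m As Object
    Dim results() As String
    Dim n As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                   ' collect every bracketed item
    re.Pattern = "\[([^\[\]]+)\]"      ' group 1 = anything except brackets

    For Each m In re.Execute(sourceText)
        ReDim Preserve results(n)
        results(n) = m.SubMatches(0)   ' the captured text, brackets excluded
        n = n + 1
    Next m

    If n = 0 Then
        BracketContents = Array()      ' empty array when nothing matched
    Else
        BracketContents = results
    End If
End Function
```

With the sample data, Join(BracketContents("[x] Alpha" & Chr(13) & "[33] Beta"), ",") yields x,33.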
With Excel and VBA you can strip the brackets after the regex extraction:
Sub qwerty()
Dim inpt As String, outpt As String
Dim MColl As MatchCollection, temp2 As String
Dim regex As RegExp, L As Long
inpt = "38c6v5hrk[x]537fhvvb"
Set regex = New RegExp
regex.Pattern = "(\[x\])|(\[\d*\])"
Set MColl = regex.Execute(inpt)
temp2 = MColl(0).Value
L = Len(temp2) - 2
outpt = Mid(temp2, 2, L)
MsgBox inpt & vbCrLf & outpt
End Sub
Try this:
\[(x)\]|\[(\d*)\]
Don't put anything you don't want captured inside (); the parentheses are what capture (and group) the text.
Explanation
You will get x and 33 in $1 and $2
Dot Net Sample
Alright, I prepared it for you, although I have been away from VB for a long time. Much of it might not be needed, but it may help you understand it better.
Imports System.Text.RegularExpressions
Module Example
Public Sub Main()
Dim text As String = "[x] Alpha [33] Beta]"
Dim pattern As String = "\[(x)\]|\[(\d*)\]"
' Instantiate the regular expression object.
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
' Match the regular expression pattern against a text string.
Dim m As Match = r.Match(text)
Dim matchcount as Integer = 0
Do While m.Success
matchCount += 1
Console.WriteLine("Match" & (matchCount))
Dim i As Integer
For i = 1 to 2
Dim g as Group = m.Groups(i)
Console.WriteLine("Group" & i & "='" & g.ToString() & "'")
Dim cc As CaptureCollection = g.Captures
Dim j As Integer
For j = 0 to cc.Count - 1
Dim c As Capture = cc(j)
Console.WriteLine("Capture" & j & "='" & c.ToString() _
& "', Position=" & c.Index)
Next
Next
m = m.NextMatch()
Loop
End Sub
End Module
Array Without Regex:
For Each Value In Split(SourceString, Chr(13))
ReDim Preserve arrMatches(lngCount)
arrMatches(lngCount) = Split(Split(Value, "]")(0), "[")(1)
lngCount = lngCount + 1
Next
OK, to start: I'm a little rusty on VBA; it has been 3+ years since I've needed to use it.
In short, I'm struggling to extract text from a string. I'm using a regular expression to extract my department name and date from this string.
The Department will always fall between : and -.
I can't share the document due to security. But, I can explain the format and hopefully we can work from that.
Col A----Col B----Col C---Col D
Date(e)--Dept(e)--String--Duration
Where (e) means it was extracted from the string.
My code for the extraction, thus far, is below. Currently it loops through all available rows and extracts the department, but it always takes the : and - with it! I can't seem to find a way to cut these out.
Any assistance?
I can probably work out the date bit eventually.
The final output from this code is ": Inbound Contacts -"
Where I need, "Inbound Contacts".
Sub stringSearch()
Dim ws As Worksheet
Dim lastRow As Long, x As Long
Dim matches As Variant, match As Variant
Dim Reg_Exp As Object
Set Reg_Exp = CreateObject("vbscript.regexp")
Reg_Exp.Pattern = "\:\s(\w.+)\s\-"
Set ws = Sheet2
lastRow = ws.Range("C" & Rows.Count).End(xlUp).Row
For x = 1 To lastRow
Set matches = Reg_Exp.Execute(CStr(ws.Range("C" & x).Value))
If matches.Count > 0 Then
For Each match In matches
ws.Range("B" & x).Value = match.Value
Next match
End If
Next x
End Sub
This is how to achieve what you want without regex; in general it should be a bit faster and much easier to understand:
Sub TestMe()
Dim inputString As String
inputString = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
Debug.Print Trim(Split(Split(inputString, ":")(1), "-")(0))
End Sub
split the inputString by : and take the second part;
split that part by - and take the first part;
trim the surrounding spaces.
You are not accessing Group 1 value.
Instead of ws.Range("B" & x).Value = match.Value use
ws.Range("B" & x).Value = match.Submatches(0)
You may also enhance the regex a bit to
Reg_Exp.Pattern = ":\s*(\w.*?)\s*-"
This way, you will "trim" the Group 1 value. See the regex demo.
Details
: - a : char
\s* - 0+ whitespace chars
(\w.*?) - Group 1 (.Submatches(0)): a word char followed with any 0+ chars (other than line break chars) as few as possible (NOTE that \w does not match non-ASCII letters, probably you want to match any char that is not whitespace and not a -, then use [^\s-] instead of \w)
\s* - 0+ whitespace chars
- - a hyphen.
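The two suggestions above (the trimmed pattern and reading Group 1 through SubMatches) can be sketched together as a small function. The name DepartmentFrom is hypothetical; the pattern is the one from this answer, so a department name that itself contains a hyphen would be cut at that hyphen.

```vba
' Hypothetical helper: returns the text between the first ":" and the next "-",
' without the surrounding whitespace, or an empty string when nothing matches.
Public Function DepartmentFrom(ByVal sourceText As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.Pattern = ":\s*(\w.*?)\s*-"

    Set matches = re.Execute(sourceText)
    If matches.Count > 0 Then
        ' SubMatches(0) is Group 1; .Value would include the ":" and "-"
        DepartmentFrom = matches(0).SubMatches(0)
    End If
End Function
```

For example, DepartmentFrom("Planning Unit: Inbound Contacts - Tuesday, 27/03/2018") returns Inbound Contacts, so the loop in the question could write DepartmentFrom(CStr(ws.Range("C" & x).Value)) to column B.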
Regex:
You can use this regex: ([\s\S]+?):\s*([\s\S]+?)\s*-\s*([A-Za-z]+)\s*,\s*([0-9]{2}\/[0-9]{2}\/[0-9]{4})\b
Code:
Sub stringSearch()
Dim ws As Worksheet
Dim lastRow As Long, x As Long
Dim matches As Variant, match As Variant
Dim i As Long
Dim Reg_Exp As Object
Set Reg_Exp = CreateObject("vbscript.regexp")
' [A-Za-z] rather than [A-z], which would also match [, \, ], ^, _ and `
Reg_Exp.Pattern = "([\s\S]+?):\s*([\s\S]+?)\s*-\s*([A-Za-z]+)\s*,\s*([0-9]{2}\/[0-9]{2}\/[0-9]{4})\b"
Set ws = Sheet2
lastRow = ws.Range("C" & Rows.Count).End(xlUp).Row
For x = 1 To lastRow
Set matches = Reg_Exp.Execute(CStr(ws.Range("C" & x).Value))
If matches.Count > 0 Then
For Each match In matches
For i = 0 To match.SubMatches.Count - 1
Debug.Print match.SubMatches(i)
Next i
Next match
End If
Next x
End Sub
Result
This is the result in the Immediate window:
+-------------------+
| Planning Unit |
| Inbound Contacts  |
| Tuesday |
| 27/03/2018 |
| Planning Unit |
| Payments & Orders |
| Tuesday |
| 27/03/2018 |
| Planning Unit |
| Scheduling |
| Tuesday |
| 27/03/2018 |
+-------------------+
I'd use Left/Right/Mid and InStr/InStrRev instead of RegEx in this case.
For extracting the department:
Dim mainStr As String
Dim deptStr As String
mainStr = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
deptStr = Mid(mainStr, InStr(mainStr, ":") + 2)
deptStr = Left(deptStr, InStr(deptStr, "-") - 2)
For extracting the date:
Dim mainStr As String
Dim dateStr As String
mainStr = "Planning Unit: Inbound Contacts - Tuesday, 27/03/2018"
dateStr = Right(mainStr, Len(mainStr) - InStrRev(mainStr, " "))
To be honest, this kind of situation is common enough that you might want to write some sort of "extractText" function to get the text between delimiters. Here's the one I use.
Function extractText(str As String, leftDelim As String, rightDelim As String, _
Optional reverseSearch As Boolean = False) As String
'Extracts text between two delimiters in a string
'By default, searches for first instance of each delimiter in string from left to right
'To search from right to left, set reverseSearch = True
'If left delimiter = "", function returns text up to right delimiter
'If right delimiter = "", function returns text after left delimiter
'If left or right delimiter not found in string, function returns empty string
Dim leftPos As Long
Dim rightPos As Long
Dim leftLen As Long
If reverseSearch Then
leftPos = InStrRev(str, leftDelim)
rightPos = InStrRev(str, rightDelim)
Else
leftPos = InStr(str, leftDelim)
rightPos = InStr(str, rightDelim)
End If
leftPos = IIf(leftDelim = "", -1, leftPos)
rightPos = IIf(rightDelim = "", -1, rightPos)
leftLen = Len(leftDelim)
If leftPos > 0 Then
If rightPos = -1 Then
extractText = Mid(str, leftPos + leftLen)
ElseIf rightPos > leftPos Then
extractText = Mid(str, leftPos + leftLen, rightPos - leftPos - leftLen)
End If
ElseIf leftPos = -1 Then
If rightPos > 0 Then
extractText = Left(str, rightPos - 1)
End If
End If
End Function
I need your help! I'd like to use RegEx in an Excel/VBA environment. I do have an approach, but I'm reaching my limits...
I need to match 5 characters within a great many lines of text (the strings are in column B of my Excel sheet; column A comes later). The 5 characters can be 5 digits or a "K" followed by 4 digits (e.g. 12345, 98765, K2345). This would be covered by (\d{5}|K\d{4}).
These five characters can be preceded or followed by letters or special characters, but not by digits. That means no leading zeros are allowed, and the digits shouldn't be matched inside a longer number. That's one point where I'm stuck.
If there's more than one possible match in a string, I need them all to be matched. If the same number has already been matched within a line, I'd like it not to be matched again. For these two requirements I already have a sort of solution, which works as part of the VBA code at the end of this post: (\d{5}|K\d{4})(?!.*?\1.*$)
In addition, I have a specific single digit (or a "K") in column A. The five characters must start with this specific character, or otherwise not be matched.
Example of strings (numbered). The two columns A and B are separated by "|" for better readability
(1) | 1 | 2018/ID11298 00000012345 PersoNR: 889899 Bridgestone BNPN
(2) | 3 | Kompo 32280EP ###Baukasten### 3789936690 ID PFK Carbon0
(3) | 2 | 20613, 20614, Mietop Antragsnummer C300Coup IVS 33221 ABF
(4) | 2 | Q21009 China lokal produzierte Derivate f/Radverbund 991222 VV
(5) | 6 | ID:61953 F-Pace Enfantillages (Machine arriere) VvSKPMG Lyon09
(6) | 2 | 2017/22222 22222 21895 Einzelkostenprob. 28932 ZürichMP KOS
(7) | K | ID:K1245 Panamera Nitsche Radlager Derivativ Bayreumion PwC
(8) | 7 | LaunchSupport QBremsen BBG BFG BBD 70142,70119 KK 70142
The results that I'm looking for here are:
(1) | 11298 | ............................. [but don't match 12345, since no preceding digits are allowed]
(2) | 32280 | ............................. [but don't match 37899 within 3789936690]
(3) | 20613 | 20614 | ................ [match both starting with a 2, don't match the one starting with 3]
(4) | 21009 | ............................. [preceded by a letter, which is perfectly fine]
(5) | 61953 | ..............................[random example]
(6) | 22222 | 21895 | 28932 | ... [match them all, but no duplicates]
(7) | K1245 | ............................. [special case with a "K"]
(8) | 70142 | 70119 | ................ [ignore second 70142]
The RegEx/VBA Code that I've put together so far is:
Sub RegEx()
Dim varOut() As Variant
Dim objRegEx As Object
Dim lngColumn As Long
Dim objRegA As Object
Dim varArr As Variant
Dim lngUArr As Long
Dim lngTMP As Long
On Error GoTo Fin
With Worksheets("Sheet1")
varArr = .Range("B2:B50")
Set objRegEx = CreateObject("VBScript.Regexp")
With objRegEx
.Pattern = "(\d{5}|K\d{4})(?!.*?\1.*$)" 'this is where the magic happens
.Global = True
For lngUArr = 1 To UBound(varArr)
Set objRegA = .Execute(varArr(lngUArr, 1))
If objRegA.Count >= lngColumn Then
lngColumn = objRegA.Count
End If
Set objRegA = Nothing
Next lngUArr
If lngColumn = 0 Then Exit Sub
ReDim varOut(1 To UBound(varArr), 1 To lngColumn)
For lngUArr = 1 To UBound(varArr)
Set objRegA = .Execute(varArr(lngUArr, 1))
For lngTMP = 1 To objRegA.Count
varOut(lngUArr, lngTMP) = objRegA(lngTMP - 1)
Next lngTMP
Set objRegA = Nothing
Next lngUArr
End With
.Cells(2, 3).Resize(UBound(varOut), UBound(varOut, 2)) = varOut
End With
Fin:
Set objRegA = Nothing
Set objRegEx = Nothing
If Err.Number <> 0 Then MsgBox "Error: " & Err.Number & " " & Err.Description
End Sub
This code is checking the string from column B and delivering its matches in columns C, D, E etc. It's not matching duplicates. It is however matching numbers within larger numbers, which is a problem. \b for example doesn't work for me, because I still want to match 12345 in EP12345.
Also, I have no idea how to implement the character from column A to be the very first character.
I've uploaded my excel file here: mollmell.de/RegEx.xlsm
Thank you so much for suggestions
Stephan
To sort out the numbers that are too long, you can use a negative lookbehind and a negative lookahead that reject preceding and following digits:
(?x) (?<!\d) (\d{5} | K\d{4}) (?!\d)
https://regex101.com/r/RBnoMo/1
To match only numbers with the key in column 2 is rather hard. Maybe you match either the key or the numbers and do the logic afterwards:
(?x)
\|[ ](?<key>.)[ ]\| |
(?<!\d) (?<number>\d{5} | K\d{4}) (?!\d)
https://regex101.com/r/60d0yT/2
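The patterns above rely on free-spacing mode ((?x)), lookbehind and named groups, none of which the VBScript.RegExp engine used in the question supports. A minimal sketch of the "match first, do the logic afterwards" idea that stays within that engine follows. It swaps in a different technique: grab every maximal digit run (optionally prefixed with "K") and check length, key character and duplicates in VBA. The function name ExtractCodes and the ", " separator are assumptions made for illustration.

```vba
' Hypothetical helper: returns the distinct 5-character codes in sourceText
' that start with keyChar, joined by ", " (empty string when none qualify).
Public Function ExtractCodes(ByVal sourceText As String, _
                             ByVal keyChar As String) As String
    Dim re As Object, m As Object
    Dim token As String, digits As String, candidate As String
    Dim found As String                 ' delimited list such as ";20613;20614;"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False               ' only an upper-case K counts
    re.Pattern = "K?\d+"                ' a whole digit run, optional K in front

    found = ";"
    For Each m In re.Execute(sourceText)
        token = m.Value
        If Left$(token, 1) = "K" Then digits = Mid$(token, 2) Else digits = token

        candidate = vbNullString
        If Len(digits) = 5 Then
            candidate = digits          ' exactly five digits, not part of a longer run
        ElseIf Len(digits) = 4 And Len(token) = 5 Then
            candidate = token           ' "K" followed by exactly four digits
        End If

        If Len(candidate) > 0 Then
            ' keep it only if it starts with the key and was not seen before
            If Left$(candidate, 1) = keyChar _
               And InStr(found, ";" & candidate & ";") = 0 Then
                found = found & candidate & ";"
            End If
        End If
    Next m

    If Len(found) > 1 Then
        ' drop the outer delimiters and use ", " between the codes
        ExtractCodes = Replace(Mid$(found, 2, Len(found) - 2), ";", ", ")
    End If
End Function
```

With row (3) of the question, ExtractCodes("20613, 20614, Mietop Antragsnummer C300Coup IVS 33221 ABF", "2") returns 20613, 20614. Because whole digit runs are matched, 12345 inside 00000012345 is rejected by the length check rather than by a lookbehind.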
I have a function pulled from here. My problem is that I don't know what RegEx pattern I need to use to split out the following data:
+1 vorpal unholy longsword +31/+26/+21/+16 (2d6+13)
+1 vorpal flaming whip +30/+25/+20 (1d4+7 plus 1d6 fire and entangle)
2 slams +31 (1d10+12)
I want it to look like:
+1 vorpal unholy longsword, 31
+1 vorpal flaming whip, 30
2 slams, 31
Here is the VBA code that does the RegExp validation:
Public Function RXGET(ByRef find_pattern As Variant, _
ByRef within_text As Variant, _
Optional ByVal submatch As Long = 0, _
Optional ByVal start_num As Long = 0, _
Optional ByVal case_sensitive As Boolean = True) As Variant
' RXGET - Looks for a match for regular expression pattern find_pattern
' in the string within_text and returns it if found, error otherwise.
' Optional long submatch may be used to return the corresponding submatch
' if specified - otherwise the entire match is returned.
' Optional long start_num specifies the number of the character to start
' searching for in within_text. Default=0.
' Optional boolean case_sensitive makes the regex pattern case sensitive
' if true, insensitive otherwise. Default=true.
Dim objRegex As VBScript_RegExp_55.RegExp
Dim colMatch As VBScript_RegExp_55.MatchCollection
Dim vbsMatch As VBScript_RegExp_55.Match
Dim colSubMatch As VBScript_RegExp_55.SubMatches
Dim sMatchString As String
Set objRegex = New VBScript_RegExp_55.RegExp
' Initialise Regex object
With objRegex
.Global = False
' Default is case sensitive
If case_sensitive Then
.IgnoreCase = False
Else: .IgnoreCase = True
End If
.pattern = find_pattern
End With
' Return out of bounds error
If start_num >= Len(within_text) Then
RXGET = CVErr(xlErrNum)
Exit Function
End If
sMatchString = Right$(within_text, Len(within_text) - start_num)
' Create Match collection
Set colMatch = objRegex.Execute(sMatchString)
If colMatch.Count = 0 Then ' No match
RXGET = CVErr(xlErrNA)
Else
Set vbsMatch = colMatch(0)
If submatch = 0 Then ' Return match value
RXGET = vbsMatch.Value
Else
Set colSubMatch = vbsMatch.SubMatches ' Use the submatch collection
If colSubMatch.Count < submatch Then
RXGET = CVErr(xlErrNum)
Else
RXGET = CStr(colSubMatch(submatch - 1))
End If
End If
End If
End Function
I don't know about Excel but this should get you started on the RegEx:
/(?:^|, |and |or )(\+?\d?\s?[^\+]*?) (?:\+|-)(\d+)/
NOTE: There is a slight caveat here. This will also match if an element begins with + only (not being followed by a digit).
Capture groups 1 and 2 contain the strings that go to the left and right of your comma (the whole match has index 0). So you can do something like capture[1] + ', ' + capture[2] (whatever your syntax for that is).
Here is an explanation of the regex:
/(?:^|, |and |or )   # make sure that we only start looking at
                     # the beginning of the string, after a comma, after an
                     # "and" or after an "or"; the "?:" makes sure that this
                     # subpattern is not capturing
(\+?                 # an optional literal "+"
\d?                  # an optional digit
\s?                  # an optional whitespace character
[^\+]*?)             # arbitrarily many non-plus characters; the ? makes it
                     # non-greedy, otherwise it might span multiple lines
[ ]                  # a literal space
(?:\+|-)             # a literal "+" or "-"
(\d+)/               # at least one digit (the brackets are for capturing)
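In VBA the pattern is passed without the surrounding / delimiters. A minimal sketch of how the two capture groups could be joined there, assuming the VBScript.RegExp engine that RXGET already uses; the function name AttackSummary is hypothetical, and only the first attack on each line is returned.

```vba
' Hypothetical helper: turns "+1 vorpal flaming whip +30/+25/+20 (...)"
' into "+1 vorpal flaming whip, 30"; returns "" when nothing matches.
Public Function AttackSummary(ByVal attackLine As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    ' Same pattern as above, minus the / delimiters
    re.Pattern = "(?:^|, |and |or )(\+?\d?\s?[^+]*?) [+-](\d+)"

    Set matches = re.Execute(attackLine)
    If matches.Count > 0 Then
        ' SubMatches(0) = weapon name, SubMatches(1) = first attack bonus
        AttackSummary = matches(0).SubMatches(0) & ", " & matches(0).SubMatches(1)
    End If
End Function
```

For instance, AttackSummary("2 slams +31 (1d10+12)") returns 2 slams, 31. The same result should be reachable with the question's own function by concatenating RXGET(pattern, text, 1) and RXGET(pattern, text, 2).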