How to save SubMatches as array and print not empty submatches? - regex

When I try the following Regex code and add a "Add Watch" (Shift + F9) to Matches
Sub TestRegEx1()
Dim regex As Object, Matches As Object
Dim str As String
str = "This is text for the submatches"
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "Th(is).+(for).+(submatches)|.+(\d)|([A-Z]{3})"
regex.IgnoreCase = True
Set Matches = regex.Execute(str)
End Sub
I see that Matches is structured like this (with 2 empty submatches):
2 questions:
How can I save in an array variable the SubMatches?
How can I Debug.Print only elements that are not empty?
I've tried doing like below but is not working
Set Arr = Matches.SubMatches
Set Arr = Matches(1).SubMatches
Set Arr = Matches.Item(1).SubMatches
Thanks in advance

Is the following what you intended? Oversize an array at the start and redim at the end. First version prints only non-empty but stores all. Second version prints and stores only non-empty.
You probably want to .Test to ensure there are matches.
Option Explicit
Sub TestRegEx1()
Dim regex As Object, matches As Object, match As Object, subMatch As Variant
Dim str As String, subMatches(), i As Long
ReDim subMatches(0 To 1000)
str = "This is text for the submatches"
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "Th(is).+(for).+(submatches)|.+(\d)|([A-Z]{3})"
regex.IgnoreCase = True
Set matches = regex.Execute(str)
For Each match In matches
For Each subMatch In match.subMatches
subMatches(i) = match.subMatches(i)
If Not IsEmpty(subMatches(i)) Then Debug.Print subMatches(i)
i = i + 1
Next
Next
ReDim Preserve subMatches(0 To i)
End Sub
If you only want to store non-empty then
Option Explicit
Sub TestRegEx1()
Dim regex As Object, matches As Object, match As Object, subMatch As Variant
Dim str As String, subMatches(), i As Long
ReDim subMatches(0 To 1000)
str = "This is text for the submatches"
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "Th(is).+(for).+(submatches)|.+(\d)|([A-Z]{3})"
regex.IgnoreCase = True
Set matches = regex.Execute(str)
For Each match In matches
For Each subMatch In match.subMatches
subMatches(i) = match.subMatches(i)
If Not IsEmpty(subMatches(i)) Then
Debug.Print subMatches(i)
i = i + 1
End If
Next
Next
ReDim Preserve subMatches(0 To i)
End Sub

You may use a Collection and fill it on the go.
Add
Dim m, coll As Collection
Initialize the collection:
Set coll = New Collection
Then, once you get the matches, use
If Matches.Count > 0 Then ' if there are matches
For Each m In Matches(0).SubMatches ' you need the first match submatches
If Len(m) > 0 Then coll.Add (m) ' if not 0 length, save value to collection
Next
End If
Result of the code with changes:

Related

How to replace Numbers in Parentheses with some calculations in MS Word

I have a problem to replace some serial number such as [30] [31] [32]... to [31] [32] [33]... in MS word when I insert a new references in the middle of article. I have not found a solution way in GUI so I try to use VBA to do that replacement. I find a similar problem in stack overflow:
MS Word Macro to increment all numbers in word document
However, this way is a bit inconvenient because it have to generate some replacement array in other place. Can I make that replacement with regex and some function in MS Word VBA like code below?
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
End With
'How to do some calculations with $1?
Selection.Text = regExp.Replace(Selection.Text, "[$1]")
End Sub
But I don't know how to do some calculations with $1 in regExp? I have try use "[$1+1]" but it return [31+1] [32+1] [33+1]. Can anyone help? Thanks!
It is impossible to pass a callback function to the RegExp.Replace, so you have the only option: use RegExp.execute and process matches in a loop.
Here is an example code for your case (I took a shortcut since you only have the value to modify inside known delimiters, [ and ].)
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})]"
.Global = True
End With
'How to do some calculations with $1?
' Removing regExp.Replace(Selection.Text, "[$1]")
For Each m In regExp.Execute(Selection.Text)
Selection.Text = Left(Selection.Text, m.FirstIndex+1) _
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) _
& Mid(Selection.Text, m.FirstIndex + Len(m.Value))
Next m
End Sub
Here,
Selection.Text = Left(Selection.Text, m.FirstIndex+1) - Get what is before
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) - Add 10 to the captured number
& Mid(Selection.Text, m.FirstIndex + Len(m.Value)) - Append what is after the capture
That should do the trick :
Sub IncrementWithRegex()
Dim Para As Paragraph
Set Para = ThisDocument.Paragraphs.First
Dim ParaNext As Paragraph
Dim oRange As Range
Set oRange = Para.Range
Dim regEx As New RegExp
Dim regMatch As Variant
Dim ACrO As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[\[]([0-9]{2})[\]]"
End With
Do While Not Para Is Nothing
Set ParaNext = Para.Next
Set oRange = Para.Range
'Debug.Print oRange.Text
If regEx.test(oRange.Text) Then
For Each regMatch In regEx.Execute(oRange.Text)
oRange.Text = _
Left(oRange.Text, _
InStr(1, oRange.Text, CStr(regMatch))) & _
CDbl(regMatch) + 1 & _
Right(oRange.Text, _
Len(CStr(regMatch)) + InStr(1, oRange.Text, CStr(regMatch)))
Next regMatch
Else
End If
Set Para = ParaNext
Loop
End Sub
To use this, remember to add the reference :
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
Here is a simple VBA macro you can use to achieve this :
Sub IncrementNumbers()
Dim regExp As Object
Dim i As Integer
Dim fullMatch As String
Dim subMatch As Integer
Dim replacement As String
Const TMP_PREFIX As String = "$$$"
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
.MultiLine = True
End With
'Ensure selected text match our regex
If regExp.test(Selection.Text) Then
'Find all matches
Set matches = regExp.Execute(Selection.Text)
' Start from last match
For i = 0 To (matches.Count - 1)
fullMatch = matches(i).Value
subMatch = CInt(matches(i).SubMatches(0))
'Increment by 1
subMatch = subMatch + 1
'Create replacement. Add a temporary prefix so we ensure [30] replaced with [31]
'will not be replaced with [32] when [31] will be replaced
replacement = "[" & TMP_PREFIX & subMatch & "]"
'Replace full match with [subMatch]
Selection.Text = Replace(Selection.Text, fullMatch, replacement)
Next
End If
'Now replacements are complete, we have to remove replacement prefix
Selection.Text = Replace(Selection.Text, TMP_PREFIX, "")
End Sub

Regex extract data before certain text

I have large text documents that has some data I want to be extracted.
As you can see in a screenshot , I want to extract A040 to excel column next to the filename.
Before the A040 there is always three empty spaces and than text Sheet (also in screenshot)
Every file has different number and there is always letter A with three digits and text Sheet. --> example file uploaded:
I has something already in VB with Excel but it is not working.
Dim cell As Range
Dim rng As Range
Dim output As String
Set rng = ws.Range("A1", ws.Range("A1").SpecialCells(xlLastCell).Address)
For Each cell In rng
On Error Resume Next
output = ExtA(cell.Value)
If Len(output) > 0 Then
Range("B" & j) = output
Exit For
End If
Next
j = j + 1
ws.Cells.ClearContents
'Call DelConns
strFileName = Dir 'next file
Loop
End Sub
Function ExtA(ByVal text As String) As String
'REGEX Match VBA in excel
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(?<=Sheet)[^Sheet]*\ Sheet"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
If allMatches.Count <> 0 Then
result = allMatches.Item(0).submatches.Item(0)
End If
ExtA = result
End Function
This seems to work on your sample.
Option Explicit
Function AthreeDigits(str As String)
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
AthreeDigits = vbNullString
With rgx
.Global = False
.MultiLine = True
.Pattern = "\A[0-9]{3}[\s]{3}Sheet"
If .Test(str) Then
Set cmat = .Execute(str)
AthreeDigits = Left(cmat.Item(0), 4)
End If
End With
End Function
Did you mean to say that there are 4 spaces after the A040 and before the "Sheet"? If so, try this pattern:
.pattern = "(A\d\d\d)\s{3}Sheet"
EDIT: I thought you said 4 spaces, but you said 3. My pattern now reflects that.
EDIT 2: (I need more coffee!) Change the \b to \s.
See Example here
"\s+[Aa]\d*\s+Sheet"
Or
\s+[Aa]\d*\s+(Sheet)
Or
[Aa]\d*\s+(Sheet)
Demo
https://regex101.com/r/Qo8iUf/3
\s+ Matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible
Aa Matches a single character in the list Aa (case sensitive)
\d* Matches a digit (equal to [0-9])
* Quantifier — Matches between zero and unlimited times, as many times as possible

use regex in a wrapped text value cell

I have a cell like this :
1 parent
1 child
I am getting the value in vba :
Dim nbChild As String
Dim nbParent As String
Sheets("Feuil1").Cells(C.Row - 1, C.Column).Value
However, I would like to put the number of parent and child in 2 separates variables nbParent and nbChild, so I was thinking to use regex to capture the groups (digit number before parent and digit number before child).
But I don't know how to do it. Thanks in advance for your help
Dim arr, parent, child
arr = Split(ActiveCell.Value, Chr(10))'split on hard return
parent=arr(0)
child=arr(1)
'then split each line on space....
debug.print Split(parent," ")(0) 'number
debug.print Split(parent," ")(1) 'text
debug.print Split(child," ")(0) 'number
debug.print Split(child," ")(1) 'text
With data like:
click on a cell and run:
Sub Family()
ary = Split(ActiveCell.Value, " ")
nParent = CLng(ary(0))
nChild = CLng(ary(2))
MsgBox nParent & vbCrLf & nChild
End Sub
(regex is not necessary)
I agree with #Gary's Student that you don't seem to need Regex but if you insist on it, I think the code below works. You'll need to add in reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Sub main()
Dim value As String
Dim re As VBScript_RegExp_55.RegExp
Dim matches As VBScript_RegExp_55.MatchCollection
Dim match As VBScript_RegExp_55.match
value = Range("A1").value
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "\d+"
re.Global = True
Set matches = re.Execute(value)
For Each match In matches
Debug.Print match.value
Next
End Sub

send regexp matches to an array of strings

I'm trying to get the code below to send the results of the regexp search to an array of strings. How can I do that?
When I change name to an array of strings i.e. Dim name() as String VBA throws a type-mismatch exception. Any idea what I can do to fix that?
Many thanks.
Do While Not EOF(1)
Line Input #1, sText
If sText <> "" Then
Dim Regex As Object, myMatches As Object
' instantiates regexp object
Set Regex = CreateObject("VBScript.RegExp")
With Regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern = "^Personal\sname\s*[:]\s*"
End With
' get name, seperated from Personal Name
If Regex.test(sText) Then
Set myMatches = Regex.Execute(sText)
Dim temp As String
temp = Regex.Replace(sText, vbNullString)
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
Dim temp2 As String
temp2 = myMatches.Item(0)
name = Trim(Left(temp2, Len(temp2) - 3))
End If
End If
Loop
You should not use "name" as a variable name as it conflicts with an excel property. Try sName or sNames instead, where s is for string.
With a array you need to give it a size before you can assign a value to each element.
Dim sNames(4) As String '// Or Dim sNames(1 To 4) As String
sName(1) = "John"
...
sName(4) = "Sam"
or if you don't know the total number of elements (names) to begin with then:
Dim sNames() As String
Dim iTotalNames As Integer
iTotalNames = '// Some code here to determine how many names you will have
ReDim sNames(iTotalNames) '// You can also use ReDim Preserve if you have existing elements
sName(1) = "John"
...
sName(4) = "Sam"
So I suspect you will need something like:
Dim sNames() As String
Dim iTotalNames As Integer
'// Your code ....
iTotalNames = iTotalNames + 1
ReDim Preserve sNames(iTotalNames)
sNames(iTotalNames) = Trim(Left(temp2, Len(temp2) - 3))
'// Rest of your code ...
Also in VBA all dimensioning of variables should be at the top of the module.
change
'call this "A"
Dim temp2 As String
temp2 = myMatches.Item(0)
to
'stick this at the top
redim temp2(0 to 0)
'replace "A" with this
new_top = ubound(temp2)+1
redim preserve temp2 (0 to new_top)
temp2(new_top) = myMatches.Item(0)

Using regexp in Excel can I perform some arithmetic on the matched pattern before replacing the matched string?

I am using `VBscript.RegExp`` to find and replace using a regular expression. I'm trying to do something like this:
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
I.e. I have cells which contain things like ID_006 and I want to replace the contents of such a cell with a hyperlink to cell A8. So I match the three digits, and then want to add 2 to those digits to get the correct row to hyperlink to.
But the CStr(CInt("$1") + 2) part doesn't work. Any suggestions on how I can make it work?
Ive posted given these points
you should test for a valid match before trying a replace
from your current code the Global is redundant as you can add 1 hyerplink (1 match) to a cell
your current code will accept a partial string match, if you wanted to avoid ID_9999 then you match the entire string using ^ and $. This version runs me, you can revert to your current pattern with .Pattern = "ID_(\d{3})"
Normally when adding a hyperlink a visible address is needed. The code beloe does this (with the row manipulation in one shot)
The code below runs at A1:A10 (sample shown dumping to B1:B10 for pre and post coede)
Sub ParseIt()
Dim rng1 As Range
Dim rng2 As Range
Dim regEx
Set rng1 = Range([a1], [a10])
Set regEx = CreateObject("VBScript.RegExp")
With regEx
'match entire string
.Pattern = "^ID_(\d{3})$"
'match anywhere
' .Pattern = "ID_(\d{3})"
.IgnoreCase = False
For Each rng2 In rng1
If .test(rng2.Value) Then
'use Anchor:=rng2.Offset(0, 1) to dump one column to the right)
ActiveSheet.Hyperlinks.Add Anchor:=rng2, Address:="", SubAddress:= _
Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address, TextToDisplay:=Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address
End If
Next
End With
End Sub
This is because: "=HYPERLINK(A" & CStr(CInt("$1") + 2) is evaluated once, when the code is executed, not once for every match.
You need to capture & process the match like this;
a_cell_Value = "*ID_006*"
Set matches = regEx.Execute(a_cell_Value)
Debug.Print "=HYPERLINK(A" & CLng(matches(0).SubMatches(0)) + 2 & ")"
>> =HYPERLINK(A8)
Or if they are all in ??_NUM format;
a_cell_Value = "ID_11"
?"=HYPERLINK(A" & (2 + val(mid$(a_cell_Value, instr(a_cell_Value,"_") +1))) & ")"
=HYPERLINK(A13)
The line -
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
won't work as VBA will try to do a CInt on the literal string "$1" rather than on the match from your RegEx.
It would work if you did your replace in 2 steps, something like this -
Dim a_cell
a_cell = Sheets(1).Cells(1, 1)
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
a_cell = regEx.Replace(a_cell, "$1")
Sheets(1).Cells(1, 1) = "=HYPERLINK(A" & CStr(CInt(a_cell) + 2) & ")"