Excel VBA - Looking up a string with wildcards - regex

Im trying to look up a string which contains wildcards. I need to find where in a specific row the string occurs. The string all take form of "IP##W## XX" where XX are the 2 letters by which I look up the value and the ## are the number wildcards that can be any random number. Hence this is what my look up string looks like :
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find Command to find the column where this first occurs but I keep on getting with errors. Here's what I had so far but it doesn't work :L if anyone has an easy way of doing. Quite new to VBA -.-
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn

Per the comment by #SJR you can do this two ways. Using LIKE the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub

If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub

Try this:
If FullLookUpString Like "*IP##W##[a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).

Related

Get Full Range Text to String

I'm writing a script that looks through my outgoing emails and searches for frequent stylistic errors I make. It locates them using regex and then highlights them yellow. Code:
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
strText = LCase(oWordDoc.Range.Text)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.firstindex, varMtch.firstindex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub
extractMatches is a private function implementing VBA's RegEx engine. getBadStrs returns the regular expression containing the errors.
It all works unless I've embedded hyperlinks in my email. If so, oWordDoc.Range.Text returns only the anchor text of the links, not the links (and any other characters Word pads the hyperlinks with - I don't know what they might be). As a result, varMtch.firstindex is correct for strText but not oRange so the text it highlights is offset by several characters.
I tried to assemble the full oRange text by looping through the hyperlinks in oRange and adding the link text to the string assuming it would be included in oRange. Something like:
Dim lngEndLnk as Long
Set oRange = oWordDoc.Range
For Each varMtch In oRange.Hyperlinks
strText = strText & oWordDoc.Range(lngEndLnk, varMtch.Range.Start)
strText = strText & varMtch.TextToDisplay & varMtch.Name
lngEndLnk = varMtch.Range.End
Next varMtch
If lngEndLnk = 0 Then
strText = oRange.text
Else
strText = strText & oWordDoc.Range(lngEndLnk, oWordDoc.Range.End)
End If
That reduced the offset, but there still is one. Also, if I were to include a linked image in the email, the .Anchor property of varMtch fails so I'd have to come up with another workaround.
Is there a more straightforward way to get a String containing all the characters of the Range object so the regex indices line up?
You can access the hyperlink address using the hyperlinks collection of a document:
Private Sub CommandButton1_Click()
strtext = ActiveDocument.Range.Text
MsgBox (strtext)
For Each hLink In Documents(1).Hyperlinks
MsgBox (hLink.Address)
Next hLink
End Sub
This first displays all the text in a document, and then loops through each hyperlink displaying its URL.
This can then be used through your RegEx.
For more information and examples, see hyperlinks.
I ended up with a similar solution to #slightly snarky. I don't know that it's better so I won't mark it as the solution. Happy for comments on pros and cons, in case there's a clear winner I'm just not seeing.
Personally, I like looping the character collection and probably should use it in my code, this works. I find using the position array to highlight matches much less intuitive than constructing a string from the range. For my purposes padding the string with # in place of the zero-length characters in oWordDoc.Range works, but I also know it won't work for everybody.
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long, lngC As Long, lngPrevLen As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
For lngC = 0 To oWordDoc.Range.End - 1
strText = strText & oWordDoc.Range(lngC, lngC + 1)
If Len(strText) = lngPrevLen Then
strText = strText & "#"
End If
lngPrevLen = lngPrevLen + 1
Next lngC
strText = LCase(strText)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.FirstIndex, varMtch.FirstIndex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub
The key to this seems to be that when you iterate through a Range looking at each "position" in the range, e.g. via something like
With ActiveDocument.Range
For i = 0 to .End - 1
Debug.Print i,Ascw(.Range(i,i+1).Text)
Next
End With
The Range does contain all the characters in the code of a field such as HYPERLINK field, and all the characters in its result (which might be displayed or it might be hidden text). But in some cases a Range may contain additional characters which are never displayed. For example, if you have a field code such as { SET x 123 } then the Range contains what are in effect the field braces and the code " SET X 123 ", but before the field end brace it also contains a marker followed by the value "123". But the SET field does not display its result.
That makes it difficult to construct a "find" string that's the same length as the Range.
But Range.Text is the same text as the concatenation of all the characters in Range.Characters, and each Character in that Collection is a Range that contains the .Start position
So that lets us get the .Start and .End as the following example shows.
This assumes you are working with the ActiveDocument in Word, and have some text, a HYPERLINK field (say), and possibly other fields, with the text "test1" in various places.
I haven't done much testing so it may still need tweaking.
Sub HighlightFinds()
Dim match As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim rng1 As Word.Range
Dim rng2 As Word.Range
Set rng1 = ActiveDocument.Content
Set rng2 = ActiveDocument.Content ' or rng1.Duplicate
' When you do this, rng1.Text returns the text of the field *codes* but
' not the field *results*, and so does rng1.Characters
'rng1.TextRetrievalMode.IncludeFieldCodes = True
' when you do this, it returns the *results* but not the *codes*
rng1.TextRetrievalMode.IncludeFieldCodes = False
' You could do both, one after the other, to try to get all the matches
' You might also need to set .TextRetrievalMode.IncludeHiddenText
With New VBScript_RegExp_55.RegExp
.Pattern = "test1"
.Global = True
Set matches = .Execute(rng1.Text)
End With
For Each match In matches
rng2.SetRange rng1.Characters(match.FirstIndex + 1).Start, rng1.Characters(match.FirstIndex + 1 + match.Length).End
rng2.HighlightColorIndex = wdYellow
Next
Set matches = Nothing
Set rng2 = Nothing
Set rng1 = Nothing
End Sub

Extract four numbers without brackets from a bracketed entry, if entry exists

What I have:
A list of about 1000 titles of reports in column B.
Some of these titles have a four digit number surrounded by brackets (eg: (3672)) somewhere in a string of text and numbers.
I want to extract these four numbers - without brackets - in column C in the same row.
If there is no four digit number with brackets in column B, then to return "" in column C.
What I have so far:
I can successfully identify the cells in column B which have four digits surrounded by brackets. The problem is it returns the whole title including the four numbers.
Taken from: VBA RegEx extracting data from within a string
NB: I am Using Excel Professional Plus 2010, have checked the box next to "Microsoft VBScript Regular Expressions 5.5".
Sub ExtractTicker()
Dim regEx
Dim i As Long
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = True
regEx.pattern = "(\()([0-9]{4})(\))"
For i = 2 To ActiveSheet.UsedRange.Rows.Count
If (regEx.Test(Cells(i, 2).Value)) Then
Cells(i, 3).Value = regEx.Replace(Cells(i, 2).Value, "$2")
End If
Next i
End Sub
Try
regEx.pattern = "(.*\()([0-9]{4})(\).*)"
the .* and the start and end of the string ensure you capture the entire string, then this is fully substituted by the 2nd submatch ([0-9]{4})
To fully optimise the code
use variant arrays rather than ranges
setting Global and IgnoreCase is redundant when you are running a case insensitive match on the full string
you are using late binding so you dont need the Reference
code
Sub ExtractTicker()
Dim regEx As Object
Dim pattern As String
Dim X
Dim lngCNt As Long
X = Range([b1], Cells(Rows.Count, "B").End(xlUp)).Value2
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.pattern = "(.*\()([0-9]{4})(\).*)"
For lngCNt = 1 To UBound(X)
If .Test(X(lngCNt, 1)) Then
X(lngCNt, 1) = .Replace(X(lngCNt, 1), "$2")
Else
X(lngCNt, 1) = vbNullString
End If
Next
End With
[c1].Resize(UBound(X, 1), 1).Value2 = X
End Sub

use regex in a wrapped text value cell

I have a cell like this :
1 parent
1 child
I am getting the value in vba :
Dim nbChild As String
Dim nbParent As String
Sheets("Feuil1").Cells(C.Row - 1, C.Column).Value
However, I would like to put the number of parent and child in 2 separates variables nbParent and nbChild, so I was thinking to use regex to capture the groups (digit number before parent and digit number before child).
But I don't know how to do it. Thanks in advance for your help
Dim arr, parent, child
arr = Split(ActiveCell.Value, Chr(10))'split on hard return
parent=arr(0)
child=arr(1)
'then split each line on space....
debug.print Split(parent," ")(0) 'number
debug.print Split(parent," ")(1) 'text
debug.print Split(child," ")(0) 'number
debug.print Split(child," ")(1) 'text
With data like:
click on a cell and run:
Sub Family()
ary = Split(ActiveCell.Value, " ")
nParent = CLng(ary(0))
nChild = CLng(ary(2))
MsgBox nParent & vbCrLf & nChild
End Sub
(regex is not necessary)
I agree with #Gary's Student that you don't seem to need Regex but if you insist on it, I think the code below works. You'll need to add in reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Sub main()
Dim value As String
Dim re As VBScript_RegExp_55.RegExp
Dim matches As VBScript_RegExp_55.MatchCollection
Dim match As VBScript_RegExp_55.match
value = Range("A1").value
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "\d+"
re.Global = True
Set matches = re.Execute(value)
For Each match In matches
Debug.Print match.value
Next
End Sub

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two group of words in a word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time and assign that bullet to an Excel cell. As shown in the code below I have pretty much everything done, except I cant figure out the correct Regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As long : i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
Find bullet list items using keywords before and after list in regexp.
(Group) regexp pattern so that you can extract everything in-between words.
Store listed items group into a string.
Split string by new line character into a new array.
Output each array item to excel.
Loop again since there may be more than one list in document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In rex.Execute(str)
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function