Get Full Range Text to String - regex

I'm writing a script that looks through my outgoing emails and searches for frequent stylistic errors I make. It locates them using regex and then highlights them yellow. Code:
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
strText = LCase(oWordDoc.Range.Text)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.firstindex, varMtch.firstindex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub
extractMatches is a private function implementing VBA's RegEx engine. getBadStrs returns the regular expression containing the errors.
It all works unless I've embedded hyperlinks in my email. If so, oWordDoc.Range.Text returns only the anchor text of the links, not the links (and any other characters Word pads the hyperlinks with - I don't know what they might be). As a result, varMtch.firstindex is correct for strText but not oRange so the text it highlights is offset by several characters.
I tried to assemble the full oRange text by looping through the hyperlinks in oRange and adding the link text to the string assuming it would be included in oRange. Something like:
Dim lngEndLnk as Long
Set oRange = oWordDoc.Range
For Each varMtch In oRange.Hyperlinks
strText = strText & oWordDoc.Range(lngEndLnk, varMtch.Range.Start)
strText = strText & varMtch.TextToDisplay & varMtch.Name
lngEndLnk = varMtch.Range.End
Next varMtch
If lngEndLnk = 0 Then
strText = oRange.text
Else
strText = strText & oWordDoc.Range(lngEndLnk, oWordDoc.Range.End)
End If
That reduced the offset, but there still is one. Also, if I were to include a linked image in the email, the .Anchor property of varMtch fails so I'd have to come up with another workaround.
Is there a more straightforward way to get a String containing all the characters of the Range object so the regex indices line up?

You can access the hyperlink address using the hyperlinks collection of a document:
Private Sub CommandButton1_Click()
strtext = ActiveDocument.Range.Text
MsgBox (strtext)
For Each hLink In Documents(1).Hyperlinks
MsgBox (hLink.Address)
Next hLink
End Sub
This first displays all the text in a document, and then loops through each hyperlink displaying its URL.
This can then be used through your RegEx.
For more information and examples, see hyperlinks.

I ended up with a similar solution to #slightly snarky. I don't know that it's better so I won't mark it as the solution. Happy for comments on pros and cons, in case there's a clear winner I'm just not seeing.
Personally, I like looping the character collection and probably should use it in my code, this works. I find using the position array to highlight matches much less intuitive than constructing a string from the range. For my purposes padding the string with # in place of the zero-length characters in oWordDoc.Range works, but I also know it won't work for everybody.
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long, lngC As Long, lngPrevLen As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
For lngC = 0 To oWordDoc.Range.End - 1
strText = strText & oWordDoc.Range(lngC, lngC + 1)
If Len(strText) = lngPrevLen Then
strText = strText & "#"
End If
lngPrevLen = lngPrevLen + 1
Next lngC
strText = LCase(strText)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.FirstIndex, varMtch.FirstIndex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub

The key to this seems to be that when you iterate through a Range looking at each "position" in the range, e.g. via something like
With ActiveDocument.Range
For i = 0 to .End - 1
Debug.Print i,Ascw(.Range(i,i+1).Text)
Next
End With
The Range does contain all the characters in the code of a field such as HYPERLINK field, and all the characters in its result (which might be displayed or it might be hidden text). But in some cases a Range may contain additional characters which are never displayed. For example, if you have a field code such as { SET x 123 } then the Range contains what are in effect the field braces and the code " SET X 123 ", but before the field end brace it also contains a marker followed by the value "123". But the SET field does not display its result.
That makes it difficult to construct a "find" string that's the same length as the Range.
But Range.Text is the same text as the concatenation of all the characters in Range.Characters, and each Character in that Collection is a Range that contains the .Start position
So that lets us get the .Start and .End as the following example shows.
This assumes you are working with the ActiveDocument in Word, and have some text, a HYPERLINK field (say), and possibly other fields, with the text "test1" in various places.
I haven't done much testing so it may still need tweaking.
Sub HighlightFinds()
Dim match As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim rng1 As Word.Range
Dim rng2 As Word.Range
Set rng1 = ActiveDocument.Content
Set rng2 = ActiveDocument.Content ' or rng1.Duplicate
' When you do this, rng1.Text returns the text of the field *codes* but
' not the field *results*, and so does rng1.Characters
'rng1.TextRetrievalMode.IncludeFieldCodes = True
' when you do this, it returns the *results* but not the *codes*
rng1.TextRetrievalMode.IncludeFieldCodes = False
' You could do both, one after the other, to try to get all the matches
' You might also need to set .TextRetrievalMode.IncludeHiddenText
With New VBScript_RegExp_55.RegExp
.Pattern = "test1"
.Global = True
Set matches = .Execute(rng1.Text)
End With
For Each match In matches
rng2.SetRange rng1.Characters(match.FirstIndex + 1).Start, rng1.Characters(match.FirstIndex + 1 + match.Length).End
rng2.HighlightColorIndex = wdYellow
Next
Set matches = Nothing
Set rng2 = Nothing
Set rng1 = Nothing
End Sub

Related

Find '~XX~' within a string with specific values

I have classic ASP written in VBScript. I have a record pulled from SQL Server and the data is a string. In this string, I need to find text enclosed in ~12345~ and I need to replace with very specific text. Example 1 would be replaced with M, 2 would be replaced with A. I then need to display this on the web page. We don't know how many items will be enclosed with ~.
Example Data:
Group Pref: (To be paid through WIT)
~2.5~ % Quarterly Rebate - Standard Commercial Water Heaters
Display on webpage after:
Group Pref: (To be paid through WIT)
~A.H~ % Quarterly Rebate - Standard Commercial Water Heaters
I tried this following, but there are two many cases and this would be unrealistic to maintain. I does replace the text and display correctly.
dim strSearchThis
strSearchThis =(rsResults("PREF"))
set re = New RegExp
with re
.global = true
.pattern = "~[^>]*~"
strSearchThis = .replace(strSearchThis, "X")
end with
I am also trying this code, I can find the text contained between each ~ ~, but when displayed its the information between the ~ ~ is not changed:
dim strSearchThis
strSearchThis =(rsResults("PREF"))
Set FolioPrefData = New RegExp
FolioPrefData.Pattern = "~[^>]*~"
FolioPrefData.Global = True
FolioPrefData.IgnoreCase = True
'will contain all found instances of ~ ~'
set colmatches = FolioPrefData.Execute(strSearchThis)
Dim itemLength, found
For Each objMatch in colMatches
Select Case found
Case "~"
'ignore - doing nothing'
Case "1"
found = replace(strSearchThis, "M")
End Select
Next
response.write(strSearchThis)
You can do it without using Regular Expressions, just checking the individual characters and writing a function that handles the different cases you have. The following function finds your delimited text and loops through all characters, calling the ReplaceCharacter function defined further down:
Function FixString(p_sSearchString) As String
Dim iStartIndex
Dim iEndIndex
Dim iIndex
Dim sReplaceString
Dim sReturnString
sReturnString = p_sSearchString
' Locate start ~
iStartIndex = InStr(sReturnString, "~")
Do While iStartIndex > 0
' Look for end ~
iEndIndex = InStr(iStartIndex + 1, sReturnString, "~")
If iEndIndex > 0 Then
sReplaceString = ""
' Loop htrough all charatcers
For iIndex = iStartIndex + 1 To iEndIndex - 1
sReplaceString = sReplaceString & ReplaceCharacter(Mid(sReturnString, iIndex, 1))
Next
' Replace string
sReturnString = Left(sReturnString, iStartIndex) & sReplaceString & Mid(sReturnString, iEndIndex)
' Locate next ~
iStartIndex = InStr(iEndIndex + 1, sReturnString, "~")
Else
' End couldn't be found, exit
Exit Do
End If
Loop
FixString = sReturnString
End Function
This is the function where you will enter the different character substitutions you might have:
Function ReplaceCharacter(p_sCharacter) As String
Select Case p_sCharacter
Case "1"
ReplaceCharacter = "M"
Case "2"
ReplaceCharacter = "A"
Case Else
ReplaceCharacter = p_sCharacter
End Select
End Function
You can use this in your existing code:
response.write(FixString(strSearchThis))
You can also use a Split and Join method...
Const SEPARATOR = "~"
Dim deconstructString, myOutputString
Dim arrayPointer
deconstructString = Split(myInputString, SEPARATOR)
For arrayPointer = 0 To UBound(deconstructString)
If IsNumeric(deconstructString(arrayPointer)) Then
'Do whatever you need to with your value...
End If
Next 'arrayPointer
myOutputString = Join(deconstructString, "")
This does rely, obviously, on breaking a string apart and rejoining it, so there is a sleight overhead on string mutability issues.

How to extract ad sizes from a string with excel regex

I am trying to extract ad sizes from string. The ad sizes are all set standard sizes. So while I'd prefer to have a regex that looks for a pattern, IE 3 numbers followed by 2 or 3 numbers, hard coding it will also work, since we know what the sizes will be. Here's an example of some of the ad sizes:
300x250
728x90
320x50
I was able to find some VBScript that I modified that almost works, but because my strings that I'm searching are inconsistent, it's pulling too much in some cases. For example:
You see how it's not matching correctly in every instance.
The VB code I found is actually matching everything EXCEPT that ad sizes. I don't know enough about VBScript to reverse it to just look for ad sizes and pull them. So instead it looks for all other text and removes it.
The code is below. Is there a way to fix the Regex so that it just returns the ad sizes?
Function getAdSize(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([^300x250|728x90])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
getAdSize = regEx.Replace(strInput, strReplace)
Else
getAdSize = "Not matched"
End If
End If
End Function
NOTE, THE DATA IS NOT ALWAYS PRECEDED BY AN UNDERSCORE, SOMETIMES IT IS A DASH OR A SPACE BEFORE AND AFTER.
EDIT: Since it's not actually underscore delimited we can't use Split. We can however iterate over the string and extract the "#x#" manually. I have updated the code to reflect this and verified that it works successfully.
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
If IsNumeric(Temp) Then
Ad = Ad & Temp
Else
If Temp = "x" Then
Ad = Ad & Temp
Else
If Ad Like "*#x#*" Then
ExtractAdSize = Ad
Exit Function
Else
Ad = vbNullString
End If
End If
End If
Next i
End If
End Function
Alternate version of the same function using Select Case boolean logic instead of nested If statements:
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
Select Case Abs(IsNumeric(Temp)) + Abs((Temp = "x")) * 2 + Abs((Ad Like "*#x#*")) * 4
Case 0: Ad = vbNullString 'Temp is not a number, not an "x", and Ad is not valid
Case 1, 2, 5: Ad = Ad & Temp 'Temp is a number or an "x"
Case 4, 6: ExtractAdSize = Ad 'Temp is not a number, Ad is valid
Exit Function
End Select
Next i
End If
End Function
I have managed to make about 95% of the required answer - the RegEx below will remove the DDDxDD size and would return the rest.
Option Explicit
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
E.g. you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
From here you can continue trying to find a way to reverse the display.
Edit:
To go from the 95% to the 100%, I have asked a question here and it turns out that the conditional block should be changed to the following:
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
This formula could work if it's always 3 characters, then x, and it's always between underscores - adjust accordingly.
=iferror(mid(A1,search("_???x*_",A1)+1,search("_",A1,search("_???x*_",A1)+1)-(search("_???x*_",A1)+1)),"No match")

Excel VBA - Looking up a string with wildcards

Im trying to look up a string which contains wildcards. I need to find where in a specific row the string occurs. The string all take form of "IP##W## XX" where XX are the 2 letters by which I look up the value and the ## are the number wildcards that can be any random number. Hence this is what my look up string looks like :
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find Command to find the column where this first occurs but I keep on getting with errors. Here's what I had so far but it doesn't work :L if anyone has an easy way of doing. Quite new to VBA -.-
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by #SJR you can do this two ways. Using LIKE the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W##[a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).

VBA to edit excel column of data

I have a requirement to edit a column of data where each cell has to be edited to remove all non-numeric characters. The only data that I need are actual numbers and a decimal point if one was there originally. I found a piece of code that removes everything with the exception of a "%" character. If someone could look at the code below and let me know how to modify it I would be appreciative. Examples of the type of data I am editing are as follows Complete cell contents enclosed in quotes). "3" "2.5%" "17 nks" "3.00 %" "4 VNS"
Here's the code I have used;
Sub RemoveAlphas()
'' Remove alpha characters from a string.
Dim intI As Integer
Dim rngR As Range, rngRR As Range
Dim strNotNum As String, strTemp As String
Set rngRR = Selection.SpecialCells(xlCellTypeConstants, _
xlTextValues)
For Each rngR In rngRR
strTemp = ""
For intI = 1 To Len(rngR.Value)
If Mid(rngR.Value, intI, 1) Like "[0-9,.]" Then
strNotNum = Mid(rngR.Value, intI, 1)
Else: strNotNum = ""
End If
strTemp = strTemp & strNotNum
Next intI
rngR.Value = strTemp
Next rngR
End Sub
Thanks.
This can be done using Regex as shown below- Have tested with your exact sample data and works for me:
Sub RemoveAlphas()
'' Remove alpha characters from a string.
Dim intI As Integer
Dim rngR As Range, rngRR As Range
Dim strNotNum As String, strTemp As String
Dim RegEx As Object
Set rngRR = Selection.SpecialCells(xlCellTypeConstants, _
xlTextValues)
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^\d.]+"
For Each rngR In rngRR
rngR.Value = RegEx.Replace(rngR.Value, "")
Next rngR
End Sub
If you are getting the results you need with the exception of a percent sign, you can insert into this part of your code a Replace function:
Next intI
strTemp = Replace(strTemp, "%", "") 'Remove the % sign and replace with nothing.
rngR.Value = strTemp
Next rngR
Try the RexExp below which builds on my code from Remove non-numeric characters from a range of cells, and uses variant arrays for speed.
The RegExp pattern is [^\d\.]+
Sub KillNonNumbers()
Dim rng1 As Range
Dim rngArea As Range
Dim lngRow As Long
Dim lngCol As Long
Dim lngCalc As Long
Dim objReg As Object
Dim X()
On Error Resume Next
Set rng1 = Selection.SpecialCells(xlCellTypeConstants, xlTextValues)
If rng1 Is Nothing Then Exit Sub
On Error GoTo 0
'See Patrick Matthews excellent article on using Regular Expressions with VBA
Set objReg = CreateObject("vbscript.regexp")
objReg.Pattern = "[^\d\.]+"
objReg.Global = True
'Speed up the code by turning off screenupdating and setting calculation to manual
'Disable any code events that may occur when writing to cells
With Application
lngCalc = .Calculation
.ScreenUpdating = False
.Calculation = xlCalculationManual
.EnableEvents = False
End With
'Test each area in the user selected range
'Non contiguous range areas are common when using SpecialCells to define specific cell types to work on
For Each rngArea In rng1.Areas
'The most common outcome is used for the True outcome to optimise code speed
If rngArea.Cells.Count > 1 Then
'If there is more than once cell then set the variant array to the dimensions of the range area
'Using Value2 provides a useful speed improvement over Value. On my testing it was 2% on blank cells, up to 10% on non-blanks
X = rngArea.Value2
For lngRow = 1 To rngArea.Rows.Count
For lngCol = 1 To rngArea.Columns.Count
'replace the leading zeroes
X(lngRow, lngCol) = objReg.Replace(X(lngRow, lngCol), vbNullString)
Next lngCol
Next lngRow
'Dump the updated array sans leading zeroes back over the initial range
rngArea.Value2 = X
Else
'caters for a single cell range area. No variant array required
rngArea.Value = objReg.Replace(rngArea.Value, vbNullString)
End If
Next rngArea
'cleanup the Application settings
With Application
.ScreenUpdating = True
.Calculation = lngCalc
.EnableEvents = True
End With
Set objReg = Nothing
End Sub

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two group of words in a word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time and assign that bullet to an Excel cell. As shown in the code below I have pretty much everything done, except I cant figure out the correct Regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As long : i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
Find bullet list items using keywords before and after list in regexp.
(Group) regexp pattern so that you can extract everything in-between words.
Store listed items group into a string.
Split string by new line character into a new array.
Output each array item to excel.
Loop again since there may be more than one list in document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In rex.Execute(str)
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3