VBA Word: Regex replace capture group does not highlight - regex

I cannot figure out why the code is not working. I want to find within a cell (XXX) regex match and highlight each occurrence. Tried a lot of things.
rangeFinding.SetRange Start:=rngStart.Start, End:=rngEnd.End
rangeFinding.Select
With Selection
.Tables(1).cell(2, 2).Range.Text = y
End With
Dim qq As Range
Set qq = GetCellTextRange(Selection.Tables(1).cell(2, 2))
Options.DefaultHighlightColorIndex = wdYellow
With qq.Find
.ClearFormatting
.Text = "(\(*\))"
.MatchWildcards = True
.Replacement.ClearFormatting
.Replacement.Text = "\1"
.Replacement.Highlight = True
.Replacement.Font.Color = vbRed
.Execute Replace:=wdReplaceAll
End With
qq.HighlightColorIndex = wdYellow
End If
Procedure that is called:
Function GetCellTextRange(cl As cell) As Range ' ' GetCellTextRange
returns the content range of a table cell '
Dim rng As Range
Set rng = cl.Range
rng.MoveEnd wdCharacter, -1
Set GetCellTextRange = rng
End Function
Changing font color and/or bold does work, however highlighting not. Even when I replace \1 with YYY it works. But highlighting does not get applied, either when I use .Replacement.Highlight or alternative qq.HighlightColorIndex = wdYellow.
BTW: does the Find (and its Execute) function adapt qq range? That is what I noticed, but the docs do not tell it clearly.

I ran into the same problem with wildcard search/replace in VBA code not highlighting text. I was able to work around it by explicitly setting the default highlight colour before running the wildcard search/replace, and then setting it back to whatever it was afterward, e.g.:
currentHighlightColour = Options.DefaultHighlightColorIndex
Options.DefaultHighlightColorIndex = wdYellow
With ActiveDocument.Range.Find
.ClearFormatting
.replacement.ClearFormatting
.Text = "(\{\{placeholder: *\}\})"
.Format = True
.replacement.Text = "\1"
.Forward = True
.matchWildcards = True
.matchCase = False
.replacement.Highlight = True
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
Options.DefaultHighlightColorIndex = currentHighlightColour
Seems like a bug in Word, but at least there's a workaround.

Related

Word VBA use Regex for formatting change

I have a word document that has SQL code in it. I have great code that finds specific terms I didn't realize that I had so many different terms in my document. I stopped at over 140 terms. I tried to add regex but I did it wrong. Would it be possible to have it find the start word SELECT and the end word FROM and change the background color to wdBrightGreen for everything between those terms?
I did not put all the fields in I stopped at three. Is there a way to put the ArrFind to search the whole document instead of specific terms?
Dim ArrFnd As Variant
Dim i As Long
Dim x As Long
Dim y As Long
Dim regexObject As RegExp
Dim strField1(1 To 200) As String
Dim strField2(1 To 20) As String
strField1(1) = "SELECT DISTINCT policy_id"
strField1(2) = "SELECT DISTINCT policy_id"
strField1(3) = "SELECT DISTINCT P.policy_id"
For x = 1 To 150
ArrFnd = Array(strField1(x))
With ActiveDocument
For i = 0 To UBound(ArrFnd)
With .Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchCase = True
.MatchWholeWord = True
.Text = ArrFnd(i)
With regexObject
.Pattern = "[SELECT-FROM]"
End With
.Replacement.Text = ""
End With
Do While .Find.Execute
.Font.Shading.BackgroundPatternColorIndex = wdBrightGreen
.Collapse wdCollapseEnd
Loop
End With
Next i
End With
Next x
One wonders why you persist with the approach taken in your code, in light of: MS Word VBA Can't select specific words to change background color
Regardless, for what you are now describing, all you need is:
Sub HiliteWords()
Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "SELECT[!^13^l]#FROM"
.Replacement.Text = ""
End With
Do While .Find.Execute
.Start = .Words.First.End
.End = .Words.Last.Start - 1
.Font.Shading.BackgroundPatternColorIndex = wdBrightGreen
.Collapse wdCollapseEnd
Loop
End With
Application.ScreenUpdating = True
End Sub
If you also don't want DISTINCT shaded, insert:
If .Words.First.Text = "DISTINCT " Then .Start = .Words.First.End
before:
.End = .Words.Last.Start - 1

How to change text style into vba foreach loop

I have this regex code and it´s working fine, but I´m trying to manipulate the text inside foreach but is taking no effect at the document.
I want to change the style of the selection and cut the markup off.
Sub Substituir()
Set documento = ActiveDocument.Range
Dim texto As String
Set oRegExp = New RegExp
oRegExp.Pattern = "<h1>[\s\S]*?</h1>"
oRegExp.Global = True
oRegExp.MultiLine = True
Dim resultado As MatchCollection
Set resultado = oRegExp.Execute(documento)
For Each r In resultado
r.Find.Execute FindText:="<h1>", ReplaceWith:="", Replace:=wdReplaceAll
r.Find.Execute FindText:="</h1>", ReplaceWith:="", Replace:=wdReplaceAll
Next
End Sub
The result I´m want is to replace
<h1>bla bla</h1>
with
bla bla
And than change the style to Heading 1 in word.
I would only resort to regex when I cannot achieve what I want with Words built in built in wildcard search.
Option Explicit
Sub Substituir(Optional ByVal this_tag As String = "h1", Optional ByVal this_replacement_style As String = "Heading 1")
With ActiveDocument.Content
With .Find
.ClearFormatting
.Text = "(\<" & this_tag & "\>)(*)(\</" & this_tag & "\>)" ' Default is "(<h1>)(*)(</h1>)"
.Replacement.Text = "\2"
.Format = True
.Replacement.Style = this_replacement_style
.Wrap = wdFindContinue
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End With
End Sub

Find words with more than one capital letter in word/VBA

I have a piece of VBA code that uses Find to find all the acronyms in a document. It does this by searching for all words consisting of capital letters that are 2 or more characters long using...
<[A-Z]{2,}>
The problem with this is it doesn't pick up all the acronyms, such as CoP, W3C, DVDs and CD-ROM. It picks up hyphenated acronyms in two parts which are not ideal but allowable as the list is checked by a user. I can also pick up words that end with an "s" or other characters by not searching to the end of the word using...
<[A-Z]{2,}
But this doesn't count any non-upper case character as part of the word it finds.
Is there an expression that would allow me to search for words with two or more capital letters in any location and find the whole word?
I don't think it's possible to 'search for words with two or more capital letters in any location and find the whole word' except in combination with macro code. Since you're using a macro, anyway, here's an approach that worked for me using the following sample text
CoP, this That and AnoTher thing W3C, DVDs and CD-ROM
and this wildcard combination (note that the list separator in my Windows configuration is ;, for other regions a , may be required).
<[A-Z][0-9A-Z\-a-z]{1;10}>
The following function checks whether the second or any later letter in the "found" range is capitalized and returns a boolean to the calling procedure. It loops through the characters in the given Range, checking the ASCII value. As soon as one is found, the loop exits.
Function ContainsMoreThanOneUpperCase(rng As Word.Range) As Boolean
Dim nrChars As Long, i As Long
Dim char As String
Dim HasUpperCase
HasUpperCase = False
nrChars = rng.Characters.Count
For i = 2 To nrChars
char = rng.Characters(i).text
If Asc(char) >= 65 And Asc(char) <= 90 Then
'It's an uppercase letter
HasUpperCase = True
Exit For
End If
Next
ContainsMoreThanOneUpperCase = HasUpperCase
End Function
An example for using it:
Sub FindAcronyms()
Dim rngFind As Word.Range
Dim bFound As Boolean
Set rngFind = ActiveDocument.content
With rngFind.Find
.text = "<[A-Z][0-9A-Z\-a-z]{1;10}>"
.MatchWildcards = True
.Forward = True
.wrap = wdFindStop
bFound = .Execute
Do While bFound
If bFound And ContainsMoreThanOneUpperCase(rngFind) Then
Debug.Print rngFind.text
rngFind.HighlightColorIndex = wdBrightGreen
End If
rngFind.Collapse wdCollapseEnd
bFound = .Execute
Loop
End With
End Sub
You can't do this in one pass of Find/Replace. You also have to make some allowances for what the Word application considers a Word and then where the acronym is located in the sentence or paragraph.
The following code should provide an idea for how you might do it with a combination of Wildcard searching and then additional VBA string manipulation.
It is setup to deal with words that start with capital letters, you will need to carry it further and add code and wildcard search criteria for words that start with lowercase letters if you expect to have any of those.
Sub FindAcronynms()
Dim rng As word.Range
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "<[A-Z]{1,}[a-z][A-Z]>"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "[A-Z]{1,5}[0-9][A-Z]{1,5}"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "<[A-Z]{2,}>"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
MsgBox "Action Complete", vbExclamation, "Custom Find"
End Sub
Private Function MoveEndOfString(ByRef rng As word.Range)
rng.MoveEnd wdCharacter, 1
Select Case Asc(rng.Characters.Last)
Case Is <= 32
rng.MoveEnd wdCharacter, -1
Case 45
rng.MoveEnd wdCharacter, 1
rng.MoveEnd wdWord, 1
If Asc(rng.Characters.Last) = 32 Then
'required because move above includes
'trailing space
rng.MoveEnd wdCharacter, -1
End If
End Select
End Function
You might use something like:
Sub Demo()
Application.ScreenUpdating = False
Options.DefaultHighlightColorIndex = wdPink
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Replacement.Highlight = True
.Forward = True
.Format = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Text = "<[A-Z][A-Z0-9/-]{1,}"
.Replacement.Text = "^&"
.Execute Replace:=wdReplaceAll
.Text = "<[A-Z][A-Za-z0-9/-]#[A-Z]"
.Replacement.Text = "^&"
.Execute Replace:=wdReplaceAll
End With
End With
Application.ScreenUpdating = True
End Sub

Remove Hebrew vowels (nikkud) from selected Unicode Hebrew text

I want to select a string of Unicode Hebrew text in a Word document and remove the Hebrew vowels (aka nikkud) without changing anything else.
I need to remove Unicode characters in a given range from the selected text. The Unicode characters I want to remove are U+0591-U+05BD, U+05BF-U+05C2, and U+05C4-U+05C7.
I found a way to remove the Hebrew vowels from a Unicode text string using the REGEXREPLACE function in Google Sheets (thank you GitHub). E.g:
=REGEXREPLACE(B1,"[(\x{0591}-\x{05BD})OR(\x{05BF}-\x{05C2})OR(\x{05C4}-\x{05C7})]","")
where cell B1 contains the original Hebrew text with vowels, and the function outputs the identical text with the vowels removed. The Unicode range used there permits me to leave two characters that need to remain (U+05BE and U+05C3).
Using that method, I can copy a Hebrew text string, e.g., אָמַר יְהוָה, paste it into my Google Sheet, and then copy the output, אמר יהוה, and paste it over the original text. This is much slower than a macro in Word would be (there are hundreds of these Hebrew text strings that need to be fixed). The majority of the document is in English, with snippets of Hebrew, so I don't need a solution for converting a whole document.
A bit of searching suggests to me that a similar RegEx replace function exists for Word VBA, but I don't have sufficient programming knowledge to adapt this to my own needs.
You can try this Macro. Be warned, it's very slow on my end:
Sub RemoveHebrewVowels()
Dim Word As Range
Dim Words As Variant
Dim WildcardCollection(3) As String
Rem [(\x{0591}-\x{05BD}]
WildcardCollection(0) = "[" & ChrW(1425) & "-" & ChrW(1469) & "]{1;}"
Rem [\x{05BF}-\x{05C2}]
WildcardCollection(1) = "[" & ChrW(1471) & "-" & ChrW(1474) & "]{1;}"
Rem [\x{05C4}-\x{05C7}]
WildcardCollection(2) = "[" & ChrW(1476) & "-" & ChrW(1479) & "]{1;}"
'Options.DefaultHighlightColorIndex = wdYellow
'Clear existing formatting and settings in Find
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
'Selection.Find.Replacement.Highlight = True
'Cycle through document and find wildcards patterns, replace when found
For Each Word In ActiveDocument.Words
For Each WildcardsPattern In WildcardCollection
With Selection.Find
.Text = WildcardsPattern
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Next
Next
End Sub
You can install notepad++ and do a find and replace operation using regex mode using this regex after pasting your whole input.
[\x{0591}-\x{05BD}\x{05BF}-\x{05C2}\x{05C4}-\x{05C7}]
Before:
After:
Then you can automate the copy/paste operation using AutoHotkey for example
If you want to keep the formatting information this is not a problem neither.
Just do the following operations:
Save your file in Word XML Document (Save as>Save as type: Word XML Document (*.xml)
Take a copy of this file and open it with Notepad++ (you have either to take a copy of the file or to close Word otherwise you can not open it in write mode)
Apply the find and replace described in the beginning of the explanations and save the file.
Reopen the file with Word and save it .docx for example
Thanks, everyone. Building on several of these suggestions, I put together the following macro, which seems to be working perfectly. There may be a more elegant way to write this (wp78de's macro seems more consolidated, but it didn't work for me).
Sub HebrewDevocalizer()
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "[" & ChrW(1425) & "-" & ChrW(1469) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchKashida = False
.MatchDiacritics = False
.MatchAlefHamza = False
.MatchControl = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "[" & ChrW(1471) & "-" & ChrW(1474) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchKashida = False
.MatchDiacritics = False
.MatchAlefHamza = False
.MatchControl = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "[" & ChrW(1476) & "-" & ChrW(1479) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchKashida = False
.MatchDiacritics = False
.MatchAlefHamza = False
.MatchControl = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Anyone who needs to use this in a software script (Python 3), you can do
import re
re.sub(r'[\u0591-\u05BD\u05BF-\u05C2\u05C4-\u05C7]', '', 'אֱלֹהִים')
BS"D
Do a "Save as" to other format- Hebrew DOS text.
Reload the file in Word and you will see that a question mark has replaced each nikud.
Do a global change (cntrl H) of '?' to null.
All done

Regex Word Macro that finds two words within a range of each other and then italicizes those words?

So, I'm just beginning to understand Regular Expressions and I've found the learning curve fairly steep. However, stackoverflow has been immensely helpful in the process of my experimenting. There is a particular word macro that I would like to write but I have not figured out a way to do it. I would like to be able to find two words within 10 or so words of each other in a document and then italicize those words, if the words are more than 10 words apart or are in a different order I would like the macro not to italicize those words.
I have been using the following regular expression:
\bPanama\W+(?:\w+\W+){0,10}?Canal\b
However it only lets me manipulate the entire string as a whole including random words in between. Also the .Replace function only lets me replace that string with a different string not change formatting styles.
Does any more experienced person have an idea as to how to make this work? Is it even possible to do?
EDIT: Here is what I have so far. There are two problems I am having. First I don't know how to only select the words "Panama" and "Canal" from within a matched Regular Expression and replace only those words (and not the intermediate words). Second, I just don't know how to replace a Regexp that is matched with a different format, only a different string of text - probably just as a result of a lack of familiarity with word macros.
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Set re = New regExp
re.Pattern = "\bPanama\W+(?:\w+\W+){0,10}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
Set rng = para.Range
rng.MoveEnd unit:=wdCharacter, Count:=-1
Text$ = rng.Text + "Modified"
rng.Text = re.Replace(rng.Text, Text$)
Next para
End Sub
Ok, thanks to help from Tim Williams below I got the following solution together, it's more than a little clumsy in some respects and it is by no means pure regexp but it does get the job done. If anyone has a better solution or idea about how to go about this I'd be fascinated to hear it though. Again, my brute forcing the changes with the search and replace feature is a little embarrassingly crude but at least it works...
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New regExp
re.pattern = "\bPanama\W+(?:\w+\W+){0,13}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, m.Length
rng.Font.ColorIndex = wdOrange
Next m
End If
Next para
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Panama"
.Replacement.Text = "Panama"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Canal"
.Replacement.Text = "Canal"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.ColorIndex = wdBlack
With Selection.Find
.Text = ""
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
I'm a long way off being a decent Word programmer, but this might get you started.
EDIT: updated to include a parameterized version.
Sub Tester()
HighlightIfClose ActiveDocument, "panama", "canal", wdBrightGreen
HighlightIfClose ActiveDocument, "red", "socks", wdRed
End Sub
Sub HighlightIfClose(doc As Document, word1 As String, _
word2 As String, clrIndex As WdColorIndex)
Dim re As RegExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New RegExp
re.Pattern = "\b" & word1 & "\W+(?:\w+\W+){0,10}?" _
& word2 & "\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, Len(word1)
rng.HighlightColorIndex = clrIndex
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex + (m.Length - Len(word2))
rng.MoveEnd wdCharacter, Len(word2)
rng.HighlightColorIndex = clrIndex
Next m
End If
Next para
End Sub
If you're after just doing each 2 words at a time, this worked for me, following your practice lines.
foo([a-zA-Z0-9]+? ){0,10}bar
Explanation:
will grab word 1 (foo), then match anything that is a word of alphanumeric characters ([a-zA-Z0-9]+?) followed by a space (), 10 times ({0,10}), then word 2 (bar).
This doesn't include full stops (didn't know if you wanted them), but if you want to just add . after 0-9 in the regex.
So your (pseudocode) syntax will be similar to:
$matches = preg_match_all(); // Your function to get regex matches in an array
foreach (those matches) {
replace(KEY_WORD, <i>KEY_WORD</i>);
}
Hopefully it helps. Testing below, highlighted what it matched.
Worked:
The foo this that bar blah
The foo economic order war bar
Didn't Work
The foo economic order. war bar
The global foo order has been around for several centuries, over this period of time people have evolved different and intricate trade relationships dealing with situations such as agriculture and bar