I have this regex code and it´s working fine, but I´m trying to manipulate the text inside foreach but is taking no effect at the document.
I want to change the style of the selection and cut the markup off.
Sub Substituir()
Set documento = ActiveDocument.Range
Dim texto As String
Set oRegExp = New RegExp
oRegExp.Pattern = "<h1>[\s\S]*?</h1>"
oRegExp.Global = True
oRegExp.MultiLine = True
Dim resultado As MatchCollection
Set resultado = oRegExp.Execute(documento)
For Each r In resultado
r.Find.Execute FindText:="<h1>", ReplaceWith:="", Replace:=wdReplaceAll
r.Find.Execute FindText:="</h1>", ReplaceWith:="", Replace:=wdReplaceAll
Next
End Sub
The result I´m want is to replace
<h1>bla bla</h1>
with
bla bla
And than change the style to Heading 1 in word.
I would only resort to regex when I cannot achieve what I want with Words built in built in wildcard search.
Option Explicit
Sub Substituir(Optional ByVal this_tag As String = "h1", Optional ByVal this_replacement_style As String = "Heading 1")
With ActiveDocument.Content
With .Find
.ClearFormatting
.Text = "(\<" & this_tag & "\>)(*)(\</" & this_tag & "\>)" ' Default is "(<h1>)(*)(</h1>)"
.Replacement.Text = "\2"
.Format = True
.Replacement.Style = this_replacement_style
.Wrap = wdFindContinue
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End With
End Sub
Related
I want to round numbers embedded in text in a selection in Word. Unlike many solutions for rounding on the net, the values are not isolated in table cells etc. but may be in the text and have additional characters around them.
Examples: 0.0044***, (0.0040–0.0047) +/-0.0012.
I adapted the following code from this post which was designed to round to whole numbers:
Sub RoundNumbers()
Dim OrigRng As Range
Dim WorkRng As Range
Dim FindPattern As String
Dim FoundVal As String
Dim decplace as Integer
Set OrigRng = Selection.Range
Set WorkRng = Selection.Range
FindPattern = "([0-9]){1,}.[0-9]{1,}"
decplace = 3
Do
With WorkRng
.SetRange OrigRng.Start, OrigRng.End ' using "set WorkRng = OrigRng" would cause them to point to the same object (OrigRng is also changed when WorkRng is changed)
If .Find.Execute(findtext:=FindPattern, Forward:=True, _
MatchWildcards:=True) Then
.Expand wdWord ' I couldn't find a reliable way for greedy matching with Word regex, so I expand found range to word
.Text = FormatNumber(Round(CDbl(.Text) + 0.000001, decplace), decplace, vbTrue)
End If
End With
Loop While WorkRng.Find.Found
End Sub
I thought I could extend the Round function to round to a specified number of decimals, e.g. .Text = round(CDbl(.Text) + 0.000001, 3).
The problem with this is that the macro continues to find the first value in the selection and doesn't move to subsequent numbers. Presumably this is because, unlike the whole numbers, the rounded values still match the regex.
A solution suggested was to replace the count of digits post decimal from one or more {1,} with a fixed value e.g., {4}. This works if all the values to be rounded have the same format but doesn't have the flexibility I need.
So how can I get it to move to the next value? Alternatively, does anyone have a better solution?
For example, to process a whole document:
Sub DemoA()
Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "<[0-9]#.[0-9]#>"
End With
Do While .Find.Execute = True
.Text = Format(.Text, "0.000")
.Collapse wdCollapseEnd
Loop
End With
Application.ScreenUpdating = True
End Sub
or to process just a selected range:
Sub DemoB()
Application.ScreenUpdating = False
Dim Rng As Range
With Selection
Set Rng = .Range
.Collapse wdCollapseStart
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "<[0-9]#.[0-9]#>"
End With
Do While .Find.Execute = True
If .InRange(Rng) = False Then Exit Do
.Text = Format(.Text, "0.000")
.Collapse wdCollapseEnd
Loop
End With
Application.ScreenUpdating = True
End Sub
I think the issue you have is that you are not changing the range to reflect the text left to search. You seem to be resetting to the start of the range each time.
The following code may help
Option Explicit
Public Sub RoundNumbers(ByVal ipRange As wordRange)
Dim myRange As Word.Range
If ipRange Is Nothing Then
Set myRange = ActiveDocument.StoryRanges(wdMainTextStory)
Else
Set myRange = ipRange.Duplicate
End If
Dim myPreservedRangeEnd As Long
myPreservedRangeEnd = myRange.End
Dim myFindPattern As String
myFindPattern = "[0-9]{1,}.[0-9]{1,}"
Dim myDecplace As Long
myDecplace = 3
Do
Set myRange = FindWildCardPattern(myRange, myFindPattern)
If myRange Is Nothing Then
Exit Do
End If
myRange.Text = FormatNumber(Round(CDbl(myRange.Text) + 0.000001, myDecplace), myDecplace, vbTrue)
' At this point myRange is the newly inserted text so to search
' the remainder of the text in the selection we need to move
' the start to after the current range and replace the end of the
' current range with the preserved end of the selection range
myRange.Start = myRange.End + 1
myRange.End = myPreservedRangeEnd
Loop
End Sub
Public Function FindWildCardPattern(ByVal ipRange As Range, ipFindPattern As String) As Range
If ipRange Is Nothing Then
Set ipRange = ActiveDocument.StoryRanges(wdMainTextStory)
End If
With ipRange
With .Find
.Text = ipFindPattern
.Forward = True
.MatchWildcards = True
.Execute
End With
If .Find.Found Then
Set FindWildCardPattern = .Duplicate
Else
Set FindWildCardPattern = Nothing
End If
End With
End Function
The essential problem in your code is that you have the loop in the wrong place, and you set WordRng back to the beginning with each loop. Absent any example text I used the text in your question to test. The following should work:
Sub RoundNumbers()
Dim WorkRng As Range: Set WorkRng = Selection.Range
Dim FindPattern As String: FindPattern = "([0-9]){1,}.[0-9]{1,}"
Dim FoundVal As String
Dim decplace As Integer
decplace = 3
With WorkRng
With .Find
.ClearFormatting
.Text = FindPattern
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
End With
Do While .Find.Execute
.Text = FormatNumber(Round(CDbl(.Text) + 0.000001, decplace), decplace, vbTrue)
Loop
End With
End Sub
I have a word document that has SQL code in it. I have great code that finds specific terms I didn't realize that I had so many different terms in my document. I stopped at over 140 terms. I tried to add regex but I did it wrong. Would it be possible to have it find the start word SELECT and the end word FROM and change the background color to wdBrightGreen for everything between those terms?
I did not put all the fields in I stopped at three. Is there a way to put the ArrFind to search the whole document instead of specific terms?
Dim ArrFnd As Variant
Dim i As Long
Dim x As Long
Dim y As Long
Dim regexObject As RegExp
Dim strField1(1 To 200) As String
Dim strField2(1 To 20) As String
strField1(1) = "SELECT DISTINCT policy_id"
strField1(2) = "SELECT DISTINCT policy_id"
strField1(3) = "SELECT DISTINCT P.policy_id"
For x = 1 To 150
ArrFnd = Array(strField1(x))
With ActiveDocument
For i = 0 To UBound(ArrFnd)
With .Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchCase = True
.MatchWholeWord = True
.Text = ArrFnd(i)
With regexObject
.Pattern = "[SELECT-FROM]"
End With
.Replacement.Text = ""
End With
Do While .Find.Execute
.Font.Shading.BackgroundPatternColorIndex = wdBrightGreen
.Collapse wdCollapseEnd
Loop
End With
Next i
End With
Next x
One wonders why you persist with the approach taken in your code, in light of: MS Word VBA Can't select specific words to change background color
Regardless, for what you are now describing, all you need is:
Sub HiliteWords()
Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "SELECT[!^13^l]#FROM"
.Replacement.Text = ""
End With
Do While .Find.Execute
.Start = .Words.First.End
.End = .Words.Last.Start - 1
.Font.Shading.BackgroundPatternColorIndex = wdBrightGreen
.Collapse wdCollapseEnd
Loop
End With
Application.ScreenUpdating = True
End Sub
If you also don't want DISTINCT shaded, insert:
If .Words.First.Text = "DISTINCT " Then .Start = .Words.First.End
before:
.End = .Words.Last.Start - 1
I cannot figure out why the code is not working. I want to find within a cell (XXX) regex match and highlight each occurrence. Tried a lot of things.
rangeFinding.SetRange Start:=rngStart.Start, End:=rngEnd.End
rangeFinding.Select
With Selection
.Tables(1).cell(2, 2).Range.Text = y
End With
Dim qq As Range
Set qq = GetCellTextRange(Selection.Tables(1).cell(2, 2))
Options.DefaultHighlightColorIndex = wdYellow
With qq.Find
.ClearFormatting
.Text = "(\(*\))"
.MatchWildcards = True
.Replacement.ClearFormatting
.Replacement.Text = "\1"
.Replacement.Highlight = True
.Replacement.Font.Color = vbRed
.Execute Replace:=wdReplaceAll
End With
qq.HighlightColorIndex = wdYellow
End If
Procedure that is called:
Function GetCellTextRange(cl As cell) As Range ' ' GetCellTextRange
returns the content range of a table cell '
Dim rng As Range
Set rng = cl.Range
rng.MoveEnd wdCharacter, -1
Set GetCellTextRange = rng
End Function
Changing font color and/or bold does work, however highlighting not. Even when I replace \1 with YYY it works. But highlighting does not get applied, either when I use .Replacement.Highlight or alternative qq.HighlightColorIndex = wdYellow.
BTW: does the Find (and its Execute) function adapt qq range? That is what I noticed, but the docs do not tell it clearly.
I ran into the same problem with wildcard search/replace in VBA code not highlighting text. I was able to work around it by explicitly setting the default highlight colour before running the wildcard search/replace, and then setting it back to whatever it was afterward, e.g.:
currentHighlightColour = Options.DefaultHighlightColorIndex
Options.DefaultHighlightColorIndex = wdYellow
With ActiveDocument.Range.Find
.ClearFormatting
.replacement.ClearFormatting
.Text = "(\{\{placeholder: *\}\})"
.Format = True
.replacement.Text = "\1"
.Forward = True
.matchWildcards = True
.matchCase = False
.replacement.Highlight = True
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
Options.DefaultHighlightColorIndex = currentHighlightColour
Seems like a bug in Word, but at least there's a workaround.
I have a piece of VBA code that uses Find to find all the acronyms in a document. It does this by searching for all words consisting of capital letters that are 2 or more characters long using...
<[A-Z]{2,}>
The problem with this is it doesn't pick up all the acronyms, such as CoP, W3C, DVDs and CD-ROM. It picks up hyphenated acronyms in two parts which are not ideal but allowable as the list is checked by a user. I can also pick up words that end with an "s" or other characters by not searching to the end of the word using...
<[A-Z]{2,}
But this doesn't count any non-upper case character as part of the word it finds.
Is there an expression that would allow me to search for words with two or more capital letters in any location and find the whole word?
I don't think it's possible to 'search for words with two or more capital letters in any location and find the whole word' except in combination with macro code. Since you're using a macro, anyway, here's an approach that worked for me using the following sample text
CoP, this That and AnoTher thing W3C, DVDs and CD-ROM
and this wildcard combination (note that the list separator in my Windows configuration is ;, for other regions a , may be required).
<[A-Z][0-9A-Z\-a-z]{1;10}>
The following function checks whether the second or any later letter in the "found" range is capitalized and returns a boolean to the calling procedure. It loops through the characters in the given Range, checking the ASCII value. As soon as one is found, the loop exits.
Function ContainsMoreThanOneUpperCase(rng As Word.Range) As Boolean
Dim nrChars As Long, i As Long
Dim char As String
Dim HasUpperCase
HasUpperCase = False
nrChars = rng.Characters.Count
For i = 2 To nrChars
char = rng.Characters(i).text
If Asc(char) >= 65 And Asc(char) <= 90 Then
'It's an uppercase letter
HasUpperCase = True
Exit For
End If
Next
ContainsMoreThanOneUpperCase = HasUpperCase
End Function
An example for using it:
Sub FindAcronyms()
Dim rngFind As Word.Range
Dim bFound As Boolean
Set rngFind = ActiveDocument.content
With rngFind.Find
.text = "<[A-Z][0-9A-Z\-a-z]{1;10}>"
.MatchWildcards = True
.Forward = True
.wrap = wdFindStop
bFound = .Execute
Do While bFound
If bFound And ContainsMoreThanOneUpperCase(rngFind) Then
Debug.Print rngFind.text
rngFind.HighlightColorIndex = wdBrightGreen
End If
rngFind.Collapse wdCollapseEnd
bFound = .Execute
Loop
End With
End Sub
You can't do this in one pass of Find/Replace. You also have to make some allowances for what the Word application considers a Word and then where the acronym is located in the sentence or paragraph.
The following code should provide an idea for how you might do it with a combination of Wildcard searching and then additional VBA string manipulation.
It is setup to deal with words that start with capital letters, you will need to carry it further and add code and wildcard search criteria for words that start with lowercase letters if you expect to have any of those.
Sub FindAcronynms()
Dim rng As word.Range
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "<[A-Z]{1,}[a-z][A-Z]>"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "[A-Z]{1,5}[0-9][A-Z]{1,5}"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
Set rng = ActiveDocument.Content
With rng.Find
.ClearFormatting
.Format = False
.Forward = True
.MatchWildcards = True
.Text = "<[A-Z]{2,}>"
.Wrap = wdFindStop
.Execute
Do While .found
MoveEndOfString rng
rng.HighlightColorIndex = wdTeal
rng.Collapse wdCollapseEnd
.Execute
Loop
End With
MsgBox "Action Complete", vbExclamation, "Custom Find"
End Sub
Private Function MoveEndOfString(ByRef rng As word.Range)
rng.MoveEnd wdCharacter, 1
Select Case Asc(rng.Characters.Last)
Case Is <= 32
rng.MoveEnd wdCharacter, -1
Case 45
rng.MoveEnd wdCharacter, 1
rng.MoveEnd wdWord, 1
If Asc(rng.Characters.Last) = 32 Then
'required because move above includes
'trailing space
rng.MoveEnd wdCharacter, -1
End If
End Select
End Function
You might use something like:
Sub Demo()
Application.ScreenUpdating = False
Options.DefaultHighlightColorIndex = wdPink
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Replacement.Highlight = True
.Forward = True
.Format = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Text = "<[A-Z][A-Z0-9/-]{1,}"
.Replacement.Text = "^&"
.Execute Replace:=wdReplaceAll
.Text = "<[A-Z][A-Za-z0-9/-]#[A-Z]"
.Replacement.Text = "^&"
.Execute Replace:=wdReplaceAll
End With
End With
Application.ScreenUpdating = True
End Sub
So, I'm just beginning to understand Regular Expressions and I've found the learning curve fairly steep. However, stackoverflow has been immensely helpful in the process of my experimenting. There is a particular word macro that I would like to write but I have not figured out a way to do it. I would like to be able to find two words within 10 or so words of each other in a document and then italicize those words, if the words are more than 10 words apart or are in a different order I would like the macro not to italicize those words.
I have been using the following regular expression:
\bPanama\W+(?:\w+\W+){0,10}?Canal\b
However it only lets me manipulate the entire string as a whole including random words in between. Also the .Replace function only lets me replace that string with a different string not change formatting styles.
Does any more experienced person have an idea as to how to make this work? Is it even possible to do?
EDIT: Here is what I have so far. There are two problems I am having. First I don't know how to only select the words "Panama" and "Canal" from within a matched Regular Expression and replace only those words (and not the intermediate words). Second, I just don't know how to replace a Regexp that is matched with a different format, only a different string of text - probably just as a result of a lack of familiarity with word macros.
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Set re = New regExp
re.Pattern = "\bPanama\W+(?:\w+\W+){0,10}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
Set rng = para.Range
rng.MoveEnd unit:=wdCharacter, Count:=-1
Text$ = rng.Text + "Modified"
rng.Text = re.Replace(rng.Text, Text$)
Next para
End Sub
Ok, thanks to help from Tim Williams below I got the following solution together, it's more than a little clumsy in some respects and it is by no means pure regexp but it does get the job done. If anyone has a better solution or idea about how to go about this I'd be fascinated to hear it though. Again, my brute forcing the changes with the search and replace feature is a little embarrassingly crude but at least it works...
Sub RegText()
Dim re As regExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New regExp
re.pattern = "\bPanama\W+(?:\w+\W+){0,13}?Canal\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, m.Length
rng.Font.ColorIndex = wdOrange
Next m
End If
Next para
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Panama"
.Replacement.Text = "Panama"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = True
With Selection.Find
.Text = "Canal"
.Replacement.Text = "Canal"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Font.ColorIndex = wdOrange
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.ColorIndex = wdBlack
With Selection.Find
.Text = ""
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
I'm a long way off being a decent Word programmer, but this might get you started.
EDIT: updated to include a parameterized version.
Sub Tester()
HighlightIfClose ActiveDocument, "panama", "canal", wdBrightGreen
HighlightIfClose ActiveDocument, "red", "socks", wdRed
End Sub
Sub HighlightIfClose(doc As Document, word1 As String, _
word2 As String, clrIndex As WdColorIndex)
Dim re As RegExp
Dim para As Paragraph
Dim rng As Range
Dim txt As String
Dim allmatches As MatchCollection, m As match
Set re = New RegExp
re.Pattern = "\b" & word1 & "\W+(?:\w+\W+){0,10}?" _
& word2 & "\b"
re.IgnoreCase = True
re.Global = True
For Each para In ActiveDocument.Paragraphs
txt = para.Range.Text
'any match?
If re.Test(txt) Then
'get all matches
Set allmatches = re.Execute(txt)
'look at each match and hilight corresponding range
For Each m In allmatches
Debug.Print m.Value, m.FirstIndex, m.Length
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex
rng.MoveEnd wdCharacter, Len(word1)
rng.HighlightColorIndex = clrIndex
Set rng = para.Range
rng.Collapse wdCollapseStart
rng.MoveStart wdCharacter, m.FirstIndex + (m.Length - Len(word2))
rng.MoveEnd wdCharacter, Len(word2)
rng.HighlightColorIndex = clrIndex
Next m
End If
Next para
End Sub
If you're after just doing each 2 words at a time, this worked for me, following your practice lines.
foo([a-zA-Z0-9]+? ){0,10}bar
Explanation:
will grab word 1 (foo), then match anything that is a word of alphanumeric characters ([a-zA-Z0-9]+?) followed by a space (), 10 times ({0,10}), then word 2 (bar).
This doesn't include full stops (didn't know if you wanted them), but if you want to just add . after 0-9 in the regex.
So your (pseudocode) syntax will be similar to:
$matches = preg_match_all(); // Your function to get regex matches in an array
foreach (those matches) {
replace(KEY_WORD, <i>KEY_WORD</i>);
}
Hopefully it helps. Testing below, highlighted what it matched.
Worked:
The foo this that bar blah
The foo economic order war bar
Didn't Work
The foo economic order. war bar
The global foo order has been around for several centuries, over this period of time people have evolved different and intricate trade relationships dealing with situations such as agriculture and bar