Find word with RegExp and bold - regex

I've a word document where I want to find all the words as have the following layout: ABC-12:123456 DEF. Where this is found in the document the word should be selected and put in bold. (Later i'll add a hyperlink instead of bold). I have successfully found the word and put it in a MatchCollection just to try RegExp. It looks like:
Sub searchDocument()
Set matchPattern = New RegExp
matchPattern.Pattern = "ABC-\d{2}:\d{6} DEF"
matchPattern.Global = True
Dim matchPatternWords As MatchCollection
Set matchPatternWords = matchPattern.Execute(ActiveDocument.Range)
For Each matchPatternWord In matchPatternWords
MsgBox (matchPatternWord)
Next matchPatternWord
End Sub

You need to go from the regexp match to the range object representing the match.
matchRange = ActiveDocument.Range
(matchPatternWord.FirstIndex, matchPatternWord.FirstIndex+matchPatternWord.Length)
would be the obvious invocation.
However this post indicates that there might be issues with this approach, because formating can mess up the character count. It's from 2010 though so the issue might be resolved in a better way now.
If the above doesn't work, or if you don't trust it you can do;
matchRange = ActiveDocument.Range.Find(FindText:=matchPatternWord.Value)
The latter needs a bit more handeling if multiple occurences of the same word is a possibility.
Once you have the range it's straight forward.
matchRange.Bold = True

Related

Changing formulas on the fly with VBA RegEx

i'm trying to change formulas in excel, i need to change the row number of the formulas.
I'm trying do use replace regex to do this. I use an loop to iterate through the rows of the excel and need to change the formula for the row that is iterating at the time. Here is an exemple of the code:
For i = 2 To rows_aux
DoEvents
Formula_string= "=IFS(N19='Z001';'xxxxxx';N19='Z007';'xxxxxx';0=0;'xxxxxxx')"
Formula_string_new = regEx.Replace(Formula_string, "$1" & i)
wb.Cells(i, 33) = ""
wb.Cells(i, 33).Formula = Formula_string_new
.
.
.
Next i
I need to replace rows references but not the ones in quotes or double quotes. Example:
If i = 2 i want the new string to be this:
"=IFS(N2='Z001';'xxxxxx';N2='Z007';'xxxxxx';0=0;'xxxxxxx')"
I'm trying to use this regex:
([a-zA-Z]+)(\d+)
But its changing everything in quotes too. Like this:
If i = 2:
"=IFS(N2='Z2';'xxxxxx';N2='Z2';'xxxxxx';0=0;'xxxxxxx')"
If anyone can help me i will be very grateful!
Thanks in advance.
As others have written, there are probably better ways to write this code. But for a regex that will capture just the Column letter in capturing group #1, try:
\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?
Note that is will NOT include the $ absolute addressing token, but could be altered if that were necessary.
Note that you can avoid the loop completely with:
Formula_string = "=IFS(N19=""Z001"",""xxxxxx"",N$19=""Z007"",""xxxxxx"",0=0,""xxxxxxx"")"
Formula_string_new = regEx.Replace(Formula_string, "$1" & firstRow)
With Range(wb.Cells(firstRow, 33), wb.Cells(lastRow, 33))
.Clear
.Formula = Formula_string_new
End With
When we write a formula to a range like this, the references will automatically adjust the way you were doing in your loop.
Depending on unstated factors, you may want to use the FormulaLocal property vice the Formula property.
Edit:
To make this a little more robust, in case there happens to be, within the quote marks, a string that exactly mimics a valid address, you can try checking to be certain that a quote (single or double) neither precedes nor follows the target.
Pattern: ([^"'])\$?\b(XF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(?!['"])
Replace: "$1$2" & i
However, this is not "bulletproof" as various combinations of included data might match. If it is a problem, let me know and I'll come up with something more robust.
If you can identify some unique features like in the example preceding bracket ( or colon ; and trailing equal = then this might work
Sub test()
Dim s As String, sNew As String, i As Long
Dim Regex As Object
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([(;][a-zA-Z]{1,3})(\d+)="
End With
i = 1
s = "=IFS(NANA19='Z001';'xxxxxx';NA19='Z007';'xxxxxx';0=0;'xxxxxxx')"
sNew = Regex.Replace(s, "$1" & i & "=")
Debug.Print s & vbCr & sNew
End Sub

Why does Find/Replace zRngResult.Find work fine, but RegEx myRegExp.Execute(zRngResult) mess up the range.Start?

I wish to select and add comments after certain words, e.g. “not”, “never”, “don’t” in sentences in a Word document with VBA. The Find/Replace with wildcards works fine, but “Use wildcards” cannot be selected with “Match case”. The RegEx can “IgnoreCase=True”, but the selection of the word is not reliable when there are more than one comments in a sentence. The Range.start seems to be getting modified in a way that I cannot understand.
A similar question was asked in June 2010. https://social.msdn.microsoft.com/Forums/office/en-US/f73ca32d-0af9-47cf-81fe-ce93b13ebc4d/regex-selecting-a-match-within-the-document?forum=worddev
Is there a new/different way of solving this problem?
Any suggestion will be appreciated.
The code using RegEx follows:
Function zRegExCommentor(zPhrase As String, tComment As String) As Long
Dim sTheseSentences As Sentences
Dim rThisSentenceToSearch As Word.Range, rThisSentenceResult As Word.Range
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Options.CommentsColor = wdByAuthor
Set myRegExp = New RegExp
With myRegExp
.IgnoreCase = True
.Global = False
.Pattern = zPhrase
End With
Set sTheseSentences = ActiveDocument.Sentences
For Each rThisSentenceToSearch In sTheseSentences
Set rThisSentenceResult = rThisSentenceToSearch.Duplicate
rThisSentenceResult.Select
Do
DoEvents
Set myMatches = myRegExp.Execute(rThisSentenceResult)
If myMatches.Count > 0 Then
rThisSentenceResult.Start = rThisSentenceResult.Start + myMatches(0).FirstIndex
rThisSentenceResult.End = rThisSentenceResult.Start + myMatches(0).Length
rThisSentenceResult.Select
Selection.Comments.Add Range:=Selection.Range
Selection.TypeText Text:=tComment & "{" & zPhrase & "}"
rThisSentenceResult.Start = rThisSentenceResult.Start + 1 'so as not to find the same phrase again and again
rThisSentenceResult.End = rThisSentenceToSearch.End
rThisSentenceResult.Select
End If 'If myMatches.Count > 0 Then
Loop While myMatches.Count > 0
Next 'For Each rThisSentenceToSearch In sTheseSentences
End Function
Relying on Range.Start or Range.End for position in a Word document is not reliable due to how Word stores non-printing information in the text flow. For some kinds of things you can work around it using Range.TextRetrievalMode, but the non-printing characters inserted by Comments aren't affected by these settings.
I must admit I don't understand why Word's built-in Find with wildcards won't work for you - no case matching shouldn't be a problem. For instance, based on the example: "Never has there been, never, NEVER, a total drought.":
FindText:="[n,N][e,E][v,V][e,E][r,R]"
Will find all instances of n-e-v-e-r regardless of the capitalization. The brackets let you define a range of values, in this case the combination of lower and upper case for each letter in the search term.
The workarounds described in my MSDN post you link to are pretty much all you can if you insist on RegEx:
Using the Office Open XML (or possibly Word 2003 XML) file format will let you use RegEx and standard XML processing tools to find the information, add comment "tags" into the Word XML, close it all up... And when the user sees the document it will all be there.
If you need to be doing this in the Word UI a slightly different approach should work (assuming you're targeting Word 2003 or later): Work through the document on a range-by-range basis (by paragraph, perhaps). Read the XML representation of the text into memory using the Range.WordOpenXML property, perform the RegEx search, add comments as WordOpenXML, then write the WordOpenXML back into the document using the InserXml method, replacing the original range (paragraph). Since you'd be working with the Paragraph object Range.Start won't be a factor.

Writing a word macro to organize chat logs

I need some help writing a word macro to organize some chat logs. What I want is to eliminate repeated consecutive occurrences of names, regardless of timestamp. Besides this, each person will be using their own formatting style (font, font color, etc.). Edit: the raw logs have no formatting (i.e. specific fonts, font color ,etc.). I want the macro to automatically add a specific (already existent) word style to each user.
So, what I have is:
[12:40] Steve: this is an example text.
[12:41] Steve: this is another example text.
[12:41] Steve: this is yet another example text.
[12:45] Bob: some more text.
[12:46] Bob: even more text.
[12:47] Steve: yadda yadda yadda.
The expected output would be:
[12:40] Steve: *style1*this is an example text.
this is another example text.
this is yet another example text.*/style1*
[12:45] Bob: *style2*some more text.
even more text.*/style2*
[12:47] Steve: *style1*yadda yadda yadda.*style1*
As of now, unfortunately, I know next to nothing of VBA for Applications. I was thinking of maybe searching for the names by a regex pattern and assigning them to a variable, comparing each match to the previous and, if they're equal, deleting the latter. The problem is I'm not fluent in VBA, so I don't know how to do what I want.
So far, all I've got is this:
Sub Organize()
Dim re As RegExp
Dim names As MatchCollection, name As Match
re.Pattern = "\[[0-9]{2}:[0-9]{2}\] [a-zA-Z]{1,20}:"
re.IgnoreCase = True
re.Global = True
Set names = re.Execute(ActiveDocument.Range)
For Each name In names
'This is where I get lost
Next name
End Sub
So, in the interest of solving this problem and me learning some VBA, could I get some help?
EDIT: the question has been edited to better reflect what I want the macro to do.
Assuming that each line in your log is a separate paragraph I would do it without Regex but with .Find object feature. The following code is working find for the sample data you provided.
Sub qTest()
Dim PAR As Paragraph
Dim PrevName As String
For Each PAR In ActiveDocument.Content.Paragraphs
PAR.Range.Select 'highlight current paragraph
'find name in paragraph
With Selection.Find
.ClearFormatting
.Text = "\]*\:"
.Execute
End With
If Selection.Text = PrevName Then
'extend region for the whole paragraph
'end delete it
ActiveDocument.Range(PAR.Range.Start, Selection.End + 1).Delete
Else
PrevName = Selection.Text
Debug.Print PrevName
End If
Next
End Sub

Regex Matching and Deleting/Replacing a string

So I am trying to parse through a file which has multiple "footers" (the file is an output that was designed for printing which my company wants to keep electronically stored...each footer is a new page and the new page is no longer needed as).
I am trying to look for and remove lines that look like:
1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11
2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11
3 of 2122 PRINTED 07/01/2013 04:46 Page: 3 of 11
and so on
I then want to replace the final line (which would read something like "2122 of 2122") with a "custom" footer.
I am using RegEx, but am very new to using it so how should my RegEx look in order to accomplish this? I plan on using the RegEx "count" function to find out when I've found the last line and then do a .replace on it.
I am using VB .NET, but can translate C# if required. How can I accomplish what I'm looking to do? Specifically I only care about matching/removing of a match so long as the # of matches > 1.
Here's one I created with RegExr:
/^(\d+\s+of\s+\d+)(?=\s+printed)/gim
It matches (number)(space)('of')(space)(number) at the beginning of a line, and only if it is followed by (space)('printed'), case insensitive. The /m flag turns ^ and $ into line-aware boundaries.
This is how I ended up doing it...
Private Function FixFooters(ByVal fileInput As String, Optional ByVal numberToLeaveAlone As Integer = 1) As String
Dim matchpattern As String = "^\d+\W+of\W+\d+\W+PRINTED.*$"
Dim myRegEx As New Regex(matchpattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
Dim replacementstring As String = String.Empty
Dim matchCounter As Integer = myRegEx.Matches(fileInput).Count
If numberToLeaveAlone > matchCounter Then numberToLeaveAlone = matchCounter
Return myRegEx.Replace(fileInput, replacementstring, matchCounter - numberToLeaveAlone, 0)
End Function
I used myregextester.com to get the inital matchpattern. Since I wanted to leave the last footer alone (to manipulate it further later on) I created the numberToLeaveAlone variable to ensure we don't remove ALL of the variables. For the purposes of this program I made the default value 1, but that could be changed to zero (I only did it for readability in the calling code as I know I will ALWAYS want to leave one...but I do like to reuse code). It's fairly fast, I'm sure there are better ways out there, but this one made the most sense to me.

regular expressions and vba

Does anyone know how to extract matches as strings from a RegExp.Execute() function?
Let me show you what I've gotten to so far:
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
I want the object "myMatches" which is holding the matches, to be converted to a string. I know that there is only going to be one match per execution.
Does anyone know how to extract the matches from the object as Strings to be displayed lets say via a MsgBox?
Try this:
Dim sResult As String
'// Your expression code here...
sResult = myMatches.Item(0)
'// or
sResult = myMatches(0)
Msgbox("The matching text was: " & sResult)
The Execute method returns a match collection and you can use the item property to retrieve the text using an index.
As you stated you only ever have one match then the index is zero. If you have more than one match you can return the index of the match you require or loop over the entire collection.
This page has a lot of information on regex and seems to have what you want.
http://www.regular-expressions.info/vbscript.html