Merge Three Regexes into One (or Two)

I would like to merge my three text-cleaning regexes (they remove empty lines, leading and trailing spaces, etc.) into one regex if possible, or into two if not.
My first regex is [ \t]+. It matches runs of spaces and tabs.
My second regex is ^(?:[\t ]*(?:\r?\n|\r))+. It matches blank lines, but it won't catch anything if the previous regex has not run.
My third regex is ^[\s\xA0]+|[\s\xA0]+$. It matches leading and trailing whitespace, including non-breaking spaces.
EDIT: I forgot to mention that in each case I replace the match with nothing ("").
EDIT 2: I use the following code in Word:
With selection
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.MultiLine = True
' clean selection
RegEx.Pattern = "[ \t]+"
.Text = RegEx.Replace(.Text, " ")
RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r))+"
.Text = RegEx.Replace(.Text, "")
' the following is from http://stackoverflow.com/a/24049145/2657875
RegEx.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
.Text = RegEx.Replace(.Text, "")
End With

The last two regexes can be merged as
RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r)?)*|[ \t]+$"
I do not think all three can be merged in VBA, since you are using two different replacement strings (" " for the first pattern and "" for the other two).
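The two-pass split can be sketched as follows. This is a rough illustration in Python rather than VBA, chosen only so the logic can be checked outside Word. The name clean_text and the sample string are hypothetical, the second pattern is a simplified variant rather than the exact one above, and the sample uses \n line breaks, whereas a Word selection uses \r paragraph marks.

```python
import re

# Pass 1: runs of spaces/tabs collapse to ONE space (replacement " ").
COLLAPSE = re.compile(r"[ \t]+")

# Pass 2: everything matched here is replaced with "", so the alternatives
# can share a single pattern:
#   - a run of whitespace-only lines
#   - leading spaces/tabs/non-breaking spaces on a line
#   - trailing spaces/tabs/non-breaking spaces on a line
STRIP = re.compile(r"^(?:[ \t\xA0]*\n)+|^[ \t\xA0]+|[ \t\xA0]+$", re.MULTILINE)


def clean_text(text: str) -> str:
    """Collapse inner whitespace, then drop blank lines and edge whitespace."""
    collapsed = COLLAPSE.sub(" ", text)
    return STRIP.sub("", collapsed)


if __name__ == "__main__":
    sample = "  Hello   world  \n\n\t\nSecond\tline \n"
    print(repr(clean_text(sample)))  # 'Hello world\nSecond line\n'
```

The point of the sketch is only that patterns sharing a replacement string can be joined with |, while the pattern with a different replacement needs its own pass.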

If I am not wrong, you want all your line breaks, spaces, tabs, and blank lines to be matched and removed, so you can merge the patterns. That is easy to do with the following regex in your replace program, script, or command:
/([\s\t]{0,50}\r?\n)+|\s+/s
The regex should work on Windows as well as Linux files.

I'm not a pro, but I run multiple regexes one after another. If you are not familiar with the code below, you should give it a try.
Set regEx_ = new regExp
With regEx_
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "Pattern 1"
TextLine = regEx_.replace(TextLine, "")
.Pattern = "Pattern 2"
TextLine = regEx_.replace(TextLine, "")
'and so on
End With

Related

Excel Regex Add 2 spaces instead of one

I'm using the function below in Excel to split the caps of some data. How can I adapt it to add two spaces between words (e.g. "Mike  Jones") rather than just one, as it does now? I'm sure the answer is simple, but regex baffles me at the best of times.
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
Very simple: add an extra space between the two regex groups $1 and $2:
SplitCaps = .Replace(strIn, "$1  $2")
I think all you need is
([a-zA-Z]*\s*[a-zA-Z]*)*
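Have you tried adding '\s'? (This should be a comment, but I can't comment as of now.)
Be careful with \s here: it is a pattern-side class, and as far as I know VBScript.RegExp inserts it literally when it appears in the replacement string, so the extra white space has to be typed as literal spaces:
SplitCaps = .Replace(strIn, "$1  $2")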
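The "literal spaces between the two groups" idea can be checked with a small sketch. It is written in Python instead of VBA purely so it can be run outside Excel; split_caps and its gap parameter are hypothetical names, and Python refers to groups through the match object where VBScript uses $1 and $2.

```python
import re


def split_caps(text: str, gap: str = "  ") -> str:
    """Insert `gap` between a lowercase letter and the capital that follows it."""
    # The gap is built from literal characters, so no escape sequences
    # are involved on the replacement side.
    return re.sub(
        r"([a-z])([A-Z])",
        lambda m: m.group(1) + gap + m.group(2),
        text,
    )


if __name__ == "__main__":
    print(split_caps("MikeJones"))       # Mike  Jones
    print(split_caps("MikeJones", " "))  # Mike Jones
```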

Create clean URL from text in Excel

I want to create a clean URL from a text such as this one:
Alpha Tests' Purchase of Berta Global Associates (C)
The URL should look like this:
alpha-tests-purchase-of-berta-global-associates-c
Currently I use this formula in Excel:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");" ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))
However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.
Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?
Thank you.
I can suggest the following function, which you can put into a VBA module and call from a normal worksheet formula:
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w-]+"
With regEx
.Global = True
.Pattern = strPattern
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function
The point is that we replace all spaces with hyphens at the beginning, then use a regex that matches any non-word and non-hyphen characters and remove them with RegExp.Replace.
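To sanity-check that sequence (spaces to hyphens, then strip everything that is not a word character or a hyphen, then lowercase), here is a minimal sketch in Python rather than VBA. The function name normalize_to_url is hypothetical, and re.ASCII is used on the assumption that VBScript's \w covers only ASCII letters, digits, and underscore.

```python
import re


def normalize_to_url(text: str) -> str:
    """Spaces become hyphens, other non-word characters are dropped."""
    hyphenated = text.replace(" ", "-")
    # [^\w-]+ matches runs of characters that are neither word characters
    # nor hyphens; re.ASCII keeps \w to [A-Za-z0-9_].
    cleaned = re.sub(r"[^\w-]+", "", hyphenated, flags=re.ASCII)
    return cleaned.lower()


if __name__ == "__main__":
    title = "Alpha Tests' Purchase of Berta Global Associates (C)"
    print(normalize_to_url(title))
    # alpha-tests-purchase-of-berta-global-associates-c
```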
UPDATE:
After your comments, it is still unclear what you want to do with Unicode letters: delete them or replace them with a hyphen. Here is a function that I tried to rebuild from your formula, but the logic may be flawed. I would prefer the generic approach above.
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w -]"
With regEx
.Global = True
.Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
End With
NormalizeToUrl = regEx.Replace(cell.Value, "")
NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"
With regEx
.Global = True
.Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
With regEx
.Global = True
.Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
End With
NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function
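The same four steps (remove listed punctuation, turn & into "and", replace remaining non-word characters with hyphens, shrink hyphen runs) can be traced in a short sketch. It is in Python rather than VBA so that it can be run on its own; the function name is hypothetical and re.ASCII again assumes an ASCII-only \w.

```python
import re


def normalize_to_url_stepwise(text: str) -> str:
    """Mirror the four steps of the rebuilt function, in the same order."""
    text = re.sub(r"[?,.')(:!\"]+", "", text)             # these are removed
    text = text.replace("&", "and")                       # & turns into "and"
    text = text.replace(" ", "-")                         # spaces to hyphens
    text = re.sub(r"[^\w -]", "-", text, flags=re.ASCII)  # other non-word chars
    text = text.lower()
    return re.sub(r"--+", "-", text)                      # shrink hyphen runs


if __name__ == "__main__":
    print(normalize_to_url_stepwise("Rock / Paper -- Scissors?"))
    # rock-paper-scissors
    print(normalize_to_url_stepwise("Tom & Jerry: The Movie!"))
    # tom-and-jerry-the-movie
```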

Replace matched pattern with different font

I am using Outlook 2010, and I am trying to write a macro to replace the font of text with a different one, if it matches a pattern.
The logic I am trying to apply is simple - in the user selected text, check for a pattern, and on match, change the font for the matched text.
So far I have been able to split the text and apply/check regex, but the replacement is something that I am not clear on how to do.
Dim objOL As Application
Dim objDoc As Object
Dim objSel As Object
Dim regEx As RegExp
Dim matches As MatchCollection
Dim m As Match
Dim lines As Variant
Dim ms As String
Set objOL = Application
Set objDoc = objOL.ActiveInspector.WordEditor
Set objSel = objDoc.Windows(1).Selection
lines = Split(objSel, Chr(13))
For i = 0 To UBound(lines) Step 1
Set regEx = New RegExp
With regEx
.Pattern = "\[(ok|edit|error)\](\[.*\])?" ' <-- this is just one regex, I want to be able to check more regexes
.Global = True
End With
If regEx.Test(lines(i)) Then
Set matches = regEx.Execute(lines(i))
For Each m In matches
ms = m.SubMatches(1)
' ms.Font.Italic = True
' <-- here is where I am not sure how to replace! :( -->
Next
End If
Next i
P.S. There seem to be text-search (objSel.Find.Text) and replace (objSel.Find.Replacement.Text) members in the Selection object, but no pattern search (or I am missing it).
--EDIT--
Adding a sample text
user#host> show some data
..<few lines of data>.. <-- these lines as-is (but monospaced)
[ok][2014-11-26 11:05:02]
user#host> edit some other data
[edit data]
user#host(data)% some other command
1. I want to convert the whole block to a monospaced font (like Courier New or Consolas).
2. Change the part that begins with something#somewhere and runs up to > or % to a dimmer color (i.e., in this example, user#host> and user#host(data)% become dimmer/grey).
3. Make the rest of that line bold (show some data et al.).
4. Dim all the bracketed text, with or without timestamps, in the same way as in 2.
This is getting closer to being done. The framework is here to make all sorts of changes now. Just need to get some of the regex patterns down to make the changes.
Sub FormatSelection()
Dim objMailItem As Outlook.MailItem
Dim objInspector As Outlook.Inspector: Set objInspector = Application.ActiveInspector
Dim objHtmlEditor As Object
Dim objWord As Object
Dim Range As Word.Selection
Dim objSavedSelection As Word.Selection
Dim objFoundText As Object
' Verify a mail object is in focus.
If objInspector.CurrentItem.Class = olMail Then
' Get the mail object.
Set objMailItem = objInspector.CurrentItem
If objInspector.EditorType = olEditorWord Then
' We are using a Word editor. Get the selected text.
Set objHtmlEditor = objMailItem.GetInspector.WordEditor
Set objWord = objHtmlEditor.Application
Set Range = objWord.Selection
Debug.Print Range.Range
' Set defaults for the selection
With Range.Font
.Name = "Courier"
.ColorIndex = wdAuto
End With
' Stylize the bracketed text
Call FormatTextWithRegex(Range, 2, "\[(.+?)\]")
' Prompt style text.
Call FormatTextWithRegex(Range, 2, "(\w+?#.+?)(?=[\>\%])")
' Text following the prompt.
Call FormatTextWithRegex(Range, 3, "(\w+?#.+?[\>\%])(.+)")
End If
End If
Set objInspector = Nothing
Set Range = Nothing
Set objHtmlEditor = Nothing
Set objMailItem = Nothing
End Sub
Private Sub FormatTextWithRegex(ByRef pRange As Word.Selection, pActionIndex As Integer, pPattern As String)
' This routine will perform a regex replacement on the text in pRange using pPattern
' on text based on the pactionindex passed.
Const intLightColourIndex = 15
Dim objRegex As RegExp: Set objRegex = New RegExp
Dim objSingleMatch As Object
Dim objMatches As Object
' Configure Regex object.
With objRegex
.IgnoreCase = True
.MultiLine = False
.Pattern = pPattern ' Example "\[(ok|edit|error)\](\[.+?\])?"
.Global = True
End With
' Locate all matches if any.
Set objMatches = objRegex.Execute(pRange.Text)
' Find
If (objMatches.Count > 0) Then
Debug.Print objMatches.Count & " Match(es) Found"
For Each objSingleMatch In objMatches
' Locate the text associated to this match in the selection so we can replace it.
Debug.Print "Match Found: '" & objSingleMatch & "'"
With pRange.Find
'.ClearFormatting
.Text = objSingleMatch.Value
.ClearFormatting
Select Case pActionIndex
Case 1 ' Italicize text
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = True
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
Case 2 ' Dim the colour
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = intLightColourIndex
.Execute Replace:=wdReplaceAll
Case 3 ' Bold that text!
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = True
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
End Select
End With
Next
Else
Debug.Print "No matches found for pattern: " & pPattern
End If
Set objRegex = Nothing
Set objSingleMatch = Nothing
Set objMatches = Nothing
End Sub
So we take what the user has selected and run the macro on it. My Outlook is configured with Word as the editor, so the code tests for that. It takes the selected text, runs the regex against it, and saves the matches.
The issue you had was what to do with a match once you found it. Since we have the actual text that matched, we can run it through a find and replace on the selection, replacing the text with itself, but styled as directed.
Caveats
My testing text was the following:
asdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdfasdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdf
I had to change your regex in your sample to be less greedy since it was matching both [ok][Test] sections. I don't know what kind of text you are working with so my logic might not apply to your situation. Test with caution.
You also had a comment that you needed to test multiple regexes... regexies.... I don't know what the plural is. Wouldn't be hard to create another function that calls this one for several patterns. Assuming this logic works repeating it should not be a big deal. I would like to make this work for you so if something is wrong let me know.
Code Update
I have changed the code so that the regex replacement is in a sub. Right now the code changes the selected text to Courier and italicizes text based on a regex. With this setup you can use the subroutine FormatTextWithRegex to make changes; you just need to update the pattern and the action index, which selects the style. I will update this again soon with more information. Right now all that exists is the structure that I think you need.
I am still having issues with the bolding, but you can see the grey part is working correctly. Also, since this relies on highlighting, the multiple calls to the function are causing an issue. I am just not sure what it is.
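To see what each of the three patterns passed to FormatTextWithRegex actually captures, here is a small sketch run against the sample text from the question. It uses Python instead of VBA only so the patterns can be tried outside Outlook; the variable names are hypothetical, and the sample uses \n line breaks rather than Word's paragraph marks.

```python
import re

SAMPLE = (
    "user#host> show some data\n"
    "[ok][2014-11-26 11:05:02]\n"
    "user#host(data)% some other command"
)

BRACKETED = re.compile(r"\[(.+?)\]")                  # bracketed text
PROMPT = re.compile(r"(\w+?#.+?)(?=[>%])")            # prompt, up to > or %
PROMPT_AND_REST = re.compile(r"(\w+?#.+?[>%])(.+)")   # prompt, then the rest

bracketed = [m.group(0) for m in BRACKETED.finditer(SAMPLE)]
prompts = [m.group(1) for m in PROMPT.finditer(SAMPLE)]
rest = [m.group(2) for m in PROMPT_AND_REST.finditer(SAMPLE)]

if __name__ == "__main__":
    print(bracketed)  # ['[ok]', '[2014-11-26 11:05:02]']
    print(prompts)    # ['user#host', 'user#host(data)']
    print(rest)       # [' show some data', ' some other command']
```

One thing the sketch makes visible: the lookahead leaves the closing > or % out of the prompt match, and only the second capture group of the third pattern holds the text after the prompt. The VBA above searches for the whole match value, which for the third pattern includes the prompt itself.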

Find and replace from given word to right parenthesis

I have just taken up VBA to automate a few day-to-day tasks, so please excuse me if I sound very naive.
I'm trying to open a Word document and then search for an expression to highlight (bold) it; however, I'm getting the error "User defined type not defined".
I'm able to open the Word document but unable to perform the pattern search. I have gathered bits and pieces of code from the internet, but it's not working.
I'm using Office 2013 and have added Microsoft VBScript Regular Expressions 5.5 in References.
The pattern I'm searching for starts at "Dear" and runs until ) is encountered.
Cheers #GoingMad#
Sub Pattern_Replace()
Dim regEx, Match, Matches
Dim rngRange As Range
Dim pathh As String, i As Integer
pathh = "D:\Docs\Macro.docx"
Dim pathhi As String
Dim from_text As String, to_text As String
Dim WA As Object, WD As Object
Set WA = CreateObject("Word.Application")
WA.Documents.Open (pathh)
WA.Visible = True
Set regEx = New RegExp
regEx.Pattern = "Dear[^0-9<>]+)"
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value)).Bold = True
Next
End Sub
You need to escape the bracket ")" within the regex, using a back-slash:
regex.Pattern = "Dear[^0-9<>]+\)"
This is because it has a particular meaning within a regex expression.
I would personally also split the reference to the Word-Range across a few lines:
Set rngRange = ActiveDocument.Range
rngRange.Expand Unit:=wdStory
Set Matches = regex.Execute(rngRange.Text)
although this isn't necessary.
Consider the following text
Dear aunt sally ) I have gone to school.
Your regex pattern would be "Dear[^)]+"
Find the word Dear
Match Any character that is not ")"
Repeat
This one will include the parenthesis. Dear[\w\s]+\)
Find the word Dear
Match Any Character or whitespace
Repeat as needed
Until a right parenthesis is found
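The difference between these patterns can be checked with a short sketch. It is written in Python rather than VBA so it can be run without Word; the variable names are hypothetical, and the error raised for the unescaped parenthesis is Python's own, not the message VBScript.RegExp would give.

```python
import re

TEXT = "Dear aunt sally ) I have gone to school."

# Stops just before the right parenthesis (note the trailing space).
up_to_paren = re.search(r"Dear[^)]+", TEXT).group(0)

# Includes the right parenthesis, which must be escaped as \).
with_paren = re.search(r"Dear[\w\s]+\)", TEXT).group(0)

# An unescaped ")" with no matching "(" is rejected as a pattern.
try:
    re.compile(r"Dear[^0-9<>]+)")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

if __name__ == "__main__":
    print(repr(up_to_paren))  # 'Dear aunt sally '
    print(repr(with_paren))   # 'Dear aunt sally )'
    print(unescaped_ok)       # False
```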
You don't need regex for this - a wildcard Find/Replace in Word will do the job far more efficiently:
With WA.ActiveDocument.Range.Find
.ClearFormatting
.Text = "Dear[!\)]@\)"
.Replacement.ClearFormatting
.Replacement.Font.Bold = True
.Replacement.Text = "^&"
.Format = True
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With

Excluding delimiters with Search in MS Word

Say I have the following string:
"Hello how are you."
Since MS Word allows for regular expressions, I can use "*" to find the complete string. But what if I want to exclude the delimiters (the quotes)? I'm afraid that MS Word doesn't support either of the two methods explained here. My question is: would there be any way to do this in one search query?
Thanks in advance.
There are different ways to achieve what you want. Here is one way to find text in Word VBA without the delimiters, using regex. Let's say you have the following text in a Word document (do not copy and paste it from here, as the website distorts the double quotes; see the screenshot):
This is a sample
"This is another Sample"
"Wake me up before you go go"
"War of the worlds"
The code to return text using Regex between two quotes is as follows
Sub FindText()
Dim regEx, Match, Matches
Set regEx = New RegExp
regEx.Pattern = "([^“]*)(?=\”)"
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
Debug.Print Match.Value
Next
End Sub
and if you want to say find "Wake me up before you go go" without quotes then you can use this as well
Sub FindText()
Dim regEx, Match, Matches
Dim searchText As String
searchText = "Wake me up before you go go"
Set regEx = New RegExp
regEx.Pattern = "([^“]*)(?=\”)"
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
If Trim(Match.Value) = (searchText) Then
Debug.Print "Found"
End If
Next
End Sub
NOTE: The website distorts the actual double quote so I am posting screenshots.
FOLLOWUP
For the sample file that you posted, use this code
Sub FindText()
Dim regEx, Match, Matches
Set regEx = New RegExp
regEx.Pattern = """([^""]*)"""
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
Debug.Print Match.SubMatches(0)
Next
End Sub
Sample File can be downloaded from here. Please note that this link will be active for 7 days.
HTH
Sid
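The capture-group approach from the follow-up (match the quotes, but read only the first submatch) can be tried in isolation with the sketch below. It is in Python rather than VBA so it runs outside Word; the sample text mirrors the one above with straight quotes, and the variable names are hypothetical.

```python
import re

TEXT = (
    "This is a sample\n"
    '"This is another Sample"\n'
    '"Wake me up before you go go"\n'
    '"War of the worlds"'
)

# The quotes are part of the match, but only group 1 (the text between
# them) is returned, which is what SubMatches(0) gives in the VBA code.
inside_quotes = re.findall(r'"([^"]*)"', TEXT)

if __name__ == "__main__":
    for item in inside_quotes:
        print(item)
```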
You are wrong. Word does support some wildcards: ? for a single character and * for a series of characters.
This is not a regular expression, which means no lookbehind and no lookahead.
While Word will never have everything you want (as here, where you want to find one thing but select only a part of it), you can always program a macro to accomplish your task.
Add the following VBA code to your document. You can add a custom button on the toolbar to call it.
Sub FindSpecial()
FindSpecialA
End Sub
Private Sub FindSpecialA(Optional text As String)
Dim ToFind As String
ToFind = InputBox("Enter the text you want to find in double-quotes (without double-quotes):" & vbCrLf & vbCrLf & "(Enter * to match anything within double-quotes)", "Find", text)
If ToFind = "" Then Exit Sub
Selection.Find.ClearFormatting
With Selection.Find
.text = """" & ToFind & """"
.Replacement.text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
If ToFind = "*" Then
.text = "[“""]*[”""]"
.MatchWildcards = True
End If
End With
Selection.Find.Execute
If Selection.Find.Found Then
Selection.MoveStart unit:=wdCharacter, Count:=1
Selection.MoveEnd unit:=wdCharacter, Count:=-1
FindSpecialA ToFind
Else
MsgBox "Not found!"
End If
End Sub
EDIT:
Updated the code to handle wildcard * matches.
Some versions of MS Word support regex-style groups with their "search with wildcards" option, meaning that if you can create a search expression between two quotes (the one that works for me is "?@"), you can change it to "(?@)" and enter \1 for the replace text. This will replace the text that was found with just the text that matches the expression between the parentheses, getting rid of your quote marks. (MS Word's ?@ is roughly equivalent to a non-greedy .+ in common regex.)
This works for me in Word 2008 for Mac, but I don't have a guide to which versions of Office support this syntax.
Beware! In this search form, Word does not equate the straight quotes on your keyboard with the curly quotes it inserts in order to look pretty. You will need to either turn off "smart quotes" for this document, or construct your search phrase by cutting and pasting the opening and closing quote characters from your document.
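For comparison, the same "match with the quotes, keep only the group" idea looks like this in ordinary regex syntax. This is a Python sketch, not Word wildcard syntax; strip_quotes is a hypothetical name, and both straight and curly quotes are listed explicitly because, as noted above, they are not treated as the same character.

```python
import re


def strip_quotes(text: str) -> str:
    """Replace each quoted run with just the text between the quotes."""
    # Group 1 is the quoted content; \1 in the replacement keeps only that.
    return re.sub(r'[“"](.+?)[”"]', r"\1", text)


if __name__ == "__main__":
    print(strip_quotes('She said "Hello how are you." and left.'))
    # She said Hello how are you. and left.
```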
I have had luck using color (Format > Font in the Search dialog box, while the cursor is in Find What) to solve problems like this. First, search for all content with its delimiters (“*” with wildcards checked, for this example) and replace it using a non-black color such as blue. Then search and replace the delimiters (in this case the quote marks), changing their color from blue back to black. Perform your changes on the content that is still blue. Finally, select all and change the color to black. If this comes up often, I would suggest macros on a toolbar for step one (make blue, take blue off the delimiters) and step two (change all to black).