Find text and replace with hyperlink - regex

I am trying to replace text in the body with pattern ASA###### to ASA######(hyperlink)
I have code which works if there is only one pattern in the body.
But if I have many patterns like
ASA3422df
ASA2389ds
ASA1265sa
the entire body gets replaced to
ASAhuyi65
My code is here.
Dim strID As String
Dim Body As String
Dim objMail As Outlook.MailItem
Dim temp As String
Dim RegExpReplace As String
Dim RegX As Object
strID = MyMail.EntryID
Set objMail = Application.Session.GetItemFromID(strID)
Body = objMail.HTMLBody
Body = Body + "Test"
objMail.HTMLBody = Body
Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Pattern = "ASA[0-9][0-9][0-9][0-9][a-z][a-z]"
.Global = True
.IgnoreCase = Not MatchCase
End With
'RegExpReplace = RegX.Replace(Body, "http://www.code.com/" + RegX.Pattern + "/ABCD")
'if the replacement is longer than the search string, future .FirstIndexes will be off
Offset = 0
'Set matches = RegX.Execute(Body)
For Each m In RegX.Execute(Body)
RegExReplace = "" & m.Value & ""
Next
Set RegX = Nothing
objMail.HTMLBody = RegExReplace
objMail.Save
Set objMail = Nothing
End Sub

It looks like you were on the right track originally with that commented-out line. With the Replace method don't need to loop over matches (that's what the Global flag is for), and can use backreferences like $1, $2, etc. as placeholders for matching substrings. As with most languages, there's a dedicated page on Regular-Expressions.info for VBScript.
The following with do what you're looking for:
body = "Blah blah ASA3422df ASA2389ds ASA1265sa"
body = RegX.Replace(body, "<a href='http://www.code.com/$1'>$1</a>")
Debug.Print body
'-> Blah blah <a href='http://www.code.com/ASA3422df'>ASA3422df</a> <a href='http://www.code.com/ASA2389ds'>ASA2389ds</a> <a href='http://www.code.com/ASA1265sa'>ASA1265sa</a>
This replaces the matches (and only the matches) with a link, and leaves everything else untouched.

Over at codedawn, there is a fantastic add-in for Excel that gives you the same UI search and replace that you know and love, but for regular expressions.
http://www.codedawn.com/excel-add-ins.php
While this doesn't exactly help answer your question, it's useful for trying out regular expressions one after another without altering data or code.

Related

RegEx - VBA Finding splitting cell with two Uppercase [duplicate]

I'm new to VBA and would like to seek some help with regards to using RegEx and I hope somehow can enlighten me on what I'm doing wrong. I'm currently trying to split a date into its individual date, month and year, and possible delimiters include "," , "-" and "/".
Function formattedDate(inputDate As String) As String
Dim dateString As String
Dim dateStringArray() As String
Dim day As Integer
Dim month As String
Dim year As Integer
Dim assembledDate As String
Dim monthNum As Integer
Dim tempArray() As String
Dim pattern As String()
Dim RegEx As Object
dateString = inputDate
Set RegEx = CreateObject("VBScript.RegExp")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
' .... code continues
This is what I am currently doing. However, there seems to be something wrong during the RegEx.Split function, as it seems to cause my codes to hang and not process further.
To just confirm, I did something simple:
MsgBox("Hi")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
MsgBox("Bye")
"Hi" msgbox pops out, but the "Bye" msgbox never gets popped out, and the codes further down don't seem to get excuted at all, which led to my suspicion that the RegEx.Split is causing it to be stuck.
Can I check if I'm actually using RegEx.Split the right way? According to MSDN here, Split(String, String) returns an array of strings as well.
Thank you!
Edit: I'm trying not to explore the CDate() function as I am trying not to depend on the locale settings of the user's computer.
To split a string with a regular expression in VBA:
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(text, ChrW(-1)), ChrW(-1))
End Function
Usage example:
Dim v
v = SplitRe("a,b/c;d", "[,;/]")
Splitting by a regex is definitely nontrivial to implement compared to other regex operations, so I don't blame you for being stumped!
If you wanted to implement it yourself, it helps to know that RegExp objects from Microsoft VBScript Regular Expressions 5.5 have a FirstIndex property and a Length property, such that you can loop through the matches and pick out all the substrings between the end of one match (or the start of the string) and the start of the next match (or the end of the string).
If you don't want to implement it yourself, I've also implemented a RegexSplit UDF using those same RegExp objects on my GitHub.
Quoting an example from the documentation of VbScript Regexp:
https://msdn.microsoft.com/en-us/library/y27d2s18%28v=vs.84%29.aspx
Function SubMatchTest(inpStr)
Dim retStr
Dim oRe, oMatch, oMatches
Set oRe = New RegExp
' Look for an e-mail address (not a perfect RegExp)
oRe.Pattern = "(\w+)#(\w+)\.(\w+)"
' Get the Matches collection
Set oMatches = oRe.Execute(inpStr)
' Get the first item in the Matches collection
Set oMatch = oMatches(0)
' Create the results string.
' The Match object is the entire match - dragon#xyzzy.com
retStr = "Email address is: " & oMatch & vbNewLine
' Get the sub-matched parts of the address.
retStr = retStr & "Email alias is: " & oMatch.SubMatches(0) ' dragon
retStr = retStr & vbNewLine
retStr = retStr & "Organization is: " & oMatch.SubMatches(1) ' xyzzy
SubMatchTest = retStr
End Function
To test, call:
MsgBox(SubMatchTest("Please send mail to dragon#xyzzy.com. Thanks!"))
In short, you need your Pattern to match the various parts you want to extract, with the spearators in between, maybe something like:
"(\d+)[/-,](\d+)[/-,](\d+)"
The whole thing will be in oMatch, while the numbers (\d) will end up in oMatch.SubMatches(0) to oMatch.SubMatches(2).

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way that I can do this with one pattern so that I can just loop through the matches, that would be great. If not, thats fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure the name of the one that I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using as it made it easier to get the individual values returned. I set a reference to the Microsoft VBScript Regular Expression 5.5 so that I could get Intellisense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

vb.net Regex - Replace a tags without replacing span tags

My function needs to replace a tags from a string if the data extracted in it has a url.
for example:
<a href=www.cnn.com>www.cnn.com</a>
will be replace with:
www.cnn.com
That works fine but when i have a string like:
www.cnn.com</span>
I get only:
www.cnn.com
when i actually want to stay with:
<span style="color: rgb(255, 0, 0);">www.cnn.com</span>
What do i need to add to the code for it to work?
This is my function:
Dim ret As String = text
'If it looks like a URL
Dim regURL As New Regex("(www|\.org\b|\.com\b|http)")
'Gets a Tags regex
Dim rxgATags = New Regex("<[^>]*>", RegexOptions.IgnoreCase)
'Gets all matches of <a></a> and adds them to a list
Dim matches As MatchCollection = Regex.Matches(ret, "<a\b[^>]*>(.*?)</a>")
'for each <a></a> in the text check it's content, if it looks like URL then delete the <a></a>
For Each m In matches
'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
Dim tmpText = rxgATags.Replace(m.ToString, "")
If regURL.IsMatch(tmpText) Then
ret = ret.Replace(m.ToString, tmpText)
End If
Next
Return ret
The following Regex will remove all HTML tags:
string someString = "www.visitus.com</span>";
string target = System.Text.RegularExpressions.Regex.Replace(someString, #"<[^>]*>", "", RegexOptions.Compiled).ToString();
This is the Regex you want : <[^>]*>
Result of my code : www.visitus.com
You may use the following regex - <a\s*[^<>]*>|</a> - that will match all <a> nodes, both opening and close ones.
You do not need to use regURL, this can be built into the rxATags regex. We can make sure it is an URL-referencing <a> tag by checking href and regURL alternatives, then grab everything in between the opening and close` tags, and then use only what is in between.
Dim ret As String = "www.visitus.com</span>"
'Gets a Tags regex
Dim rxgATags = New Regex("(<a\s*[^<>]*href=[""']?(?:www|\.org\b|\.com\b|http)[^<>]*>)((?>\s*<(?<t>[\w.-]+)[^<>]*?>[^<>]*?</\k<t>>\s*)+)(</a>)", RegexOptions.IgnoreCase)
Dim replacement As String = "$2"
ret = rxgATags.Replace(ret, replacement)
I add this to my code:
'Selects only the A tags without the data extracted between them
Dim rxgATagsOnly = New Regex("</?a\b[^>]*>", RegexOptions.IgnoreCase)
For Each m In matches
'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
Dim tmpText = rxgATagsContent.Replace(m.ToString, "")
'if the data extract between the tags looks like a URL then take off the a tags without touching the span tags.
If regURL.IsMatch(tmpText) Then
'select everything but a tags
Dim noATagsStr As String = rxgATagsOnly.Replace(m.ToString, Environment.NewLine)
'replaces string with a tag to non a tag string keeping it's span tags
ret = ret.Replace(m.ToString, noATagsStr)
End If
Next
so from the string:
www.cnn.com</span>
i selected only the a tags with Avinash Raj regex
and then replaced them with "".
Thank you all for answering.

Replace matched pattern with different font

I am using Outlook 2010, and I am trying to write a macro to replace the font of text with a different one, if it matches a pattern.
The logic I am trying to apply is simple - in the user selected text, check for a pattern, and on match, change the font for the matched text.
So far I have been able to split the text and apply/check regex, but the replacement is something that I am not clear on how to do.
Dim objOL As Application
Dim objDoc As Object
Dim objSel As Object
Dim regEx As RegExp
Dim matches As MatchCollection
Dim m As Match
Dim lines As Variant
Dim ms As String
Set objOL = Application
Set objDoc = objOL.ActiveInspector.WordEditor
Set objSel = objDoc.Windows(1).Selection
lines = Split(objSel, Chr(13))
For i = 0 To UBound(lines) Step 1
Set regEx = New RegExp
With regEx
.Pattern = "\[(ok|edit|error)\](\[.*\])?" ' <-- this is just one regex, I want to be able to check more regexes
.Global = True
End With
If regEx.Test(lines(i)) Then
Set matches = regEx.Execute(lines(i))
For Each m In matches
ms = m.SubMatches(1)
' ms.Font.Italic = True
' <-- here is where I am not sure how to replace! :( -->
Next
End If
Next i
P.S there seems to be text-search (objSel.Find.Text)and replace (objSel.Find.Replacement.Text) methods in Selection object, but not pattern-search ! (or I am missing it)
--EDIT--
Adding a sample text
user#host> show some data
..<few lines of data>.. <-- these lines as-is (but monospaced)
[ok][2014-11-26 11:05:02]
user#host> edit some other data
[edit data]
user#host(data)% some other command
I want to convert the whole block to a monospaced font (like Courier New, or Consolas)
And change the part that begins with something#somewhere.. and till > or % to dimmer color,
(i.e in this example user#host> and user#host(data)% dimmer/grey)
The rest in that line to bold (show some data et al)
And, all the bracketed text followed by time-stamps (or without timestamps) similar to 2. (i.e, dim/grey)
This is getting closer to being done. The framework is here to make all sorts of changes now. Just need to get some of the regex patterns down to make the changes.
Sub FormatSelection()
Dim objMailItem As Outlook.MailItem
Dim objInspector As Outlook.Inspector: Set objInspector = Application.ActiveInspector
Dim objHtmlEditor As Object
Dim objWord As Object
Dim Range As Word.Selection
Dim objSavedSelection As Word.Selection
Dim objFoundText As Object
' Verify a mail object is in focus.
If objInspector.CurrentItem.Class = olMail Then
' Get the mail object.
Set objMailItem = objInspector.CurrentItem
If objInspector.EditorType = olEditorWord Then
' We are using a Word editor. Get the selected text.
Set objHtmlEditor = objMailItem.GetInspector.WordEditor
Set objWord = objHtmlEditor.Application
Set Range = objWord.Selection
Debug.Print Range.Range
' Set defaults for the selection
With Range.Font
.Name = "Courier"
.ColorIndex = wdAuto
End With
' Stylize the bracketed text
Call FormatTextWithRegex(Range, 2, "\[(.+?)\]")
' Prompt style text.
Call FormatTextWithRegex(Range, 2, "(\w+?#.+?)(?=[\>\%])")
' Text following the prompt.
Call FormatTextWithRegex(Range, 3, "(\w+?#.+?[\>\%])(.+)")
End If
End If
Set objInspector = Nothing
Set Range = Nothing
Set objHtmlEditor = Nothing
Set objMailItem = Nothing
End Sub
Private Sub FormatTextWithRegex(ByRef pRange As Word.Selection, pActionIndex As Integer, pPattern As String)
' This routine will perform a regex replacement on the text in pRange using pPattern
' on text based on the pactionindex passed.
Const intLightColourIndex = 15
Dim objRegex As RegExp: Set objRegex = New RegExp
Dim objSingleMatch As Object
Dim objMatches As Object
' Configure Regex object.
With objRegex
.IgnoreCase = True
.MultiLine = False
.Pattern = pPattern ' Example "\[(ok|edit|error)\](\[.+?\])?"
.Global = True
End With
' Locate all matches if any.
Set objMatches = objRegex.Execute(pRange.Text)
' Find
If (objMatches.Count > 0) Then
Debug.Print objMatches.Count & " Match(es) Found"
For Each objSingleMatch In objMatches
' Locate the text associated to this match in the selection so we can replace it.
Debug.Print "Match Found: '" & objSingleMatch & "'"
With pRange.Find
'.ClearFormatting
.Text = objSingleMatch.Value
.ClearFormatting
Select Case pActionIndex
Case 1 ' Italisize text
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = True
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
Case 2 ' Dim the colour
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = intLightColourIndex
.Execute Replace:=wdReplaceAll
Case 3 ' Bold that text!
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = True
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
End Select
End With
Next
Else
Debug.Print "No matches found for pattern: " & pPattern
End If
Set objRegex = Nothing
Set objSingleMatch = Nothing
Set objMatches = Nothing
End Sub
So we take what the user has selected and execute the macro. I have my Outlook configured with Word for the editor so that is tested for. Take the selected text and run the regex query against the text saving the matches.
The issue you had is what to do with the match once you found it. In my case since we have the actual text that matched we can run that through a find and replace using the selection once again. Replacing the text with itself instead styled as directed.
Caveats
My testing text was the following:
asdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdfasdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdf
I had to change your regex in your sample to be less greedy since it was matching both [ok][Test] sections. I don't know what kind of text you are working with so my logic might not apply to your situation. Test with caution.
You also had a comment that you needed to test multiple regexes... regexies.... I don't know what the plural is. Wouldn't be hard to create another function that calls this one for several patterns. Assuming this logic works repeating it should not be a big deal. I would like to make this work for you so if something is wrong let me know.
Code Update
I have changed the code so that the regex replacement is in a sub. So what the code does right now is change the selected text to courier and italisize text based on a regex. Now with how it is set up you can use the sub routine FormatTextWithRegex to make changes. Just need to update the pattern and action index which will perform the different styles. Will be updating this again soon with more information. Right now all that exists is the structure that I think you need.
Having issues with the bolding still but you can see the grey part is working correctly. Also the since this relies on highlighting the multiple calls to the function are causing an issue. Just not sure what it is.

How to change case of matching letter with a VBA regex Replace?

I have a column of lists of codes like the following.
2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D
6.A.A.5.F.A, 2.B.C.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.B.C.x, 1.N.N.9.J.K
I want to find all instances of two single upper-case letters separated by a period, but only those that follow a number less than 6. I want to remove the period between the letters and convert the second letter to lower case. Desired output:
2.Ab, 1.Cd, A.21.C.D, 1.Cd.11.C.D
6.A.A.5.Fa, 2.Bc.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.Bc.x, 1.Nn.9.J.K
I have the following code in VBA.
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
c.Value = re.Replace("$1$2")
Next c
End Sub
This removes the period, but doesn't handle the lower-case requirement. I know in other flavors of regular expressions, I can use something like
re.Replace("$1\L$2\E")
but this does not have the desired effect in VBA. I tried googling for this functionality, but I wasn't able to find anything. Is there a way to do this with a simple re.Replace() statement in VBA?
If not, how would I go about achieving this otherwise? The pattern matching is complex enough that I don't even want to think about doing this without regular expressions.
[I have a solution I worked up, posted below, but I'm hoping someone can come up with something simpler.]
Here is a workaround that uses the properties of each individual regex match to make the VBA Replace() function replace only the text from the match and nothing else.
Sub fixBlah2()
Dim re As VBScript_RegExp_55.RegExp, Matches As VBScript_RegExp_55.MatchCollection
Dim M As VBScript_RegExp_55.Match
Dim tmpChr As String, pre As String, i As Integer
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
'Count of number of replacements made. This is used to adjust M.FirstIndex
' so that it still matches correct substring even after substitutions.
i = 0
Set Matches = re.Execute(c.Value)
For Each M In Matches
tmpChr = LCase(M.SubMatches.Item(1))
If M.FirstIndex > 0 Then
pre = Left(c.Value, M.FirstIndex - i)
Else
pre = ""
End If
c.Value = pre & Replace(c.Value, M.Value, M.SubMatches.Item(0) & tmpChr, _
M.FirstIndex + 1 - i, 1)
i = i + 1
Next M
Next c
End Sub
For reasons I don't quite understand, if you specify a start index in Replace(), the output starts at that index as well, so the pre variable is used to capture the first part of the string that gets clipped off by the Replace function.
So this question is old, but I do have another workaround. I use a double regex so to speak, where the first engine looks for the match as an execute, then I loop through each of those items and replace with a lowercase version. For example:
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
dim ToReplace as Object
Set re = New VBScript_RegExp_55.RegExp
for each c in Selection.Cells
with re `enter code here`
.Global = True
.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
Set ToReplace = .execute(C.Value)
end with
'This generates a list of items that match. Now to lowercase them and replace
Dim LcaseVersion as string
Dim ItemCt as integer
for itemct = 0 to ToReplace.count - 1
LcaseVersion = lcase(ToReplace.item(itemct))
with re `enter code here`
.Global = True
.Pattern = ToReplace.item(itemct) 'This looks for that specific item and replaces it with the lowercase version
c.value = .replace(C.Value, LCaseVersion)
end with
End Sub
I hope this helps!