I am trying to write a VBA macro for MS Word 2010 that capitalizes the letter after a special character, in my case an underscore "_". The words I want to revise start with a special prefix. I am having trouble with the replace operation. I am using the Microsoft VBScript Regular Expressions 5.5 library.
This is what I have so far:
Sub ReplaceFunc()
'
' ReplaceFunc Macro
'
'
Debug.Print ("entered replaceFunc")
Dim myRegex As New RegExp
myRegex.Global = True
myRegex.IgnoreCase = False
myRegex.MultiLine = True
' i want to find all words in the document which start with BlaBlub and have a suffix like _foo_bar or _foo_bar_foo
' e.g. BlaBlub_foo_bar, BlaBlub_foo_foo_bar_bar, BlaBlub_foo_bar_foo
myRegex.Pattern = "\bBlaBlub(_([a-z])+)+\b"
' works i get the results i was looking for
Set Matches = myRegex.Execute(ActiveDocument.Range.Text)
' now i want to capitalize every letter after a "_", e.g. BlaBlub_foo_bar --> BlaBlub_Foo_Bar
For Each Match In Matches
' The idea is to run a new RegEx on every found substring but this time with replace
Dim mySubRegex As New RegExp
mySubRegex.Global = True
mySubRegex.IgnoreCase = False
mySubRegex.MultiLine = True
' Matching every underscore followed by a non capital letter
mySubRegex.Pattern = "_([a-z])"
' getting start and endindex from the match to run the regex only on the found word
startIndex = Match.FirstIndex
endIndex = (Match.FirstIndex + Match.Length)
' where it fails with a syntax error
mySubRegex.Replace(ActiveDocument.Range(Start:=startIndex, End:=endIndex).Text , "_\u$1")
Next
Debug.Print ("leaving replaceFunc")
End Sub
The VBA macro fails with a syntax error in the line:
mySubRegex.Replace(ActiveDocument.Range(Start:=startIndex, End:=endIndex).Text , "_\u$1")
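I am out of ideas on how to get it working. Can you point out what my error is and how to fix it?
This is easy to correct. VBA only accepts parentheses around a multi-argument list when the return value is used (or when the call is made with Call), so just drop the parentheses: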
mySubRegex.Replace(ActiveDocument.Range(Start:=startIndex, End:=endIndex).Text , "_\u$1")
=>
mySubRegex.Replace ActiveDocument.Range(Start:=startIndex, End:=endIndex).Text , "_\u$1"
Or, to keep the string that Replace returns, assign it to a variable:
Dim varVal
varVal = mySubRegex.Replace(ActiveDocument.Range(Start:=startIndex, End:=endIndex).Text , "_\u$1")
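Note that removing the parentheses only fixes the syntax error. Replace returns a new string and does not change the document, and the VBScript regex engine has no case-conversion token in replacement strings, so "_\u$1" is inserted literally. A minimal sketch of one workaround, assuming a hypothetical helper named CapitalizeAfterUnderscore that works on a plain string (writing the result back into the Word document is a separate step that is not shown here):

```vba
' Hypothetical helper (not part of the original macro):
' upper-cases the letter that follows each underscore in a string.
Public Function CapitalizeAfterUnderscore(ByVal word As String) As String
    Dim re As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    re.Pattern = "_[a-z]"

    result = word
    For Each m In re.Execute(word)
        ' FirstIndex is 0-based and points at "_"; Mid is 1-based,
        ' so the letter to change sits at position FirstIndex + 2.
        Mid$(result, m.FirstIndex + 2, 1) = UCase$(Mid$(word, m.FirstIndex + 2, 1))
    Next m

    CapitalizeAfterUnderscore = result
End Function

Sub DemoCapitalizeAfterUnderscore()
    Debug.Print CapitalizeAfterUnderscore("BlaBlub_foo_bar") ' prints BlaBlub_Foo_Bar
End Sub
```

Inside the loop of the original macro, the helper could be called on Match.Value for each match found by the outer pattern.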
I'm a big fan of stackoverflow, though I'm new to using regular expressions. I have a QND search utility that I wrote to help me find/report things I'm searching for in source code. I'm having a problem with figuring out what's wrong with my pattern searching that's not returning a match string that includes all text between two double quotes. In one search it works (looking for Session variables), but in a similar one (looking for redirects) it doesn't.
Here's a sample aspx.vb file that I'm testing against:
Partial Class _1
Inherits System.Web.UI.Page
Private strSecurityTest As String = ""
Private strUserId As String = ""
Private strPassword As String = ""
Private strMyName As String = ""
Private Sub sample()
strSecurityTest = Session("UserID")
If strSecurityTest = "NeedsLogin" Or
strSecurityTest = "" Or
Session("SecureCount") = 0 Or
Session("CommandName") <> strMyName Then
Server.Transfer("WebApLogin.aspx")
End If
End Sub
End Class
Successful match:
When I look for all occurrences of Session("*") with pattern ==> Session\(\"\w*\"\)
I get correct results. Noting the above source code, I get 3 matches returned:
Session("UserID")
Session("SecureCount")
Session("CommandName")
Failed matching:
However when I try another search by replacing "Session" with "Transfer" ==> Transfer\(\"\w*\"\)
nothing is returned.
I have also tried these matching patterns:
Server.Transfer("*") ==> Server\.Transfer\(\"\w*\"\)
*Server.Transfer("*") ==> \w*Server\.Transfer\(\"\w*\"\)
Each of these doesn't return any matches.
In my live code I tried removing vbCr, vbLf, vbCrLf before the regex match, but still no matches
were found.
Symptom:
A symptom that I see is when I remove the text from the right side of the pattern, up to and
including the \w* ... then the search finds matches ==> Transfer\(\" However since the search
is now open-ended ... I can't capture the value between the double quotes that I want.
Sample VB code is:
Private Sub TestRegExPattern(wData As String, wPattern As String, bMatchCase As Boolean)
'
' Invoke the Match method.
'
Dim m As Match = Nothing
If Not bMatchCase Then
m = Regex.Match(wData, wPattern, RegexOptions.IgnoreCase)
Else
m = Regex.Match(wData, wPattern)
End If
'
' If first match found process and look for more
'
If (m.Success) Then
'
' Process match
'
' Get next match.
While m.Success
m = m.NextMatch()
If m.Success Then
'
' Process additional matches
'
End If
End While
End If
m = nothing
End Sub
I'm looking for some pointers to understand why my simple search only works with one particular pattern, and not another that only changes the leading text to be matched explicitly.
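One likely cause, judging from the sample file: \w matches only letters, digits and the underscore, so \w* cannot get past the dot in "WebApLogin.aspx", while "UserID", "SecureCount" and "CommandName" contain word characters only. Matching "anything except a double quote" with [^"]* avoids the problem. The pattern works the same way in .NET; the sketch below uses VBA with VBScript.RegExp only to stay consistent with the other examples here, and the function name FirstTransferTarget is made up for illustration.

```vba
' Hypothetical helper: returns the quoted argument of the first
' Transfer("...") call in the given source text, or "" if there is none.
Public Function FirstTransferTarget(ByVal code As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' Regex: Transfer\("([^"]*)"\)  (quotes are doubled inside a VBA string literal)
    re.Pattern = "Transfer\(""([^""]*)""\)"

    Set matches = re.Execute(code)
    If matches.Count > 0 Then
        FirstTransferTarget = matches(0).SubMatches(0)
    End If
End Function

Sub DemoFirstTransferTarget()
    ' prints WebApLogin.aspx
    Debug.Print FirstTransferTarget("Server.Transfer(""WebApLogin.aspx"")")
End Sub
```

The same character class also works for the Session search, so a single pattern shape can serve both cases.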
I have several strings, e.g.
(3)_(9)--(11).(FT-2)
(10)--(20).(10)/test--(99)
I am trying to use Regex.Match (with a pattern I do not know yet) to get a list like this:
First sample:
3
_
9
--
11
.
FT-2
Second Sample:
10
--
20
.
10
/test--
99
So there are several numbers in brackets and any text between them.
Can anyone help me do this in VB.NET, so that a given string returns this list?
One option is to use the String.Split method:
"(3)_(9)--(11).(FT-2)".Split({"("c, ")"c}, StringSplitOptions.RemoveEmptyEntries)
Another option is to match everything excluding ( and )
As regex, this would do [^()]+
Breakdown
"[^()]" ' Match any single character NOT present in the list “()”
"+" ' Between one and unlimited times, as many times as possible, giving back as needed (greedy)
You can use the following block of code to extract all matches:
Try
Dim RegexObj As New Regex("[^()]+", RegexOptions.IgnoreCase)
Dim MatchResults As Match = RegexObj.Match(SubjectString)
While MatchResults.Success
' matched text: MatchResults.Value
' match start: MatchResults.Index
' match length: MatchResults.Length
MatchResults = MatchResults.NextMatch()
End While
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
This should work:
Dim input As String = "(3)_(9)--(11).(FT-2)"
Dim searchPattern As String = "\((?<keep>[^)]+)\)|(?<=\))(?<keep>[^()]+)"
Dim replacementPattern As String = "${keep}" + Environment.NewLine
Dim output As String = RegEx.Replace(input, searchPattern, replacementPattern)
The simplest way is to use Regex.Split (formulated as a little console test):
Dim input = {"(3)_(9)--(11).(FT-2)", "(10)--(20).(10)/test--(99)"}
For Each s As String In input
Dim parts = Regex.Split(s, "\(|\)")
Console.WriteLine($"Input = {s}")
For Each p As String In parts
Console.WriteLine(p)
Next
Next
Console.ReadKey()
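So basically we have a one-liner for the regex part. Note that Regex.Split returns an empty string when the input starts or ends with a delimiter, so filter those out if they are unwanted.
The regular expression \(|\) means: split at ( or ), where the parentheses are escaped with \ because of their special meaning within regex.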
The slightly shorter regex [()] where the desired characters are enclosed in [] would produce the same result.
How can I write a condition that compares Recipient.AddressEntry with a string such as "I351" using RegEx?
Here is my If condition which works but is hardcoded to every known email address.
For Each recip In recips
If recip.AddressEntry = "Dov John, I351" Then
objMsg.To = "example#mail.domain"
objMsg.CC = recip.Address
objMsg.Subject = Msg.Subject
objMsg.Body = Msg.Body
objMsg.Send
End If
Next
The reason I need this condition is that an email may include one of several colleagues from my team and one or more from another team. The AddressEntry of my colleagues ends with I351, so I want to check whether the email includes one of my teammates.
For Each recip In recips
If (recip.AddressEntry = "Dov John, I351" _
Or recip.AddressEntry = "Vod Nohj, I351") Then
objMsg.To = "example#mail.domain"
objMsg.CC = recip.Address
objMsg.Subject = Msg.Subject
objMsg.Body = Msg.Body
objMsg.Send
End If
Next
You still didn't clarify exactly what the condition you want to use for matching is, so I'll do my best:
If you simply want to check if the string ends with "I351", you don't need regex, you can use something like the following:
If recip.AddressEntry Like "*I351" Then
' ...
End If
If you want to check if the string follows this format "LastName FirstName, I351", you can achieve that using Regex by using something like the following:
Dim regEx As New RegExp
regEx.Pattern = "^\w+\s\w+,\sI351$"
If regEx.Test(recip.AddressEntry) Then
' ...
End If
Explanation of the regex pattern:
' ^ Asserts position at the start of the string.
' \w Matches any word character.
' + Matches between one and unlimited times.
' \s Matches a whitespace character.
' \w+ Same as above.
' , Matches the character `,` literally.
' \s Matches a whitespace character.
' I351 Matches the string `I351` literally.
' $ Asserts position at the end of the string.
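To try the pattern outside Outlook, it can be wrapped in a small function and run against plain strings. This is only a sketch: the function name IsTeamMember and the non-matching sample "Dov John, X999" are invented for illustration, while the two matching names come from the question.

```vba
' Hypothetical helper: True when the display name has the form
' "Word Word, I351" (two words, a comma, a space, then the team code).
Public Function IsTeamMember(ByVal displayName As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^\w+\s\w+,\sI351$"
    IsTeamMember = re.Test(displayName)
End Function

Sub DemoIsTeamMember()
    Debug.Print IsTeamMember("Dov John, I351") ' matches the format
    Debug.Print IsTeamMember("Vod Nohj, I351") ' matches the format
    Debug.Print IsTeamMember("Dov John, X999") ' made-up code, does not match
End Sub
```

Keep in mind that \w does not match hyphens or apostrophes, so names containing them would need a looser pattern.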
Hope that helps.
What I'm trying to accomplish
I'm trying to create a function to use string interpolation within VBA. The issue I'm having is that I'm not sure how to replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
What I have found and tried
As far as I can tell, VBScript regular expressions do not support negative lookbehind.
Below are two examples of patterns that I have already tried:
Private Sub testingInjectFunction()
Dim dict As New Scripting.Dictionary
dict("test") = "Line"
Debug.Print Inject("${test}1\n${test}2 & link: C:\\notes.txt", dict)
End Sub
Public Function Inject(ByVal source As String, dict As Scripting.Dictionary) As String
Inject = source
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
' PATTERN # 1 REPLACES ALL '\n'
'regEx.Pattern = "\\n"
' PATTERN # 2 REPLACES EXTRA CHARACTER AS LONG AS IT IS NOT '\'
regEx.Pattern = "[^\\]\\n"
' REGEX REPLACE
Inject = regEx.Replace(Inject, vbNewLine)
' REPLACE ALL '${dICT.KEYS(index)}' WITH 'dICT.ITEMS(index)' VALUES
Dim index As Integer
For index = 0 To dict.Count - 1
Inject = Replace(Inject, "${" & dict.Keys(index) & "}", dict.Items(index))
Next index
End Function
Desired result
Line1
Line2 & link: C:\notes.txt
Result for Pattern # 1: (Replaces when not wanted)
Line1
Line2 & link: C:\
otes.txt
Result for Pattern # 2: (Replaces the 1 in 'Line1')
Line
Line2 & link: C:\\notes.txt
Summary question
I can easily write code that doesn't use Regular Expressions that can achieve my desired goal but want to see if there is a way with Regular Expressions in VBA.
How can I use Regular Expressions in VBA to Replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
Yes, you may use a regex here. Since the backslash is not used to escape itself in these strings, you may modify your solution like this:
regEx.Pattern = "(^|[^\\])\\n"
S = regEx.Replace(S, "$1" & vbNewLine)
It will match and capture any char but \ before \n and then will put it back with the $1 placeholder. As there is a chance that \n appears at the start of the string, ^ - the start of string anchor - is added as an alternative into the capturing group.
Pattern details
(^|[^\\]) - Capturing group 1: start of string (^) or (|) any char but a backslash ([^\\])
\\ - a backslash
n - an n char.
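The pattern above leaves "\\" untouched and skips the second of two consecutive "\n" sequences, because the character it needs in front of the second one is itself a backslash. If the desired result from the question is taken literally (C:\\notes.txt becomes C:\notes.txt), one alternative is to scan the escapes from left to right and rebuild the string. This is a sketch under that assumption; the name UnescapeNewlines is made up for illustration.

```vba
' Hypothetical helper: turns "\n" into vbNewLine and "\\" into a single "\".
' Escapes are consumed left to right, so "\\n" yields "\" followed by "n".
Public Function UnescapeNewlines(ByVal source As String) As String
    Dim re As Object
    Dim m As Object
    Dim result As String
    Dim pos As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\\([\\n])" ' a backslash followed by "\" or "n"

    pos = 1 ' 1-based position of the first character not yet copied
    For Each m In re.Execute(source)
        ' Copy the text between the previous match and this one
        ' (FirstIndex is 0-based, Mid$ is 1-based).
        result = result & Mid$(source, pos, m.FirstIndex + 1 - pos)
        If m.SubMatches(0) = "n" Then
            result = result & vbNewLine
        Else
            result = result & "\"
        End If
        pos = m.FirstIndex + m.Length + 1
    Next m

    ' Append whatever follows the last match.
    UnescapeNewlines = result & Mid$(source, pos)
End Function
```

In the Inject function from the question, the regEx.Replace call could then be swapped for Inject = UnescapeNewlines(Inject); with the sample input this should give the two lines shown under "Desired result".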
I am creating a regular expression in VBA, which uses the JS flavor of RegEx. Here is the issue I have run into:
Current RegEx:
(^6)(?:a|ab)?
I have a 6 followed by either nothing, an 'a' or 'ab'.
In the case of a 6 followed by nothing I want to return just the 6 using $1
In the case of a 6 followed by an 'a' or 'ab' I want to return 6B
So I need that 'B' to be optional, contingent on there being an 'a' or 'ab'.
Something to the effect of : $1B?
That of course does not work. I only want the B if the 'a' or 'ab' is present, otherwise just the $1.
Is this possible to do in a single regex pattern? I could just have 2 separate patterns, one looking for only a 6 and the other for 6'a'or'ab'... but my actual regex patterns are much more complicated and I might need several patterns to cover some of them...
Thanks for looking.
I don't think your question is clearly defined--for example, I don't know why you need a replace--but from what I can infer, something like the following may work for you:
target = "6ab"
result = ""
With New RegExp
.Pattern = "^(6)(?:a(b?))?"
Set matches = .Execute(target)
If matches.Count > 0 Then ' Execute always returns a collection, so test its Count
Set mat = matches(0)
result = mat.SubMatches(0)
If mat.SubMatches.Count > 1 Then
result = result & UCase(mat.SubMatches(1))
End If
End If
Debug.Print result
End With
You basically inspect the capture groups to determine whether or not there was a hit on the b capture. Whereas you used a|ab, I think an optional b (b?) is more to the point. It's probably stylistic more than anything.
As I mentioned in my comment, there is no way to tell a regex engine to choose between literal alternatives in the replacement string. Thus, all you can do is to access Submatches to check for values that you get there, and return appropriate values.
Note that your regex needs a capturing group around the part whose exact text you do not know in advance (here, the (ab?)).
Here is my idea in code:
Function RxCondReplace(ByVal str As String) As String
RxCondReplace = ""
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.Pattern = "^6(ab?)?"
Set objMatches = objRegExp.Execute(str)
If objMatches.Count = 0 Then Exit Function ' no leading 6: return ""
Set objMatch = objMatches.Item(0) ' Only 1 match as .Global=False
' The pattern has a single capturing group, so only SubMatches.Item(0) exists.
' A comment cannot follow a line-continuation underscore, so it goes up here.
If objMatch.SubMatches.Item(0) = "a" Or _
objMatch.SubMatches.Item(0) = "ab" Then
RxCondReplace = "6B" ' group 1 captured "a" or "ab"
Else
RxCondReplace = "6" ' group 1 did not participate (bare 6)
End If
End Function
' Calling the function above
Sub CallConditionalReplace()
Debug.Print RxCondReplace("6") ' => 6
Debug.Print RxCondReplace("6a") ' => 6B
Debug.Print RxCondReplace("6ab") ' => 6B
End Sub