vba regex: dot matching newline - regex

I want to match (in this snippet) everything upto but not including a newline which is what I thought a . would do. Could someone shed light on what I'm missing please.
Public Sub regexp()
Dim oRegExp As regexp
Dim oMatches As MatchCollection
Dim oMatch As Match
Dim sString As String
sString = _
"one" & vbNewLine & _
"two" & vbNewLine
Set oRegExp = New regexp
With oRegExp
.Global = True
.Pattern = ".+"
Set oMatches = .Execute(sString)
For Each oMatch In oMatches
Debug.Print "*" & oMatch.Value & "*"
Next oMatch
End With
End Sub
Output is
*one
*
*two
*
Expected output
*one*
*two*
How can I avoid the newline in the output? Thanks.

If you use [^\n] in place of ., it will match any character except the new line character.

Related

VBScript RegEx Pattern Match using existing String

Good morning,
I'm new to VBScript and not great with RegEx just yet. I'm working on a project in which I need to match a pre-existing string to the beginning of a line in a text file, then place that whole line into a string. In my test, I can assign my own string, but in the production environment, it will pull the string from an object int he application. For example, the string would be "0001", and the beginning of the line in the text file would be 0001, followed by the rest of the text that I also need to apply to the new string. Below is the code that I have so far. My issue is that I don't know how to apply the current string to the RegEx pattern, or what else I would need to include in it to perform exactly this search.
Dim strCode
strCode = "0001"
Dim objFSO, objFile, objRegEx
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = strCode & 'This is where I'm not sure exactly how to apply RegEx
Dim afterMid
Dim n
n = 4
Dim result
Dim newString
Dim LogFile
LogFile = "c:\Users\runto\Documents\Test Logfile.txt"
Set objFile = objFSO.OpenTextFile(LogFile,1)
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Set colMatches = objRegEx.Execute(strSearchString)
If colMatches.Count > 0 Then
For Each strCode in colMatches
newString = strSearchString
Next
End If
Loop
MsgBox newString
Any help would be massively appreciated.
Thanks!
Match line starting with strCode:
objRegEx.Pattern = "^" & strCode & ".*"
'^' = Anchor to the start of the string
strCode = followed by your pattern
'.' = followed by any character
'*' = followed by zero or more occurrences of the previous character
So the regex becomes "^0001.*"
Oh and you can use
objRegEx.Test(strSearchString)
To see if the string matches your pattern.
Update: Test script illustrates how to first escape non-alphanumeric characters, then performs the comparison:
Dim strCode
Dim strSearchStr
strCode = "0.0[1"
strSearchString = "0.0[1 - test string"
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "([^A-Za-z0-9])" ' Match non-alphanum chars
objRegEx.global = True
strCode = objRegEx.replace(strCode, "\$1") ' Escape with a backslash
'msgbox strCode
objRegEx.Pattern = "^" & strCode & ".*" ' Compare like before
if objRegEx.Test(strSearchString) then
msgbox "match"
else
msgbox "No match"
end if

Regex for matching substring, but not containing word (word boundary issue)

I have 100,000 files (mostly office-type files). I'm using Excel VBA to check all the filenames that contain the word "list", but trying to avoid false positives (such as "specialist").
The answer provided for "Regex for matching substring, but not containing word" is very nearly what's required ( \b(?!String)\w*ring\w*\b ) except that my filenames do not have neat word boundaries.
The current pattern \b(?!specialist)\w*list\w*\b correctly ignores some variants (3 Specialist, 6-specialist, Specialists etc). Is it possible to modify the pattern so that it correctly weeds out the following variants as well: 1Specialist, 2_specialist and Xspecialists? If so, could someone please point me in the right direction?
Many thanks for any assistance/advice,
M
Here's the recursive subroutine that I've been using (apologies for poor formatting):
Sub RecursiveFolderPATTERN(objFolder As Scripting.Folder, _IncludeSubfolders As Boolean)
'Declare the variables
Dim objFile As Object
Dim objSubFolder As Scripting.Folder
Dim NextRow As Long
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"
objRegExp.IgnoreCase = True
'Find the next available row
NextRow = Cells(Rows.Count, "A").End(xlUp).Row + 1
'Loop through each file in the folder
For Each objFile In objFolder.Files
If objRegExp.test(objFile) Then
Cells(NextRow, "A").Value = objFile.Name
Cells(NextRow, "E").Value = objFile.Size
Cells(NextRow, "F").Value = objFile.Type
Cells(NextRow, "G").Value = objFile.DateCreated
Cells(NextRow, "H").Value = objFile.DateLastAccessed
Cells(NextRow, "I").Value = objFile.DateLastModified
Cells(NextRow, "J").Value = objFile.Path
NextRow = NextRow + 1
End If
Next objFile
'Loop through files in the subfolders
If IncludeSubfolders Then
For Each objSubFolder In objFolder.Subfolders
Call RecursiveFolderPATTERN(objSubFolder, True)
Next objSubFolder
End If
End Sub
Answer edit: Changing the line If objRegExp.test(objFile) Then into If objRegExp.test(objFile.Name) Then fixed the issue.
Alternative answer edit: Changing the pattern from "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)" to "(^(?!.*specialist).*list.*$)" also works well. Both approaches have their advantages, so I intend to use both of them.
If your goal is to find filenames that match to "list" but don't match "specialist", try the following regex:
(?i)^(?!.*specialist).*list.*$
EDIT
Delete the (?i) from the pattern and test it with the following snippet:
Sub RecursiveFolderPATTERN()
Dim objRegExp As Object, arrStrings() As String, _
i As Long, objMatch As Object
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = True
.MultiLine = False
.Pattern = "^(?!.*specialist).*list.*$"
End With
Dim TestString As String
TestString = "3 Specialist" & vbNewLine & _
"6-specialist" & vbNewLine & _
"Specialists" & vbNewLine & _
"true SpeciaList" & vbNewLine & _
"1 Specialist" & vbNewLine & _
"2_specialist" & vbNewLine & _
"Xspecialists" & vbNewLine & _
"TheListOfSpecialists.xlsx" & vbNewLine & _
"List" & vbNewLine & _
"lISTs" & vbNewLine & _
"Globalistics" & vbNewLine & _
"GlobalList.doc" & vbNewLine & _
"fatalistic" & vbNewLine & _
"The big list of PII.csv" & vbNewLine & _
"A few lISTs with something.xls"
arrStrings = Split(TestString, vbNewLine)
For i = LBound(arrStrings) To UBound(arrStrings)
If objRegExp.Test(arrStrings(i)) Then
Debug.Print arrStrings(i)
End If
Next
End Sub
Would something like this work for you?
([^A-Za-z]|^)list([^A-Za-z]|$)
It would match the word "list" that is not surrounded by other letters.
Or should some words containing "list" be acceptable?
Try it out
EDIT: To allow matching the word "lists" it can be changed to this:
([^A-Za-z]|^)lists?([^A-Za-z]|$)
EDIT 2: To whitelist some prefixes, you can change it to this (whitelists "address", "info" and "data" as prefixes for example purposes):
([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)

How do I print my extracted pattern in a column using regex.execute and match object in vba?

I'm using vba to write a sub to extract pin codes from given addresses in a column in an excel worksheet. I was able to find the regex pattern to extract the pin pattern but Im unable to output the said extracted pins to a column. As a way to test whether the regex is able to extract the pin pattern from the column (it is) I passed the Match.value property from matches object to a msgbox and was able to get an output for each string in a msgbox.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "\d{6}"
Dim Match As Object
Dim matches As Object
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("B1:B30")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regex.Test(strInput) Then
Set matches = regex.Execute(strInput)
For Each Match In matches
MsgBox (Match.Value) 'A workaround I found to see if my pattern
'worked but I need to print Match.value
'in a column so this wont do
Next
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
How do I extract the pattern string from the match object and print it into a column (like U1:U30) for each cell in my range B1:B30
TL;DR: Regex Pattern working but how to print extracted pattern in cell
How about collecting the matches comma separated in a string strMatches and write that to a cell?
Add this before For Each cell In Myrange
Dim i As Long, strMatches As String
i = 1 'row number where we start to write
And replace your other For Each with
strMatches = vbNullString
For Each Match In matches
strMatches = strMatches & Match.Value & ", " 'collect all matches comma seprated
Next
If Not strMatches = vbNullString Then strMatches = Left(strMatches, Len(strMatches) - 2) 'remove last comma
Worksheets("your-sheet-name").Range("U" & i).Value = strMatches 'write the matches into cell
i = i + 1

How to replace Numbers in Parentheses with some calculations in MS Word

I have a problem to replace some serial number such as [30] [31] [32]... to [31] [32] [33]... in MS word when I insert a new references in the middle of article. I have not found a solution way in GUI so I try to use VBA to do that replacement. I find a similar problem in stack overflow:
MS Word Macro to increment all numbers in word document
However, this way is a bit inconvenient because it have to generate some replacement array in other place. Can I make that replacement with regex and some function in MS Word VBA like code below?
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
End With
'How to do some calculations with $1?
Selection.Text = regExp.Replace(Selection.Text, "[$1]")
End Sub
But I don't know how to do some calculations with $1 in regExp? I have try use "[$1+1]" but it return [31+1] [32+1] [33+1]. Can anyone help? Thanks!
It is impossible to pass a callback function to the RegExp.Replace, so you have the only option: use RegExp.execute and process matches in a loop.
Here is an example code for your case (I took a shortcut since you only have the value to modify inside known delimiters, [ and ].)
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})]"
.Global = True
End With
'How to do some calculations with $1?
' Removing regExp.Replace(Selection.Text, "[$1]")
For Each m In regExp.Execute(Selection.Text)
Selection.Text = Left(Selection.Text, m.FirstIndex+1) _
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) _
& Mid(Selection.Text, m.FirstIndex + Len(m.Value))
Next m
End Sub
Here,
Selection.Text = Left(Selection.Text, m.FirstIndex+1) - Get what is before
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) - Add 10 to the captured number
& Mid(Selection.Text, m.FirstIndex + Len(m.Value)) - Append what is after the capture
That should do the trick :
Sub IncrementWithRegex()
Dim Para As Paragraph
Set Para = ThisDocument.Paragraphs.First
Dim ParaNext As Paragraph
Dim oRange As Range
Set oRange = Para.Range
Dim regEx As New RegExp
Dim regMatch As Variant
Dim ACrO As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[\[]([0-9]{2})[\]]"
End With
Do While Not Para Is Nothing
Set ParaNext = Para.Next
Set oRange = Para.Range
'Debug.Print oRange.Text
If regEx.test(oRange.Text) Then
For Each regMatch In regEx.Execute(oRange.Text)
oRange.Text = _
Left(oRange.Text, _
InStr(1, oRange.Text, CStr(regMatch))) & _
CDbl(regMatch) + 1 & _
Right(oRange.Text, _
Len(CStr(regMatch)) + InStr(1, oRange.Text, CStr(regMatch)))
Next regMatch
Else
End If
Set Para = ParaNext
Loop
End Sub
To use this, remember to add the reference :
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
Here is a simple VBA macro you can use to achieve this :
Sub IncrementNumbers()
Dim regExp As Object
Dim i As Integer
Dim fullMatch As String
Dim subMatch As Integer
Dim replacement As String
Const TMP_PREFIX As String = "$$$"
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
.MultiLine = True
End With
'Ensure selected text match our regex
If regExp.test(Selection.Text) Then
'Find all matches
Set matches = regExp.Execute(Selection.Text)
' Start from last match
For i = 0 To (matches.Count - 1)
fullMatch = matches(i).Value
subMatch = CInt(matches(i).SubMatches(0))
'Increment by 1
subMatch = subMatch + 1
'Create replacement. Add a temporary prefix so we ensure [30] replaced with [31]
'will not be replaced with [32] when [31] will be replaced
replacement = "[" & TMP_PREFIX & subMatch & "]"
'Replace full match with [subMatch]
Selection.Text = Replace(Selection.Text, fullMatch, replacement)
Next
End If
'Now replacements are complete, we have to remove replacement prefix
Selection.Text = Replace(Selection.Text, TMP_PREFIX, "")
End Sub

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
Dim strReplace As String: strReplace = ", "
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("P1:P5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.TEST(strInput) Then
Debug.Print (regEx.Replace(strInput, strReplace))
Else
Debug.Print ("No Regex Not matched in " & cell.address)
End If
End If
Next
Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Bruce, Wayne.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A captured group that includes a character class matching just 1 English character
, - Matching a literal , (you actually do not have to escape it as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (does not match, or consume) if the immediate character following the comma is and English letter.
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
Dim r As Range
Set r = Range("P1:P5")
r.Replace What:=",", Replacement:=", "
r.Replace What:=", ", Replacement:=", "
End Sub
Uses the same RegExp as in the solution from stribizhev but with two optimisations for speed
Your current code sets the RegExp details for every cell tested, these only need setting once.
Looping through a varinat array is much faster than a cell range
code
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String:
Dim strReplace As String:
Dim regEx As Object
Dim strInput As String
Dim X, X1
Dim lngnct
Set regEx = CreateObject("vbscript.regexp")
strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
strReplace = "$1, "
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
X = ActiveSheet.Range("P1:P5").Value2
For X1 = 1 To UBound(X)
If .TEST(X(X1, 1)) Then
Debug.Print .Replace(X(X1, 1), strReplace)
Else
Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
End If
Next
End With
Set regEx = Nothing
End Sub
What you are doing via Regex is to find a pattern
(any Alphabet),(any Alphabet)
and then replace such pattern to
,_
where _ implies a space.
So if you have Wayne,Bruce then the pattern matches where e,B. Therefore the result becomes Wayn, ruce.
Try
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"
.