VBScript RegEx Pattern Match using existing String - regex

Good morning,
I'm new to VBScript and not great with RegEx just yet. I'm working on a project in which I need to match a pre-existing string to the beginning of a line in a text file, then place that whole line into a string. In my test, I can assign my own string, but in the production environment, it will pull the string from an object int he application. For example, the string would be "0001", and the beginning of the line in the text file would be 0001, followed by the rest of the text that I also need to apply to the new string. Below is the code that I have so far. My issue is that I don't know how to apply the current string to the RegEx pattern, or what else I would need to include in it to perform exactly this search.
Dim strCode
strCode = "0001"
Dim objFSO, objFile, objRegEx
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = strCode & 'This is where I'm not sure exactly how to apply RegEx
Dim afterMid
Dim n
n = 4
Dim result
Dim newString
Dim LogFile
LogFile = "c:\Users\runto\Documents\Test Logfile.txt"
Set objFile = objFSO.OpenTextFile(LogFile,1)
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Set colMatches = objRegEx.Execute(strSearchString)
If colMatches.Count > 0 Then
For Each strCode in colMatches
newString = strSearchString
Next
End If
Loop
MsgBox newString
Any help would be massively appreciated.
Thanks!

Match line starting with strCode:
objRegEx.Pattern = "^" & strCode & ".*"
'^' = Anchor to the start of the string
strCode = followed by your pattern
'.' = followed by any character
'*' = followed by zero or more occurrences of the previous character
So the regex becomes "^0001.*"
Oh and you can use
objRegEx.Test(strSearchString)
To see if the string matches your pattern.
Update: Test script illustrates how to first escape non-alphanumeric characters, then performs the comparison:
Dim strCode
Dim strSearchStr
strCode = "0.0[1"
strSearchString = "0.0[1 - test string"
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "([^A-Za-z0-9])" ' Match non-alphanum chars
objRegEx.global = True
strCode = objRegEx.replace(strCode, "\$1") ' Escape with a backslash
'msgbox strCode
objRegEx.Pattern = "^" & strCode & ".*" ' Compare like before
if objRegEx.Test(strSearchString) then
msgbox "match"
else
msgbox "No match"
end if

Related

How do I print my extracted pattern in a column using regex.execute and match object in vba?

I'm using vba to write a sub to extract pin codes from given addresses in a column in an excel worksheet. I was able to find the regex pattern to extract the pin pattern but Im unable to output the said extracted pins to a column. As a way to test whether the regex is able to extract the pin pattern from the column (it is) I passed the Match.value property from matches object to a msgbox and was able to get an output for each string in a msgbox.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "\d{6}"
Dim Match As Object
Dim matches As Object
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("B1:B30")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regex.Test(strInput) Then
Set matches = regex.Execute(strInput)
For Each Match In matches
MsgBox (Match.Value) 'A workaround I found to see if my pattern
'worked but I need to print Match.value
'in a column so this wont do
Next
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
How do I extract the pattern string from the match object and print it into a column (like U1:U30) for each cell in my range B1:B30
TL;DR: Regex Pattern working but how to print extracted pattern in cell
How about collecting the matches comma separated in a string strMatches and write that to a cell?
Add this before For Each cell In Myrange
Dim i As Long, strMatches As String
i = 1 'row number where we start to write
And replace your other For Each with
strMatches = vbNullString
For Each Match In matches
strMatches = strMatches & Match.Value & ", " 'collect all matches comma seprated
Next
If Not strMatches = vbNullString Then strMatches = Left(strMatches, Len(strMatches) - 2) 'remove last comma
Worksheets("your-sheet-name").Range("U" & i).Value = strMatches 'write the matches into cell
i = i + 1

Add a space after comma using VBA regex

I'm trying to use a regex to find cells in a range that have a comma, but no space after that comma. Then, I want to simply add a space between the comma and the next character. For example, a cell has Wayne,Bruce text inside, but I want to turn it to Wayne, Bruce.
I have a regex pattern that can find cells with characters and commas without spaces, but when I replace this, it cuts off some characters.
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String: strPattern = "[a-zA-Z]\,[a-zA-Z]"
Dim strReplace As String: strReplace = ", "
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("P1:P5")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.TEST(strInput) Then
Debug.Print (regEx.Replace(strInput, strReplace))
Else
Debug.Print ("No Regex Not matched in " & cell.address)
End If
End If
Next
Set regEx = Nothing
End Sub
If I run that against "Wayne,Bruce" I get "Wayn, ruce". How do I keep the letters, but separate them?
Change the code the following way:
Dim strPattern As String: strPattern = "([a-zA-Z]),(?=[a-zA-Z])"
Dim strReplace As String: strReplace = "$1, "
Output will be Bruce, Wayne.
The problem is that you cannot use a look-behind in VBScript, so we need a workaround in the form of a capturing group for the letter before the comma.
For the letter after the comma, we can use a look-ahead, it is available in this regex flavor.
So, we just capture with ([a-zA-Z]) and restore it in the replacing call with a back-reference $1. Look-ahead does not consume characters, so we are covered.
(EDIT) REGEX EXPLANATION
([a-zA-Z]) - A captured group that includes a character class matching just 1 English character
, - Matching a literal , (you actually do not have to escape it as it is not a special character)
(?=[a-zA-Z]) - A positive look-ahead that only checks (does not match, or consume) if the immediate character following the comma is and English letter.
If we replace all commas with comma+space and then replace comma+space+space with comma+space, we can meet your requirement:
Sub NoRegex()
Dim r As Range
Set r = Range("P1:P5")
r.Replace What:=",", Replacement:=", "
r.Replace What:=", ", Replacement:=", "
End Sub
Uses the same RegExp as in the solution from stribizhev but with two optimisations for speed
Your current code sets the RegExp details for every cell tested, these only need setting once.
Looping through a varinat array is much faster than a cell range
code
Private Sub simpleRegexSearch()
' adapted from http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
Dim strPattern As String:
Dim strReplace As String:
Dim regEx As Object
Dim strInput As String
Dim X, X1
Dim lngnct
Set regEx = CreateObject("vbscript.regexp")
strPattern = "([a-zA-Z])\,(?=[a-zA-Z])"
strReplace = "$1, "
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
X = ActiveSheet.Range("P1:P5").Value2
For X1 = 1 To UBound(X)
If .TEST(X(X1, 1)) Then
Debug.Print .Replace(X(X1, 1), strReplace)
Else
Debug.Print ("No Regex Not matched in " & [p1].Offset(X1 - 1).Address(0, 0))
End If
Next
End With
Set regEx = Nothing
End Sub
What you are doing via Regex is to find a pattern
(any Alphabet),(any Alphabet)
and then replace such pattern to
,_
where _ implies a space.
So if you have Wayne,Bruce then the pattern matches where e,B. Therefore the result becomes Wayn, ruce.
Try
Dim strPattern As String: strPattern = "([a-zA-Z]),([a-zA-Z])"
Dim strReplace As String: strReplace = "$1, $2"
.

Using Regular expression in VBA

This is my sample record in a Text format with comma delimited
901,BLL,,,BQ,ARCTICA,,,,
i need to replace ,,, to ,,
The Regular expression that i tried
With regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern="^(?=[A-Z]{3})\\,{3,}",",,"))$ -- error
Now i want to pass Line from file to Regex to correct the record, can some body guide me to fix this i am very new to VBA
I want to read the file line by line pass it to Regex
Looking at your original pattern I tried using .Pattern = "^\d{3},\D{3},,," which works on the sample record as with the 3 number characters , 3 letters,,,
In the answer I have used a more generalised pattern .Pattern = "^\w*,\w*,\w*,," This also works on the sample and mathces 3 commas each preceded with 0 or more alphanumeric characters followed directly by a fourth comma. Both patterns require a match to be from the begining of the string.
Pattern .Pattern = "^\d+,[a-zA-Z]+,\w*,," also works on the sample record. It would specify that before the first comma there should be 1 or greater numeric characters (and only numeric characters) and before the second comma ther should be 1 or more letters (and only letters). Before the 3rd comma there could be 0 or more alphanumeric characters.
The left function removes the rightmost character in the match ie. the last comma to generate the string used by the Regex.Replace.
Sub Test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
Debug.Print
Debug.Print str
str = strConv(str)
Debug.Print str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object
Dim oMatches As Object
Dim oMatch As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = "^\w*,\w*,\w*,,"
End With
Set oMatches = objRegEx.Execute(str)
If oMatches.Count > 0 Then
For Each oMatch In oMatches
str = objRegEx.Replace(str, Left(oMatch.Value, oMatch.Length - 1))
Next oMatch
End If
strConv = str
End Function
Try this
Sub test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
str = strConv(str)
MsgBox str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object, allMatches As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = ",,,"
End With
strConv = objRegEx.Replace(str, ",,")
End Function

vbscript: replace text in activedocument with hyperlink

Starting out at a new job and I have to go through a whole lot of documents that my predecessor left. They are MS Word-files that contain information on several hundreds of patents. Instead of copy/pasting every single patent-number in an online form, I would like to replace all patent-numbers with a clickable hyperlink. I guess this should be done with vbscript (I'm not used to working with MS Office).
I have so far:
<obsolete>
This is not working for me:
1. I (probably) need to add something to loop through the ActiveDocument
2. The replace-function probably needs a string and not an object for a parameter - is there a __toString() in vbscript?
THX!
UPDATE:
I have this partially working (regex and finding matches) - now if only I could get the anchor for the hyperlink.add-method right...
Sub HyperlinkPatentNumbers()
'
' HyperlinkPatentNumbers Macro
'
Dim objRegExp, Matches, match, myRange
Set myRange = ActiveDocument.Content
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US)([0-9]*)(A1|A2|B1|B2)"
End With
Set Matches = objRegExp.Execute(myRange)
If Matches.Count >= 1 Then
For Each match In Matches
ActiveDocument.Hyperlinks.Add Anchor:=objRegExp.match, Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Next
End If
Set Matches = Nothing
Set objRegExp = Nothing
End Sub
Is this VBA or VBScript? In VBScript you cannot declare types like Dim newText As hyperLink, but every variable is a variant, so: Dim newText and nothing more.
objRegEx.Replace returns the string with replacements and needs two parameters passed into it: The original string and the text you want to replace the pattern with:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = False
objRegEx.Pattern = "^(WO|EP|US)([0-9]*)(A1|A2|B1|B2)$"
' assuming plainText contains the text you want to create the hyperlink for
strName = objRegEx.Replace(plainText, "$1$2$3")
strAddress = objRegex.Replace(plainText, "http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Now you can use strName and strAddress to create the hyperlink with.
Pro-tip: You can use objRegEx.Test(plainText) to see if the regexp matches anything for early handling of errors.
Problem solved:
Sub addHyperlinkToNumbers()
Dim objRegExp As Object
Dim matchRange As Range
Dim Matches
Dim match
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US|FR|DE|GB|NL)([0-9]+)(A1|A2|A3|A4|B1|B2|B3|B4)"
End With
Set Matches = objRegExp.Execute(ActiveDocument.Content)
For Each match In Matches
'This doesn't work, because of the WYSIWYG-model of MS Word:
'Set matchRange = ActiveDocument.Range(match.FirstIndex, match.FirstIndex + Len(match.Value))
Set matchRange = ActiveDocument.Content
With matchRange.Find
.Text = match.Value
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
.Execute
End With
ActiveDocument.Hyperlinks.Add Anchor:=matchRange, _
Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=" _
& match.Submatches(0) & "&NR=" & match.Submatches(1) & "&KC=" & match.Submatches(2)
Next
MsgBox "Hyperlink added to " & Matches.Count & " patent numbers"
Set objRegExp = Nothing
Set matchRange = Nothing
Set Matches = Nothing
Set match = Nothing
End Sub

vba regex: dot matching newline

I want to match (in this snippet) everything upto but not including a newline which is what I thought a . would do. Could someone shed light on what I'm missing please.
Public Sub regexp()
Dim oRegExp As regexp
Dim oMatches As MatchCollection
Dim oMatch As Match
Dim sString As String
sString = _
"one" & vbNewLine & _
"two" & vbNewLine
Set oRegExp = New regexp
With oRegExp
.Global = True
.Pattern = ".+"
Set oMatches = .Execute(sString)
For Each oMatch In oMatches
Debug.Print "*" & oMatch.Value & "*"
Next oMatch
End With
End Sub
Output is
*one
*
*two
*
Expected output
*one*
*two*
How can I avoid the newline in the output? Thanks.
If you use [^\n] in place of ., it will match any character except the new line character.