RegExp To Search For Array Of File Extensions - regex

I am using this syntax to search for one file extension. How can I alter it to search for two?
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.pattern = ".mdb"
objRegExp.IgnoreCase = True

You can use a pipe character | to split two possible match values. For example, if you want to match against an Access database and an Excel spreadsheet, you would use this:
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.pattern = "\.mdb$|\.xls$"
objRegExp.IgnoreCase = True
The $ anchors each alternative to the end of the string being checked. An unescaped . is a special character that matches any character, so you need to escape it with a \ backslash to match a literal dot.
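As a minimal sketch of the same idea, the check can be wrapped in a small function. HasWantedExtension is a hypothetical helper name, not part of the original answer, and the grouped pattern "\.(mdb|xls)$" is just a shorter way of writing "\.mdb$|\.xls$".

```vbscript
' Hypothetical helper: returns True when fileName ends in .mdb or .xls,
' ignoring case.
Function HasWantedExtension(fileName)
    Dim objRegExp
    Set objRegExp = CreateObject("VBScript.RegExp")
    ' Equivalent to "\.mdb$|\.xls$": a literal dot, one of the two
    ' extensions, then the end of the string.
    objRegExp.Pattern = "\.(mdb|xls)$"
    objRegExp.IgnoreCase = True
    HasWantedExtension = objRegExp.Test(fileName)
End Function
```

To allow more extensions, add them inside the parentheses, for example "\.(mdb|xls|csv)$".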

Related

vbscript regular expression, replace between two string

I have this xml:
<doc>
<ContactPrimaryEmail></ContactPrimaryEmail>
<ContactAlternateEmail></ContactAlternateEmail>
<ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile>
<ContactAlternateMobile></ContactAlternateMobile>
</doc>
I want to apply a regular expression in VBScript to replace the content "+00xxxxxx" of the element ContactPrimaryMobile, i.e. simply change the number:
<ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile>
I am new to VBScript and my skills in creating the objects and applying the pattern are limited, so could you please help me convert this regex for use in VBScript:
(?<=\<ContactPrimaryMobile\>)(.*)(?=\<\/ContactPrimaryMobile)
UPDATE
I get this:
Object doesn't support this property or method: 'Submatches'
when executing:
Dim oRE, oMatches
Set oRE = New RegExp
oRE.Pattern = "<ContactPrimaryMobile>(.*?)</ContactPrimaryMobile>"
oRE.Global = True
Set oMatches = oRE.Execute("<doc><ContactPrimaryEmail></ContactPrimaryEmail><ContactAlternateEmail></ContactAlternateEmail><ContactPrimaryMobile>+00xxxxxx</ContactPrimaryMobile><ContactAlternateMobile></ContactAlternateMobile></doc>")
Wscript.Echo oMatches.Submatches(0)
First of all, VBScript regex does not support lookbehinds, so you need to capture the part between the two strings instead.
Next, .Execute returns a collection of matches, and Submatches is a property of each Match object, not of the collection. Loop over the matches after you .Execute the regex and read .Submatches(0) from each one:
Dim oRE, oMatches, objMatch
Set oRE = New RegExp
oRE.Pattern = "<ContactPrimaryMobile>(.*?)</ContactPrimaryMobile>"
oRE.Global = True
and then
Set oMatches = oRE.Execute(s)
For Each objMatch In oMatches
Wscript.Echo objMatch.Submatches(0)
Next
To replace, use the appropriate groupings and method:
oRE.Pattern = "(<ContactPrimaryMobile>).*?(</ContactPrimaryMobile>)"
' and then
s = oRE.Replace(s,"$1SOME_NEW_VALUE$2")
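Put together, the replacement might look like the sketch below. ReplacePrimaryMobile is a hypothetical helper name, and the code assumes that the new value contains no $ character and does not begin with a digit, since a digit placed directly after $1 could be read as part of the group reference.

```vbscript
' Hypothetical helper: swaps the text inside <ContactPrimaryMobile>...</ContactPrimaryMobile>
' for newNumber and returns the modified XML text.
' Assumption: newNumber has no "$" and does not start with a digit.
Function ReplacePrimaryMobile(xmlText, newNumber)
    Dim oRE
    Set oRE = New RegExp
    ' Group 1 is the opening tag, group 2 the closing tag.
    oRE.Pattern = "(<ContactPrimaryMobile>).*?(</ContactPrimaryMobile>)"
    oRE.Global = True
    ReplacePrimaryMobile = oRE.Replace(xmlText, "$1" & newNumber & "$2")
End Function
```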
I know you explicitly asked for a regex and you already have your answer, but an alternative way to reach the same goal is to use an XML parser instead.
option explicit
dim xmldoc
set xmldoc = CreateObject("MSXML2.DomDocument")
xmldoc.load "doc.xml"
dim primaryMobileNode
set primaryMobileNode = xmldoc.selectSingleNode("/doc/ContactPrimaryMobile")
primaryMobileNode.text = "new value"
xmldoc.save "changed-doc.xml"

Using Regex pattern to copy and rename Excel sheet using VBA

I am having trouble creating VBA to copy an existing sheet and rename the copy with a specific suffix.
The existing sheet is named with a variable prefix (a digit code) followed by a fixed suffix.
The copied sheet should be renamed with the same prefix, followed by another fixed suffix.
I would like to use regex to do so, but I cannot figure out how to specify the sheet names with regex. The pattern would simply be something like [0-9]+ for the prefix.
The suffixes are always the same.
Example:
Existing sheet: 123_raw
New copied sheet: 123_analyzed
This is what I have so far and don't know how to go on:
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[0-9]+"
It should look something similar to this I guess:
Sheets("regex pattern + [suffix]").Select
Sheets("regex pattern + [suffix]").Copy After:=Sheets(3)
Sheets("regex pattern + [suffix] (2)").Select
Sheets("regex pattern + [suffix] (2)").Name = "regex pattern + [new suffix]"
But I have no idea on how to actually code it.
Any help much appreciated!
Assuming your sheet names look like 123_raw or 456_raw (three digits, an underscore, then a word), your pattern will be Pattern = "([0-9]{3}\_)" https://regex101.com/r/Iu6nxn/1
( ) captures the matched text as group 1
[0-9] matches a single character in the range between 0 and 9
{3} is a quantifier: it matches exactly 3 times
\_ matches the character _ literally
A VBA code example, as simple as it can be: it looks for sheets whose names start with three digits and an underscore (such as 123_raw), copies each one, and renames the copy to the same three digits followed by _analyzed.
Option Explicit
Public Sub Example()
    Dim RegExp As Object
    Dim Matches As Variant
    Dim Pattern As String
    Dim NewName As String
    Dim Sht As Worksheet
    Set RegExp = CreateObject("VbScript.RegExp")
    Pattern = "([0-9]{3}\_)" ' Sheet name prefix, e.g. 123_
    With RegExp
        .Global = False
        .Pattern = Pattern
        .IgnoreCase = True
    End With
    For Each Sht In ThisWorkbook.Worksheets
        ' Only process the _raw sheets, so the new _analyzed copies are not copied again
        If Sht.Name Like "*_raw" Then
            Set Matches = RegExp.Execute(Sht.Name)
            If Matches.Count > 0 Then
                Debug.Print Matches(0) ' Print in the Immediate window
                NewName = Matches(0) & "analyzed" ' New sheet name
                Sht.Copy After:=Sheets(3) ' Copy sheet (assumes at least 3 sheets exist)
                ActiveSheet.Name = NewName ' Rename the copy
            End If
        End If
    Next
    Set RegExp = Nothing
    Set Matches = Nothing
    Set Sht = Nothing
End Sub
Something like this (where _new replaces the prior suffix)
Sub B()
Dim ws As Worksheet
Dim objRegex As Object
Set ws = Sheets(1)
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "([0-9]+)_[a-z]+"
If .test(ws.Name) Then
ws.Copy After:=Sheets(Sheets.Count)
ActiveSheet.Name = .Replace(ws.Name, "$1_new")
End If
End With
End Sub
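The renaming step in both answers comes down to a regex replace on the sheet name, which can be tried on its own without touching a workbook. The sketch below is written without type declarations so that it runs as plain VBScript as well as in VBA. AnalyzedName is a hypothetical helper name, and the anchored pattern "^([0-9]+)_raw$" is an assumption based on the 123_raw example in the question.

```vbscript
' Hypothetical helper: returns the "_analyzed" name for a "<digits>_raw"
' sheet name, or an empty string when the name does not have that shape.
Function AnalyzedName(sheetName)
    Dim objRegex
    Set objRegex = CreateObject("VBScript.RegExp")
    objRegex.IgnoreCase = True
    ' ^ and $ force the whole name to match; group 1 is the digit prefix.
    objRegex.Pattern = "^([0-9]+)_raw$"
    If objRegex.Test(sheetName) Then
        AnalyzedName = objRegex.Replace(sheetName, "$1_analyzed")
    Else
        AnalyzedName = ""
    End If
End Function
```

Because already renamed sheets such as 123_analyzed return an empty string, a loop over all worksheets can use that result to skip them.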

Vbscript Regular expression, find all between 2 known strings

I would like to select all text between two known strings. Take, for example, the following text:
*starthere
*General Settings
* some text1
* some text2
*endhere
I would like to select all text between "*starthere" and "*endhere" using VBScript, so that the final output looks like the following:
*General Settings
* some text1
* some text2
I know this would be simpler using a regex, since there are multiple instances of this pattern in the file I read.
I tried something like the following
/(.*starthere\s+)(.*)(\s+*endhere.*)/
/(*starthere)(?:[^])*?(*endhere)/
But they don't seem to work, and they select the start and end strings as well. Lookahead and lookbehind don't seem to work either, and I am not sure whether VBScript supports them.
This is the code I am using:
'Create a regular expression object
Dim objRegExp
Set objRegExp = New RegExp
'Set our pattern
objRegExp.Pattern = "/\*starthere\s+([\s\S]*?)\s+\*endhere\b/"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Dim objMatches
Set objMatches = objRegExp.Execute(strSearchString)
If objMatches.Count > 0 Then
out = out & objMatches(0) &vbCrLf
WScript.Echo "found"
End If
Loop
WScript.Echo out
objFile.Close
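You can use this pattern. It is written without surrounding / delimiters, because VBScript treats slashes inside the pattern string as literal characters, and the * before each marker is escaped:
\*starthere\s+([\s\S]*?)\s+\*endhere\b
and grab the captured group #1 (Submatches(0) on each match).
([\s\S]*?) will match any text between these 2 tags, including newlines. Because a block spans several lines, run the pattern against the whole file contents (for example from objFile.ReadAll) rather than against one ReadLine result at a time.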
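A minimal sketch of that approach follows. ExtractBlocks is a hypothetical helper name, and the code assumes the whole file has already been read into one string. It collects group 1 of every match, so several blocks in the same file are all returned.

```vbscript
' Hypothetical helper: returns the text of every block found between
' "*starthere" and "*endhere" in allText, one block after another.
Function ExtractBlocks(allText)
    Dim objRegExp, objMatch, result
    Set objRegExp = New RegExp
    ' No surrounding slashes: VBScript would treat them as literal characters.
    objRegExp.Pattern = "\*starthere\s+([\s\S]*?)\s+\*endhere\b"
    objRegExp.IgnoreCase = True
    objRegExp.Global = True   ' find every block, not just the first
    result = ""
    For Each objMatch In objRegExp.Execute(allText)
        ' SubMatches(0) is capture group 1: the text between the markers.
        result = result & objMatch.SubMatches(0) & vbCrLf
    Next
    ExtractBlocks = result
End Function
```

With the file object from the question, the call would be along the lines of out = ExtractBlocks(objFile.ReadAll), replacing the ReadLine loop.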

VBA Regex to match Two sets of Four Digits with String

I'm trying to write a regex pattern in Excel VBA to find a certain string. This is what I'm trying:
I'm trying to make it match 0 and 0000 to 9999
StringToMatch = "a75z6878"
Dim objRegExp As New RegExp
Set objRegExp = CreateObject("vbscript.regexp")
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[a-z]([0-9][0-9][0-9][0-9])[a-z]([0-9][0-9][0-9][0-9])"
objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
If objRegExp.Test(StringToMatch) Then MsgBox "Found!"
I have tried different patterns but none work.
What am I doing wrong???
The problem is in objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
A range quantifier must be written as {m,n}, not {m-n}.
Change the regex to
[a-z][0-9]{1,4}[a-z][0-9]{1,4}
For an example, see http://regex101.com/r/wA2qM3/1
Or use a shorter version such as
[a-z]\d{1,4}[a-z]\d{1,4}
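A quick way to check the corrected quantifier is a small test function such as the sketch below. MatchesCode is a hypothetical helper name, and the ^ and $ anchors are an added assumption that the whole string must have this shape; without them, Test also succeeds when the pattern occurs anywhere inside a longer string.

```vbscript
' Hypothetical helper: returns True when text is a letter, 1 to 4 digits,
' a letter, and 1 to 4 digits, with nothing before or after.
Function MatchesCode(text)
    Dim objRegExp
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.IgnoreCase = True
    ' {1,4} uses a comma; the anchors are an assumption, not from the answer.
    objRegExp.Pattern = "^[a-z]\d{1,4}[a-z]\d{1,4}$"
    MatchesCode = objRegExp.Test(text)
End Function
```

With this function, MatchesCode("a75z6878") returns True, while MatchesCode("a75z68789") returns False because the second group has five digits.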

Regular Expression with a NOT match

I am trying to create a regular expression to validate file names for further processing.
I want to match a file with extension .txt, .csv, or .log
but I want to exclude them if it ENDS with _out.csv or mylog.log
Here is my start:
Function CanProcessFile(strFileNameIn, strLogName, strOutName)
Dim objRegEx, colMatches, strPattern
Set objRegEx = CreateObject("VBScript.RegExp")
strPattern = "((.txt$)|(.csv$)|(.Log$))"
strPattern = "(?!(" & strLogName & "$))" & strPattern
'strPattern = "(?!(" & strOutName & "$))" & strPattern
objRegEx.Pattern = strPattern
objRegEx.IgnoreCase = True
Set colMatches = objRegEx.Execute(strFileNameIn)
If colMatches.Count > 0 Then
CanProcessFile = True
Else
CanProcessFile = False
End If
End Function
but every time I try to add a ^ or ?! with (_out.csv$) it fails.
I think that creating two filters (as you suggest in a comment above) may well be the best way to go, but if you prefer to do it in a single regex, then you should be able to write:
objRegEx.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
The negative lookahead (?!.*(_out\.csv|mylog\.log)$) at the start of the pattern ensures that the file name as a whole does not match .*(_out\.csv|mylog\.log)$ (i.e., that it doesn't end with _out.csv or mylog.log).
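Dropped into the function from the question, the single-regex version might look like the sketch below. It is an illustration only: the strLogName and strOutName parameters are omitted because the excluded names are hard-coded in the pattern, which is an assumption of this sketch rather than part of the original function.

```vbscript
' Sketch: returns True for names ending in .txt, .csv or .log, except those
' ending in _out.csv or mylog.log (both hard-coded here as an assumption).
Function CanProcessFile(strFileNameIn)
    Dim objRegEx
    Set objRegEx = CreateObject("VBScript.RegExp")
    ' The lookahead rejects the excluded endings; the rest checks the extension.
    objRegEx.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
    objRegEx.IgnoreCase = True
    CanProcessFile = objRegEx.Test(strFileNameIn)
End Function
```

If the excluded names have to come from variables, any regex special characters in them (such as the dot) need to be escaped before they are concatenated into the pattern.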
I don't know whether VBScript supports negative lookbehind, but if it does, try:
(?<!_out|mylog)\.(?:txt|csv|log)$
VBScript may accept negative lookbehind but not variable-length lookbehind; in that case, use:
(?<!_out)(?<!mylog)\.(?:txt|csv|log)$