VBA Regex to match Two sets of Four Digits with String - regex

I'm trying to write a regex pattern in Excel VBA to find a certain string.
I want each digit group to match anything from 0 and 0000 up to 9999. This is what I'm trying:
StringToMatch = "a75z6878"
Dim objRegExp As New RegExp
Set objRegExp = CreateObject("vbscript.regexp")
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[a-z]([0-9][0-9][0-9][0-9])[a-z]([0-9][0-9][0-9][0-9])"
objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
If objRegExp.Test(StringToMatch) Then MsgBox "Found!"
I have tried different patterns but none work.
What am I doing wrong???

The problem is in objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
A range quantifier must be written as {m,n}, not {m-n}.
Change the regex to
[a-z][0-9]{1,4}[a-z][0-9]{1,4}
For an example, see http://regex101.com/r/wA2qM3/1
Or use a shorter version:
[a-z]\d{1,4}[a-z]\d{1,4}
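A minimal sketch of the corrected pattern in use, assuming late binding through CreateObject so that no library reference is needed. The function name MatchesTwoDigitGroups is made up for illustration, and the ^ and $ anchors are my own addition so that the whole string has to fit the pattern; drop them to search inside longer text.

```vba
' Hypothetical helper: True when the text is letter, 1-4 digits, letter, 1-4 digits.
Function MatchesTwoDigitGroups(ByVal text As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' {1,4} uses a comma. The ^ and $ anchors are an assumption, not part of the answer above.
    re.Pattern = "^[a-z]\d{1,4}[a-z]\d{1,4}$"
    MatchesTwoDigitGroups = re.Test(text)
End Function
```

With the anchors in place, MatchesTwoDigitGroups("a75z6878") is True, while a string with five digits in a group, such as "a75z68789", is rejected.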

Related

Using Regex pattern to copy and rename Excel sheet using VBA

I'm having trouble writing VBA to copy an existing sheet and rename the copy with a specific suffix.
The existing sheet is named with a variable prefix (a digit code) followed by a fixed suffix.
The copied sheet should be renamed with the same prefix, followed by a different fixed suffix.
I would like to use regex to do so, but I cannot figure out how to specify the sheet names with regex. The pattern would simply be something like [0-9]+ for the prefix.
The suffixes are always the same.
Example:
Existing sheet: 123_raw
New copied sheet: 123_analyzed
This is what I have so far and don't know how to go on:
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[0-9]+"
It should look something similar to this I guess:
Sheets("regex pattern + [suffix]").Select
Sheets("regex pattern + [suffix]").Copy After:=Sheets(3)
Sheets("regex pattern + [suffix] (2)").Select
Sheets("regex pattern + [suffix] (2)").Name = "regex pattern + [new suffix]"
But I have no idea on how to actually code it.
Any help much appreciated!
Assuming your sheet names look like 123_raw or 456_raw (three digits, an underscore, then a word), the pattern would be Pattern = "([0-9]{3}\_)"; see https://regex101.com/r/Iu6nxn/1
([0-9]{3}\_) is a capturing group, where:
[0-9] matches a single character in the range between 0 and 9
{3} is a quantifier that matches exactly 3 times
\_ matches the character _ literally (case sensitive)
A VBA example, kept as simple as possible: it searches for sheet names beginning with 123_ (three digits and an underscore), copies each matching sheet, and renames the copy to the same three digits followed by _analyzed
Option Explicit
Public Sub Example()
    Dim RegExp As Object
    Dim Matches As Object
    Dim Pattern As String
    Dim NewName As String
    Dim Sht As Worksheet
    Set RegExp = CreateObject("VbScript.RegExp")
    Pattern = "([0-9]{3}\_)" ' Sheet name prefix, e.g. 123_
    For Each Sht In ThisWorkbook.Worksheets
        With RegExp
            .Global = False
            .Pattern = Pattern
            .IgnoreCase = True
            Set Matches = .Execute(Sht.Name)
        End With
        ' Skip sheets already named *_analyzed, since they match the prefix pattern too
        If Matches.Count > 0 And Not Sht.Name Like "*_analyzed" Then
            Debug.Print Matches(0).Value ' Print to the Immediate window
            NewName = Matches(0).Value & "analyzed" ' New sheet name
            ' Copy after the last sheet, so this also works with fewer than three sheets
            Sht.Copy After:=ThisWorkbook.Worksheets(ThisWorkbook.Worksheets.Count)
            ActiveSheet.Name = NewName ' Rename the copy
        End If
    Next
    Set RegExp = Nothing
    Set Matches = Nothing
    Set Sht = Nothing
End Sub
Something like this (where _new replaces the prior suffix)
Sub B()
Dim ws As Worksheet
Dim objRegex As Object
Set ws = Sheets(1)
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "([0-9]+)_[a-z]+"
If .test(ws.Name) Then
ws.Copy After:=Sheets(Sheets.Count)
ActiveSheet.Name = .Replace(ws.Name, "$1_new")
End If
End With
End Sub
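The renaming step can also be separated from the copying. The sketch below assumes the suffixes are literally _raw and _analyzed, as in the question's example; the function name AnalyzedName is hypothetical.

```vba
' Hypothetical helper: maps "123_raw" to "123_analyzed" and returns "" for any other name.
Function AnalyzedName(ByVal sheetName As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' Anchored so that only names of the form <digits>_raw qualify (assumed suffix)
    re.Pattern = "^([0-9]+)_raw$"
    If re.Test(sheetName) Then
        ' $1 is the captured digit prefix
        AnalyzedName = re.Replace(sheetName, "$1_analyzed")
    End If
End Function
```

In a loop over the worksheets, a sheet would only be copied when AnalyzedName(ws.Name) is not empty, which also prevents an already renamed copy from being processed again.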

RegExp To Search For Array Of File Extensions

I am using this syntax to search for one file extension, how can I alter it to search for 2?
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.pattern = ".mdb"
objRegExp.IgnoreCase = True
You can use a pipe character | to split two possible match values. For example, if you want to match against an Access database and an Excel spreadsheet, you would use this:
Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.pattern = "\.mdb$|\.xls$"
objRegExp.IgnoreCase = True
The $ indicates that the pattern has to be found at the end of the string being checked. The . will also be treated as a special character so you need to escape it, using a \ backslash.
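Since the title asks about an array of extensions, here is a small sketch that builds the alternation from an array instead of typing it out. The function name HasAnyExtension is made up, and it assumes the extensions are plain letters and digits with no regex metacharacters that would need escaping.

```vba
' Hypothetical helper: True when fileName ends with one of the given extensions.
' Assumes each extension is plain alphanumeric text without a leading dot.
Function HasAnyExtension(ByVal fileName As String, ByVal extensions As Variant) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' Builds e.g. \.(?:mdb|xls)$ so the escaped dot and the $ are written only once
    re.Pattern = "\.(?:" & Join(extensions, "|") & ")$"
    HasAnyExtension = re.Test(fileName)
End Function
```

For example, HasAnyExtension("data.MDB", Array("mdb", "xls")) is True, while "book.xlsx" is rejected because the $ requires xls to be the very end of the name.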

Vbscript Regular expression, find all between 2 known strings

I would like to select all text between two known strings. Take the following text, for example:
*starthere
*General Settings
* some text1
* some text2
*endhere
I would like to select all text between "*starthere" and "*endhere" using VBScript, so that the final output looks like the following:
*General Settings
* some text1
* some text2
I know this would be simpler with a regex, since there are multiple instances of this pattern in the file I read.
I tried something like the following
/(.*starthere\s+)(.*)(\s+*endhere.*)/
/(*starthere)(?:[^])*?(*endhere)/
But they don't seem to work, and they select the start and end strings as well. Lookahead and lookbehind don't seem to work either, and I am not sure whether VBScript supports them.
This is the code I am using:
'Create a regular expression object
Dim objRegExp
Set objRegExp = New RegExp
'Set our pattern
objRegExp.Pattern = "/\*starthere\s+([\s\S]*?)\s+\*endhere\b/"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Dim objMatches
Set objMatches = objRegExp.Execute(strSearchString)
If objMatches.Count > 0 Then
out = out & objMatches(0) &vbCrLf
WScript.Echo "found"
End If
Loop
WScript.Echo out
objFile.Close
You can use:
/\bstarthere\s+([\s\S]*?)\s+endhere\b/
and grab the captured group #1
([\s\S]*?) will match any text between these 2 tags including newlines.
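Two details in the question's code would keep this pattern from matching even when the pattern itself is right. The surrounding / characters are delimiters in other languages and are treated as literal slashes in a VBScript pattern string, and reading with ReadLine means no single string ever contains both markers. The sketch below assumes the whole file has been read into one string first (for example with objFile.ReadAll) and keeps the literal \* from the question's markers; the function name TextBetweenMarkers is hypothetical.

```vba
' Hypothetical helper: returns every block found between *starthere and *endhere,
' one block per CRLF-terminated chunk. Expects the whole file content as one string.
Function TextBetweenMarkers(ByVal text As String) As String
    Dim re As Object, m As Object, result As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' the file may contain several blocks
    re.IgnoreCase = True
    ' No leading or trailing slash; [\s\S]*? spans newlines lazily
    re.Pattern = "\*starthere\s+([\s\S]*?)\s+\*endhere\b"
    For Each m In re.Execute(text)
        ' SubMatches(0) is captured group #1, without the two markers
        result = result & m.SubMatches(0) & vbCrLf
    Next
    TextBetweenMarkers = result
End Function
```

Reading SubMatches(0) rather than the match itself is what leaves the start and end strings out of the output.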

How to change case of matching letter with a VBA regex Replace?

I have a column of lists of codes like the following.
2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D
6.A.A.5.F.A, 2.B.C.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.B.C.x, 1.N.N.9.J.K
I want to find all instances of two single upper-case letters separated by a period, but only those that follow a number less than 6. I want to remove the period between the letters and convert the second letter to lower case. Desired output:
2.Ab, 1.Cd, A.21.C.D, 1.Cd.11.C.D
6.A.A.5.Fa, 2.Bc.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.Bc.x, 1.Nn.9.J.K
I have the following code in VBA.
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
c.Value = re.Replace(c.Value, "$1$2")
Next c
End Sub
This removes the period, but doesn't handle the lower-case requirement. I know in other flavors of regular expressions, I can use something like
re.Replace(c.Value, "$1\L$2\E")
but this does not have the desired effect in VBA. I tried googling for this functionality, but I wasn't able to find anything. Is there a way to do this with a simple re.Replace() statement in VBA?
If not, how would I go about achieving this otherwise? The pattern matching is complex enough that I don't even want to think about doing this without regular expressions.
[I have a solution I worked up, posted below, but I'm hoping someone can come up with something simpler.]
Here is a workaround that uses the properties of each individual regex match to make the VBA Replace() function replace only the text from the match and nothing else.
Sub fixBlah2()
Dim re As VBScript_RegExp_55.RegExp, Matches As VBScript_RegExp_55.MatchCollection
Dim M As VBScript_RegExp_55.Match
Dim tmpChr As String, pre As String, i As Integer
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
'Count of number of replacements made. This is used to adjust M.FirstIndex
' so that it still matches correct substring even after substitutions.
i = 0
Set Matches = re.Execute(c.Value)
For Each M In Matches
tmpChr = LCase(M.SubMatches.Item(1))
If M.FirstIndex > 0 Then
pre = Left(c.Value, M.FirstIndex - i)
Else
pre = ""
End If
c.Value = pre & Replace(c.Value, M.Value, M.SubMatches.Item(0) & tmpChr, _
M.FirstIndex + 1 - i, 1)
i = i + 1
Next M
Next c
End Sub
For reasons I don't quite understand, if you specify a start index in Replace(), the output starts at that index as well, so the pre variable is used to capture the first part of the string that gets clipped off by the Replace function.
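The clipping is documented behaviour of VBA's Replace: when a start position is given, the returned string begins at that position. One way to avoid both the clipping and the index bookkeeping is to walk the matches from last to first and splice the string by hand, so earlier FirstIndex values stay valid. This is a sketch of that alternative, not the method used above; the function name LowerSecondLetter is made up.

```vba
' Hypothetical helper: removes the period between the two letters and lowercases the second,
' for matches of the question's pattern. Works on a plain string instead of a cell.
Function LowerSecondLetter(ByVal text As String) As String
    Dim re As Object, ms As Object, m As Object, i As Long
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
    Set ms = re.Execute(text)
    ' Go backwards so that shortening the string never shifts a match still to be processed
    For i = ms.Count - 1 To 0 Step -1
        Set m = ms(i)
        ' FirstIndex is zero-based; Left$ and Mid$ are one-based
        text = Left$(text, m.FirstIndex) & m.SubMatches(0) & LCase$(m.SubMatches(1)) & _
               Mid$(text, m.FirstIndex + m.Length + 1)
    Next i
    LowerSecondLetter = text
End Function
```

In the worksheet loop this would be used as c.Value = LowerSecondLetter(c.Value).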
So this question is old, but I do have another workaround. I use a double regex so to speak, where the first engine looks for the match as an execute, then I loop through each of those items and replace with a lowercase version. For example:
Sub fixBlah()
    Dim re As VBScript_RegExp_55.RegExp
    Dim re2 As VBScript_RegExp_55.RegExp
    Dim ToReplace As Object
    Dim c As Range
    Dim LcaseVersion As String
    Dim ItemCt As Integer
    Set re = New VBScript_RegExp_55.RegExp
    Set re2 = New VBScript_RegExp_55.RegExp
    For Each c In Selection.Cells
        With re
            .Global = True
            .Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
            Set ToReplace = .Execute(c.Value)
        End With
        'This generates a list of items that match. Now to lowercase them and replace
        For ItemCt = 0 To ToReplace.Count - 1
            'Keep the first part, drop the period, and lowercase the second letter
            LcaseVersion = ToReplace.Item(ItemCt).SubMatches(0) & LCase(ToReplace.Item(ItemCt).SubMatches(1))
            With re2
                .Global = True
                'This looks for that specific item (with its periods escaped) and replaces it with the lowercase version
                .Pattern = "\b" & Replace(ToReplace.Item(ItemCt).Value, ".", "\.") & "\b"
                c.Value = .Replace(c.Value, LcaseVersion)
            End With
        Next ItemCt
    Next c
End Sub
I hope this helps!

Regular Expression with a NOT match

I am trying to create a regular expression to validate file names for further processing.
I want to match a file with extension .txt, .csv, or .log
but I want to exclude them if it ENDS with _out.csv or mylog.log
Here is my start:
Function CanProcessFile(strFileNameIn, strLogName, strOutName)
Dim objRegEx, colMatches, strPattern
Set objRegEx = CreateObject("VBScript.RegExp")
strPattern = "((.txt$)|(.csv$)|(.Log$))"
strPattern = "(?!(" & strLogName & "$))" & strPattern
'strPattern = "(?!(" & strOutName & "$))" & strPattern
objRegEx.Pattern = strPattern
objRegEx.IgnoreCase = True
Set colMatches = objRegEx.Execute(strFileNameIn)
If colMatches.Count > 0 Then
CanProcessFile = True
Else
CanProcessFile = False
End If
End Function
but every time I try to add a ^ or ?! with (_out.csv$) it fails.
I think that creating two filters (as you suggest in a comment above) may well be the best way to go, but if you prefer to do it in a single regex, then you should be able to write:
objRegEx.Pattern = "^(?!.*(_out\.csv|mylog\.log)$).*\.(txt|csv|log)$"
The negative lookahead at the start ensures that the file name does not match .*(_out\.csv|mylog\.log)$ (i.e., that it doesn't end with _out.csv or mylog.log), and the rest of the pattern requires one of the three extensions.
I don't know whether VBScript supports negative lookbehind, but if it does, try:
(?<!_out|mylog)\.(?:txt|csv|log)$
Maybe VBScript accepts negative lookbehind but not variable-length lookbehind; in that case, use:
(?<!_out)(?<!mylog)\.(?:txt|csv|log)$
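As far as I know, VBScript.RegExp supports lookahead such as (?!...) but not lookbehind, so the two (?<!...) patterns above are likely to be rejected as a syntax error. The two-filter approach mentioned earlier avoids lookarounds altogether. Here is a rough sketch of it with the names from the question hard-coded; the function name CanProcessFileTwoPass is hypothetical.

```vba
' Hypothetical helper: accept by extension first, then reject the two excluded endings.
Function CanProcessFileTwoPass(ByVal fileName As String) As Boolean
    Dim accept As Object, reject As Object
    Set accept = CreateObject("VBScript.RegExp")
    Set reject = CreateObject("VBScript.RegExp")
    accept.IgnoreCase = True
    reject.IgnoreCase = True
    ' Filter 1: the name must end in .txt, .csv or .log
    accept.Pattern = "\.(txt|csv|log)$"
    ' Filter 2: the name must not end in _out.csv or mylog.log (dots escaped)
    reject.Pattern = "(_out\.csv|mylog\.log)$"
    CanProcessFileTwoPass = accept.Test(fileName) And Not reject.Test(fileName)
End Function
```

Keeping the two filters separate makes it easy to build the reject pattern from strLogName and strOutName later, provided any dots in those names are escaped first.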