Using Regex pattern to copy and rename Excel sheet using VBA - regex

I having trouble creating VBA to copy an existing sheet and rename the copy with a specific suffix.
The existing sheet is named with a variable prefix (a digit code) followed by a fix suffix.
The copied sheet should be renamed with the same prefix, followed by another fix suffix.
I would like to use regex to do so, but I cannot figure out how to specify the sheet names with regex. The pattern would simply be something like [0-9]+ for the prefix.
The suffix are always the same.
Example:
Existing sheet: 123_raw
New copied sheet: 123_analyzed
This is what I have so far and don't know how to go on:
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[0-9]+"
It should look something similar to this I guess:
Sheets("regex pattern + [suffix]").Select
Sheets("regex pattern + [suffix]").Copy After:=Sheets(3)
Sheets("regex pattern + [suffix] (2)").Select
Sheets("regex pattern + [suffix] (2)").Name = "regex pattern + [new suffix]"
But I have no idea on how to actually code it.
Any help much appreciated!

Assuming your Sheet names are 123_raw 456_raw Or something [3 digits_words] so your Pattern will be Pattern = "([0-9]{3}\_)" https://regex101.com/r/Iu6nxn/1
([0-9]{3}\_) Match a single character present in the list below
0-9 a single character in the range between 0 and 9
{3} Quantifier — Matches exactly 3 times
\_ matches the character _ literally (case sensitive)
VBA Code Example as simple as it can be - Here we are searching for the sheet name 123_ or [3 digits_words] Copy and rename to 3 digits_analyzed
Option Explicit
Public Sub Example()
Dim RegExp As Object
Dim Matches As Variant
Dim Pattern As String
Dim NewName As String
Dim Sht As Worksheet
Set RegExp = CreateObject("VbScript.RegExp")
For Each Sht In ThisWorkbook.Worksheets
Pattern = "([0-9]{3}\_)" ' Sheet name 123_
With RegExp
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Sht.Name)
End With
If Matches.Count > 0 Then
Debug.Print Matches(0) ' Print on Immediate Win
NewName = Matches(0) & "analyzed" ' New sheet name
Sht.Copy After:=Sheets(3) ' Copy Sheet
ActiveSheet.Name = NewName ' Rename sheet with new name
End If
Next
Set RegExp = Nothing
Set Matches = Nothing
Set Sht = Nothing
End Sub

Something like this (where _new replaces the prior suffix)
Sub B()
Dim ws As Worksheet
Dim objRegex As Object
Set ws = Sheets(1)
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "([0-9]+)_[a-z]+"
If .test(ws.Name) Then
ws.Copy After:=Sheets(Sheets.Count)
ActiveSheet.Name = .Replace(ws.Name, "$1_new")
End If
End With
End Sub

Related

Extract ONLY THE FIRST match from Word doc using RegEx lunched from Excel VBA

There is a document like this one. I process 20 documents like this every day and they all look the same (structure, I mean, is very consistent).
The goal of this macro is to extract ONLY THE FIRST match of the RegEx pattern from the .ActiveDocument.Content. In the whole doc there is many more matches, but I need only the first one. The document being processed will be manually opened before the macro would run.
I'm just a VBA beginner so if there is a possibility to write it without using arrays, collections or some dictionaries I'd much appreciate. There is just one item to extract, so it's best to load it inside repNmbr string variable and from there just ws.Range("G30").Value = repNmbr. The simpler the better.
I used these resources Excel Regex Tutorial (Regular Expressions) which is very helpful but I still don't know how to load the FIRST MATCH alone into my repNmbr string variable. I'd like to do this without using any loop, because I just want to load a single string into this repNmbr variable.
Currently I have code like this:
Sub ExtractRepertor03()
'Application.ScreenUpdating = False
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String
'Assigning object variables
Set WordApp = GetObject(, "Word.Application") 'ActiveX can't create object is when
Set ExcelApp = GetObject(, "Excel.Application") 'there is no Word document open;
Set regEx = CreateObject("VBScript.RegExp")
Set WordDoc = WordApp.ActiveDocument
Set rng = WordApp.ActiveDocument.Content
'Create the regular expression object
regEx.Global = False 'because I need only the first match instead of all occurences;
regEx.IgnoreCase = True
regEx.Pattern = "([0-9]{1,5})([ ]{0,4})([/])([0-9]{4})"
'regEx.Pattern = "([0-9]{1,5})([\s]{0,4})(/[0-9]{4})"
repNmbr = regEx.Execute(rng.text) 'here is something wrong but I don't know what;
'I'm trying to assign the first RegEx match to repNmbr variable;
Debug.Print repNmbr
repNmbr = Replace(repNmbr, " ", "")
' Set matches = regEx.Execute(rng.text)
' Debug.Print regEx.Test(rng)
' 'Debug.Print regEx.Value
' For Each match In matches 'I just want this macro run without the loop
' Debug.Print match.Value 'Result: 9042 /2019
' repNmbr = match.Value
' Next match
ExcelApp.Application.Visible = True
ws.Range("G30").Value = repNmbr
End Sub
And an error like this:
Can someone explain to me why Set matches = regEx.Execute(rng.text) works fine but
repNmbr = regEx.Execute(rng.text) returns the error: "Wrong number of arguments or invalid property assignment"??
After regEx.Global = False is set, the RegEx finds only a single value, so why VBA refuses to assign this string into the repNmbr string variable??
As I said in your other question, you don't need the RegEx library for this. Stick to Word's wildcards! Try:
Sub Demo()
Application.ScreenUpdating = False
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
With WordApp.ActiveDocument.Range
With .Find
.Text = "<[0-9 ]{1,7}/[0-9]{4}>"
.MatchWildcards = True
.Wrap = wdFindStop
.Forward = True
.Execute
End With
If .Find.Found = True Then ActiveSheet.Range("G30").Value = Replace(.Text, " ", "")
End With
Application.ScreenUpdating = True
End Sub
Note: I haven't bothered with any of:
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String
as it's all superfluous - even your own code never assigns anything to ws.

Add a new line or space depending on pattern

I am trying to do the following.
Patterns:
aaaaa.BBBBB - to add New Line after the (.)
aaaaaBBBBB - to add New Line when see a Caps letter.
aaaaa12345 - to add a space when there is a digit (Output: aaaaa 12345)
12345aaaaa - to add a space when there is a letter after the digit (Output: 12345 aaaaa)
Values:
Client asked about the 21year planPlease follow up at1234567
ReGex code need to the following:
Client asked about the 21 (space) year plan**(new line)** Please
follow up at (space) 1234567
Result:
Client asked about the 21 year plan
Please follow up at 1234567.
How do I recognize the pattern and also do a specific replacement be it adding (space) or (newline)?
Here is the code I use currently:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z0-9])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
You can use two regex replacements. The first one to add spaces between 0a and a0 (between a number and a lowercase letter), and a second to create the newline between aA and a.A.
([a-z])([0-9])|([0-9])([a-z]) and replace with $1$3 $2$4
([a-z])\.?([A-Z]) and replace with $1\n$2
If you want a period added at the end use $ and replace with \.
Try this code:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
Dim result As String
With objRegex
.Global = True
.Pattern = "([a-z])([0-9])|([0-9])([a-z])"
result = .Replace(strIn, "$1$3 $2$4")
End With
With objRegex
.Global = True
.Pattern = "([a-z])\.?([A-Z])"
result = .Replace(result, "$1\n$2")
End With
SplitCaps = result
End Function

Create clean URL from text in Excel

I want to create a clean URL from a text such as this one:
Alpha Tests' Purchase of Berta Global Associates (C)
The URL should look like this:
alpha-tests-purchase-of-berta-global-associates-c
Currently I use this formula in Excel:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");" ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))
However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.
Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?
Thank you.
I can suggest the following Function that you can put into a VBA module and use a normal formula:
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w-]+"
With regEx
.Global = True
.Pattern = strPattern
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function
The point is that we replace all spaces with hyphens at the beginning, then use a regex that matches any non-word and non-hyphen characters and remove them with RegExp.Replace.
UPDATE:
After your comments, it is still unclear what you want to do with Unicode letters. Delete or replace with hyphen. Here is a function that I tried to rebuild from your formula, but the logics may be flawed. I would prefer a generic approach above.
Function NormalizeToUrl(cell As Range)
Dim strPattern As String
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
strPattern = "[^\w -]"
With regEx
.Global = True
.Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
End With
NormalizeToUrl = regEx.Replace(cell.Value, "")
NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"
With regEx
.Global = True
.Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
End With
NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
With regEx
.Global = True
.Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
End With
NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function

How to change case of matching letter with a VBA regex Replace?

I have a column of lists of codes like the following.
2.A.B, 1.C.D, A.21.C.D, 1.C.D.11.C.D
6.A.A.5.F.A, 2.B.C.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.B.C.x, 1.N.N.9.J.K
I want to find all instances of two single upper-case letters separated by a period, but only those that follow a number less than 6. I want to remove the period between the letters and convert the second letter to lower case. Desired output:
2.Ab, 1.Cd, A.21.C.D, 1.Cd.11.C.D
6.A.A.5.Fa, 2.Bc.H.1
8.ABC.B, A.B.C.D
12.E.A, 3.NO.T
A.3.Bc.x, 1.Nn.9.J.K
I have the following code in VBA.
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
c.Value = re.Replace("$1$2")
Next c
End Sub
This removes the period, but doesn't handle the lower-case requirement. I know in other flavors of regular expressions, I can use something like
re.Replace("$1\L$2\E")
but this does not have the desired effect in VBA. I tried googling for this functionality, but I wasn't able to find anything. Is there a way to do this with a simple re.Replace() statement in VBA?
If not, how would I go about achieving this otherwise? The pattern matching is complex enough that I don't even want to think about doing this without regular expressions.
[I have a solution I worked up, posted below, but I'm hoping someone can come up with something simpler.]
Here is a workaround that uses the properties of each individual regex match to make the VBA Replace() function replace only the text from the match and nothing else.
Sub fixBlah2()
Dim re As VBScript_RegExp_55.RegExp, Matches As VBScript_RegExp_55.MatchCollection
Dim M As VBScript_RegExp_55.Match
Dim tmpChr As String, pre As String, i As Integer
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
For Each c In Selection.Cells
'Count of number of replacements made. This is used to adjust M.FirstIndex
' so that it still matches correct substring even after substitutions.
i = 0
Set Matches = re.Execute(c.Value)
For Each M In Matches
tmpChr = LCase(M.SubMatches.Item(1))
If M.FirstIndex > 0 Then
pre = Left(c.Value, M.FirstIndex - i)
Else
pre = ""
End If
c.Value = pre & Replace(c.Value, M.Value, M.SubMatches.Item(0) & tmpChr, _
M.FirstIndex + 1 - i, 1)
i = i + 1
Next M
Next c
End Sub
For reasons I don't quite understand, if you specify a start index in Replace(), the output starts at that index as well, so the pre variable is used to capture the first part of the string that gets clipped off by the Replace function.
So this question is old, but I do have another workaround. I use a double regex so to speak, where the first engine looks for the match as an execute, then I loop through each of those items and replace with a lowercase version. For example:
Sub fixBlah()
Dim re As VBScript_RegExp_55.RegExp
dim ToReplace as Object
Set re = New VBScript_RegExp_55.RegExp
for each c in Selection.Cells
with re `enter code here`
.Global = True
.Pattern = "\b([1-5]\.[A-Z])\.([A-Z])\b"
Set ToReplace = .execute(C.Value)
end with
'This generates a list of items that match. Now to lowercase them and replace
Dim LcaseVersion as string
Dim ItemCt as integer
for itemct = 0 to ToReplace.count - 1
LcaseVersion = lcase(ToReplace.item(itemct))
with re `enter code here`
.Global = True
.Pattern = ToReplace.item(itemct) 'This looks for that specific item and replaces it with the lowercase version
c.value = .replace(C.Value, LCaseVersion)
end with
End Sub
I hope this helps!

Using RegEx to search a table for missing information doesn't extract all matching values

I am a little new to VBA, and I did try searching the forums for this topic but I am not sure I used the right words to search. Here is my question:
I am using VBA to extract missing information with regexp. Say I have a table with text which contains phone and fax numbers. I would like to collect the numbers into a table. So far, the code I have works OK, but when I have multiple numbers (say regular and 800 #s) for some reason, only one number is retrieved, not the others. How can I get all the results to be added to the table?
Query:
SELECT regxtr([Table1]![field1]) AS phone FROM Table1;
VBA code for (regxtr)function:
Option Compare Database
Function regxtr(ByVal Target As String) As String 'Target is the field we are 'extracting from
Dim re As New RegExp
Dim oMatches As Object
Dim oMatch As Object
Dim n As Long
n = 0
'Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.IgnoreCase = True
.Multiline = True
.Pattern = "(\d\d\d.\d\d\d\.\d\d\d\d)" 'keeping the pattern simple for now just to test
End With
'test before executing
If re.Test(Target) = True Then
Set oMatches = re.Execute(Target)
'attempt to get all matches. THIS IS WHERE I AM FAILING
For n = 0 To oMatches.Count - 1
Set oMatch = oMatches(n)
regxtr = oMatch.Value
n = n + 1 ' does this even belong here?
Next
End If
End Function
How can I get to so all matches will populate the field [phone] in the query? Any help would be greatly appreciated.
First of all, a correction in terminology. You're not looking for 'submatches' (also called 'capturing groups' in other regex implementations). You're looking for 'matches' for your regex, so you can drop the parentheses and just use \d{3}.\d{3}.\d{4} That said, this may be what you need:
Function regxtr(ByVal Target As String) As String 'Target is the field we are 'extracting from
Dim re As New RegExp
Dim oMatches As Object
Dim oMatch As Object
With re
.Global = True
.IgnoreCase = True
.Multiline = True
.Pattern = "\d{3}.\d{3}.\d{4}" 'keeping the pattern simple for now just to test
End With
If re.Test(Target) = True Then
Set oMatches = re.Execute(Target)
For Each oMatch In oMatches 'Note: you may get multiple matches because you set re.Global = True above, otherwise you would only get the first match
regxtr = regxtr & " " & oMatch 'Note: this is awkward and not advisable as a way to return the values. This is just an example.
Next oMatch
End If
End Function
As a test:
?regxtr("foo 993-242.1231bar994.425-1234hello987.234.2424 world 999.342-5252")
993-242.1231 994.425-1234 987.234.2424 999.342-5252