UDF (regular expression) to match variants of a string, with some exclusions


I need a regular expression that matches the string Mod* followed by one specific character, e.g. "A", such as:
Mod A, Mod_A, Module xx A, Modules (A & B), and so on.
But with the following conditions:
(1) If the cell contains any of (Modif* or Moder* or Modr*) and also Mod* plus my specific character, the result is True.
(2) If the cell contains any of (Modif* or Moder* or Modr*) but not Mod* plus my specific character, the result is False.
Please see this example and the expected results:
Item Description | Expected Result of RegexMatch
new modified of module A 1 | TRUE
new modification of mod A | TRUE
new moderate of mod_A | TRUE
to modules (A & B) | TRUE
new modified and moderate A 1 | FALSE
new modification of  A | FALSE
new moderate of modify | FALSE
to modules (D & E) | FALSE
Public Function RegexMatch(str) As Boolean
Dim tbx2 As String: tbx2 = "A" 'ActiveSheet.TextBox2.Value
Static re As New RegExp
re.Pattern = "\b[M]od(?!erate).*\b[" & tbx2 & "]\b"
re.IgnoreCase = True
RegexMatch = re.Test(str)
End Function
Thanks in advance for your kind help.

Not sure if I understand your requirements correctly: you want rows that contain a word that starts with "mod", but words starting with "Modif", "Moder" or "Modr" don't count. Additionally, a module character (e.g. "A") needs to be present.
I usually get dizzy when I see longer regex terms, so I try to program some lines of code instead. The following function replaces special characters like "(" or "_" with blanks, splits the string into words and checks the content word by word. Easy to understand, easy to adapt:
Function CheckModul(s As String, modulChar As String) As Boolean
    Dim words() As String
    words = Split(replaceSpecialChars(s), " ")
    Dim i As Long, hasModul As Boolean, hasModulChar As Boolean
    For i = 0 To UBound(words)
        Dim word As String
        word = UCase(words(i))
        ' A word starting with "MOD" counts unless it starts with an excluded prefix
        If word Like "MOD*" _
           And Not word Like "MODIF*" _
           And Not word Like "MODER*" _
           And Not word Like "MODR*" Then
            hasModul = True
        End If
        ' word was upper-cased above, so upper-case the module character too
        If word = UCase(modulChar) Then
            hasModulChar = True
        End If
    Next
    CheckModul = hasModul And hasModulChar
End Function

Function replaceSpecialChars(ByVal s As String) As String
    ' Replace every non-alphanumeric character with a blank
    Dim i As Long
    replaceSpecialChars = s
    For i = 1 To Len(replaceSpecialChars)
        If Mid(replaceSpecialChars, i, 1) Like "[!0-9A-Za-z]" Then Mid(replaceSpecialChars, i) = " "
    Next
End Function
Tested as UDF with your data:
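For readers who want to check the word-by-word logic outside Excel, here is a minimal sketch of the same idea in JavaScript. It is a port for illustration only: the name checkModule and the sample loop are assumptions, not part of the answer above.

```javascript
// Hypothetical JavaScript port of the word-by-word check (illustrative names).
function checkModule(text, moduleChar) {
  // Replace every non-alphanumeric character with a blank, then split into upper-cased words.
  const words = text
    .replace(/[^0-9A-Za-z]/g, " ")
    .split(" ")
    .map((w) => w.toUpperCase());
  // A word counts when it starts with MOD but not with MODIF, MODER or MODR.
  const hasModule = words.some((w) => /^MOD/.test(w) && !/^MOD(IF|ER|R)/.test(w));
  // The module character must appear as a word of its own.
  const hasModuleChar = words.includes(moduleChar.toUpperCase());
  return hasModule && hasModuleChar;
}

// Sample rows taken from the question.
const samples = [
  "new modified of module A 1",
  "new moderate of mod_A",
  "to modules (A & B)",
  "new modified and moderate A 1",
  "to modules (D & E)",
];
for (const s of samples) {
  console.log(s, "->", checkModule(s, "A"));
}
```

Splitting into words first avoids the word-boundary problem with "mod_A": the underscore is a word character in regex terms, so \bA\b would not match there.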

Related

Can I use regex to not only identify a pattern but extract the value found?

I am using Excel VBA.
I need to extract the dimensions (width x height) of a creative from a string and the dimensions will always be in the format:
000x000 or 000X000 or 000x00 or 000X00 where 0 can be any number between 1-9 and x can be upper or lower case.
I read this guide:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
And I think what I want is something similar to:
[0-9]{2, 3}[xX][0-9]{2, 3}
So if my string is:
creativeStr = ab234-cdc-234-300x250-777aabb
I want to extract "300x250" and assign it to a variable like this:
dimensions = 300x250
Is my Regex above correct? Also, how would I pull the resulting match into a variable?
Here is part of my code:
creativeStr = "Sample-abc-300x250-cba-123"
regex_pattern = "[0-9]{2,3}[xX][0-9]{2,4}"
If regex_pattern <> "" Then
With regEx
.Global = True
.Pattern = regex_pattern
End With
If regEx.Test(creativeStr) Then
dimensions = regEx.Replace(creativeStr, "$1")
Else
dimensions = "Couldn't extract dimensions from creative name."
End If
End If
But it still returns the condition in my else clause...
Thanks!

Your examples do not match your regex. Your examples show that the first set of digits will always be three digits long, and the last set either two or three.
Also, in your description you write that each digit can be any number between 1-9, but your example includes 0s.
If you are going to work with regex, that type of imprecision will lead to undesired results.
Assuming that 0s should be included, and that the desired pattern is 3x2 or 3x3 digits, then perhaps this example will provide some clarity:
Option Explicit
Function dimension(S As String) As String
Dim RE As Object, MC As Object
Const sPat As String = "[0-9]{3}[Xx][0-9]{2,3}"
' or, with .ignorecase = true, could use: "\d{3}x\d{2,3}"
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = sPat
If .Test(S) = True Then
Set MC = .Execute(S)
dimension = MC(0)
Else
dimension = "Couldn't extract dimensions from creative name."
End If
End With
End Function
Sub getDimension()
Dim creativeStr As String
Dim Dimensions As String
creativeStr = "Sample-abc-300x250-cba-123"
Dimensions = dimension(creativeStr)
Debug.Print Dimensions
End Sub
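The same "test, then take the first match" idea can be sketched in JavaScript, whose regex syntax is close to that of VBScript.RegExp for this pattern. The function name extractDimensions is an assumption made for this example, and it returns null instead of an error message when nothing matches.

```javascript
// Hypothetical helper: returns the first "NNNxNN" or "NNNxNNN" token, or null if none is found.
function extractDimensions(creativeName) {
  // Three digits, an x in either case (i flag), then two or three digits.
  const match = /\d{3}x\d{2,3}/i.exec(creativeName);
  // match[0] is the whole matched text, the counterpart of MC(0) in the VBA code.
  return match ? match[0] : null;
}

console.log(extractDimensions("Sample-abc-300x250-cba-123"));
console.log(extractDimensions("ab234-cdc-234-300x250-777aabb"));
console.log(extractDimensions("no-dimensions-here"));
```

Reading the matched text from the match object is what replaces the Replace(creativeStr, "$1") call in the question, which refers to a capture group the pattern does not define.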

Slight adaptation of a User Defined Function

I would like to extract a combination of text and numbers from a larger string located within a column within excel.
The constants I have to work with are that each text string will
•either start with an A, C or S, and
•always be 7 characters long
•the position of the string I would like to extract varies
The code I have been using, which has been working efficiently, is:
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(r.Text, " ")
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
However, today I learnt that my data may sometimes include scenarios like this:
What I would like is to adapt my code so that if the 8th character is an underscore and the 1st of the 7 characters is S, A or C, it extracts everything up to the underscore.
Secondly, I would like to exclude common words like "Support" and "Collect" from being extracted.
Finally, the 7th character should be a number.
Any ideas around this would be much appreciated.
Thanks

Try this:
ary = Split(Replace(r.Text, "_", " "))
or
ary = Split(Replace(r.Text, "_", " "), " ")
The result will be the same for both variants.
update
Do you know how I could leave the result blank if the 7th character returned a letter?
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(Replace(r.Text, "_", " "))
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" And IsNumeric(Mid(a, 7, 1)) Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function

Add Microsoft VBScript Regular Expressions 5.5 to project references. Use the following code to test matching and extracting with Xtractor:
Public Function Xtractor(ByVal p_val As String) As String
Xtractor = ""
Dim ary As String, v_re As New VBScript_RegExp_55.RegExp, Matches
v_re.Pattern = "^([SAC][^_]{1,6})_?"
Set Matches = v_re.Execute(p_val)
If Matches.Count > 0 Then Xtractor = Matches(0).SubMatches(0) Else Xtractor = ""
End Function
Sub test_Xtractor(p_cur As Range, p_val As String, p_expected As String)
Dim v_cur As Range, v_res As Range
p_cur.Value = p_val
Set v_cur = p_cur.Offset(columnOffset:=1)
v_cur.FormulaR1C1 = "='" & ThisWorkbook.Name & "'!Xtractor(RC[-1])"
Set v_res = v_cur.Offset(columnOffset:=1)
v_res.FormulaR1C1 = "=RC[-1]=""" & p_expected & """"
Debug.Print p_val; "->"; v_cur.Value; ":"; v_res.Value
End Sub
Sub test()
test_Xtractor ActiveCell, "A612002_MDC_308", "A612002"
test_Xtractor ActiveCell.Offset(1), "B612002_MDC_308", ""
test_Xtractor ActiveCell.Offset(2), "SUTP038_MDC_3", "SUTP038"
test_Xtractor ActiveCell.Offset(3), "KUTP038_MDC_3", ""
End Sub
Choose the workbook and cell where the test fixture should be written, then run test from the VBA Editor.
Output in the Immediate window (Ctrl+G):
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
UPD
Is it possible to amend this code so that it returns blank if the 7th character is a letter?
Replace the line that assigns v_re.Pattern with the following:
v_re.Pattern = "^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?"
v_re.IgnoreCase = True
And add to the test suite:
test_Xtractor ActiveCell.Offset(4), "SUTP03A_MDC_3", ""
Output:
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
SUTP03A_MDC_3->:True
I inserted the negative lookahead subrule (?![^_]{5}[A-Z]_?) to reject SUTP03A_MDC_3. Note that the rejecting rule applies to exactly the 7th character. v_re.IgnoreCase is now set to True; if only uppercase characters are allowed, set it to False. See also Regular Expression Syntax on MSDN.
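The lookahead pattern can also be tried outside Excel. The following is a small JavaScript sketch; JavaScript's syntax for this pattern is the same as VBScript.RegExp's, and the function name xtractor is only an illustrative stand-in for the UDF.

```javascript
// Same pattern as above, with the i flag standing in for IgnoreCase = True.
const codePattern = /^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?/i;

// Hypothetical stand-in for the Xtractor UDF: returns the captured code or "".
function xtractor(value) {
  const match = codePattern.exec(value);
  // match[1] is the first capture group, the counterpart of SubMatches(0).
  return match ? match[1] : "";
}

for (const sample of ["A612002_MDC_308", "B612002_MDC_308", "SUTP038_MDC_3", "SUTP03A_MDC_3"]) {
  console.log(sample + "->" + xtractor(sample));
}
```

The lookahead runs right after the first character, so [^_]{5} skips characters 2 to 6 and [A-Z] then inspects the 7th. That is why only a letter in exactly that position causes the rejection.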

Searching multiple .txt files for all occurrences of a string?

I am trying to create a tool that will search 300+ .txt files for a string that may be used several times in each of the 300+ .txt files.
I want to be able to go through each file and get the string between each of the occurrences.
It sounds a bit twisted I know, I have been scratching my head for hours, while testing code.
What I have tried
I read through each file and check for if it contains my search text at least once, if it does, then I add the full path of the (files that do contain it) to a list
Dim FileNamesList As New List(Of String)
Dim occurList As New List(Of String)
Dim textSearch As String = TextBox1.Text.ToLower
'check each file to see if it even contains textbox1.text
'if it does, then add matching files to list
For Each f As FileInfo In dir.GetFiles("*.txt")
Dim tmpRead = File.ReadAllText(f.FullName).ToLower
Dim tIndex As Integer = tmpRead.IndexOf(textSearch)
If tIndex > -1 Then
FileNamesList.Add(f.FullName)
End If
Next
Then I thought, oh, now all I need to do is go through each string in that 'approved' files list and add the entire contents of each to a new list.
Then I go through each in 'that' list and get string between two delimiters.
And... I just get lost from there...
Here is the get string between delimiters I have tried using.
Private Function GetStringBetweenTags(ByVal startIdentifer As String, ByVal endIndentifier As String, ByVal textsource As String) As String
Dim idLength As Int16 = startIdentifer.Length
Dim s As String = textsource
Try
s = s.Substring(s.IndexOf(startIdentifer) + idLength)
s = s.Substring(0, s.IndexOf(endIndentifier))
'MsgBox(s)
Catch
End Try
Return s
End Function
In simple terms...
I have 300 .txt files
Some may contain a string that I am after
I want the substring of each string
Normally I am fine, and never need to ask questions, but there are too many forceptions going on.
Logical Example
== Table.txt ==
print("I am tony")
print("pineapple")
print("brown cows")
log("cable ties")
log("bad ocd")
log("bingo")
== Cherry.txt ==
print("grapes")
print("pie")
print("apples")
log("laugh")
log("tuna")
log("gonuts")
== Tower.txt ==
print("tall")
print("clouds")
print("nomountain")
log("goggles?")
log("kuwait")
log("india")
I want to end up with a list of the text inside only the print calls from all 3 files.
Haven't found any other thread about this, probably because it's stupid.
So I should end with
== ResultList ==
I am tony
pineapple
brown cows
grapes
pie
apples
tall
clouds
nomountain

RegEx is probably your best choice for something like this. For instance:
Dim results As New List(Of String)()
Dim r As New Regex("print\(""(.*)""\)")
For Each path As String In filePaths
    Dim contents As String = File.ReadAllText(path)
    For Each m As Match In r.Matches(contents)
        If m.Success Then
            results.Add(m.Groups(1).Value)
        End If
    Next
Next
As you can see, the code loops through a list of file paths. For each one, it loads the entire contents of the file into a string. It then searches the file contents string for all matches to the following regular expression pattern: print\("(.*)"\). It then loops through all of those pattern matches and grabs the value of the first capture group from each one. Those are added to the results list, which contains your desired strings. Here's the meaning of the parts of the RegEx:
print - Matches the literal text "print".
\( - The next character after the word "print" must be an opening parenthesis (the backslash is an escape character).
"" - The next character after the opening parenthesis must be a double quote character (it is written twice to escape it, so that VB doesn't think it's the end of the string).
(.*) - The parentheses define this as a capturing group (so that we can pull out just this value from the matches). The .* means any characters of any length.
""\) - Matching strings must end with a double quote followed by a closing parenthesis.

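To see how the capture group in print\("(.*)"\) behaves, here is a short JavaScript sketch (the pattern syntax for these constructs is the same as in .NET). The function name extractPrintArgs and the sample strings are assumptions for illustration. The sketch also shows one caveat: the greedy .* works when each call sits on its own line, but it over-matches when two calls share a line, whereas [^"]* does not.

```javascript
// Hypothetical helper: collects the text inside every print("...") call.
function extractPrintArgs(contents) {
  // [^"]* stops at the first closing quote, so calls on the same line stay separate.
  const pattern = /print\("([^"]*)"\)/g;
  return Array.from(contents.matchAll(pattern), (m) => m[1]);
}

const fileContents = 'print("I am tony")\nprint("pineapple")\nlog("bingo")';
console.log(extractPrintArgs(fileContents));

// Greedy .* on a single line runs to the LAST closing quote on that line.
const oneLine = 'print("a") print("b")';
const greedy = /print\("(.*)"\)/.exec(oneLine);
console.log(greedy[1]);
console.log(extractPrintArgs(oneLine));
```
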
Use Regex:
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim input1 As String = _
"print(""I am tony"") " + _
"print(""pineapple"") " + _
"print(""brown cows"") " + _
"log(""cable ties"") " + _
"log(""bad ocd"") " + _
"log(""bingo"")"
Dim input2 As String = _
"print(""grapes"") " + _
"print(""pie"") " + _
"print(""apples"") " + _
"log(""laugh"") " + _
"log(""tuna"") " + _
"log(""gonuts"")"
Dim input3 As String = _
"print(""tall"") " + _
"print(""clouds"") " + _
"print(""nomountain"") " + _
"log(""goggles?"") " + _
"log(""kuwait"") " + _
"log(""india"")"
Dim pattern As String = "print\(""([^""]*)""\)"
Dim expr As Regex = New Regex(pattern, RegexOptions.Singleline)
Dim matches As MatchCollection = Nothing
Dim data As List(Of String) = New List(Of String)()
matches = expr.Matches(input1)
For Each mat As Match In matches
data.Add(mat.Groups(1).Value)
Next mat
matches = expr.Matches(input2)
For Each mat As Match In matches
data.Add(mat.Groups(1).Value)
Next mat
matches = expr.Matches(input3)
For Each mat As Match In matches
data.Add(mat.Groups(1).Value)
Next mat
End Sub
End Module

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly, but trying to get the information after it just returns "", which is strange because the same pattern works perfectly fine when I run it in Notepad++.
So it looks for the letters MIP and one of the three-letter codes (any of them).
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function

Just do it in two steps:
1. Check if MIP is in the string.
2. If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
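The VBScript regular expression engine does not support look-behinds, so the (?<=...) pattern in the question is rejected and the function's error handler then returns an empty string. It does support look-aheads, though, so you can of course also do it in a single regex: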
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
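As a quick check of the single-regex variant, here is a JavaScript sketch of the same loop. The pattern is copied from the VBA above; the names cleanupString and CODES are illustrative assumptions.

```javascript
// Three-letter codes to remove, copied from the pattern above.
const CODES =
  "DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX";

// Lookahead: MIP must occur on a three-letter boundary somewhere in the string.
// Group 1: the (lazily matched) triples in front of the first code found.
const cleanPattern = new RegExp("^(?=(?:...)*?MIP)((?:...)*?)(?:" + CODES + ")");

// Hypothetical counterpart of CleanupString: removes one code per pass until none is left.
function cleanupString(value) {
  let result = value;
  while (cleanPattern.test(result)) {
    // Keep everything in front of the code ($1) and drop the code itself.
    result = result.replace(cleanPattern, "$1");
  }
  return result;
}

console.log(cleanupString("MADHUBESOMIPTDTLTRCOYORGLEJ"));
console.log(cleanupString("MADHUBESOTDTLTR"));
```

The regex is created without the global flag, so test() carries no lastIndex state between passes. Strings without MIP are returned unchanged because the lookahead fails at the start.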

Non RE option:
Function DeMIPString(StringToCheck As String) As String
    ' InStr returns 0 when "MIP" is absent. "Not InStr(...)" is a bitwise Not
    ' on a number and never evaluates to False, so compare with 0 instead.
    If InStr(StringToCheck, "MIP") = 0 Then
        DeMIPString = StringToCheck
    Else
        Dim i As Long
        For i = 1 To Len(StringToCheck) Step 3
            ' "MIP" itself is not in the list, so it is kept as in the expected output
            Select Case Mid$(StringToCheck, i, 3)
                Case "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP"
                    ' listed code: drop it
                Case Else
                    DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
            End Select
        Next
    End If
End Function

Is this the RegEx for matching any cell reference in an Excel formula?

I have been trying to create a regular expressions pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.
I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:
Formulas are preceded with an equal sign "="
Strings within formulas are enclosed in double quotes and need to be removed before looking for real references, otherwise =A1&"A1" would break the regex
Worksheet names can be up to 31 characters long, excluding \ / ? * [ ] :
Worksheet names in external references must be followed by a bang (!) =Sheet1!A1
Workbook names in external references must be enclosed in square brackets =[Book1.xlsx]Sheet1!A1
Workbook paths, which Excel adds if a reference is to a range in a closed workbook, are always enclosed in single quotes and to the left of the brackets for the workbook name 'C:\[Book1.xlsx]Sheet1'!A1
Some characters (non-breaking space, for example) cause Excel to enclose the workbook and worksheet name in an external reference in single quotes, but I don't know specifically which characters ='[Book 1.xlsx]Sheet 1'!A1
Even if R1C1-notation is enabled, Range.Formula still returns references in A1-notation. Range.FormulaR1C1 returns references in R1C1 notation.
3D reference style allows a range of sheet names on one workbook =SUM([Book5]Sheet1:Sheet3!A1)
Named ranges can be specified in formulas:
The first character of a name must be a letter, an underscore character (_), or a backslash (\). Remaining characters in the name can be letters, numbers, periods, and underscore characters.
You cannot use the uppercase and lowercase characters "C", "c", "R", or "r" as a defined name, because they are all used as a shorthand for selecting a row or column for the currently selected cell when you enter them in a Name or Go To text box.
Names cannot be the same as a cell reference, such as Z$100 or R1C1.
Spaces are not allowed as part of a name.
A name can be up to 255 characters in length.
Names can contain uppercase and lowercase letters. Excel does not distinguish between uppercase and lowercase characters in names.
Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:
Sub ReturnFormulaReferences()
Dim objRegExp As New VBScript_RegExp_55.RegExp
Dim objCell As Range
Dim objStringMatches As Object
Dim objReferenceMatches As Object
Dim objMatch As Object
Dim intReferenceCount As Integer
Dim intIndex As Integer
Dim booIsReference As Boolean
Dim objName As Name
Dim booNameFound As Boolean
With objRegExp
.MultiLine = True
.Global = True
.IgnoreCase = True
End With
For Each objCell In Selection.Cells
If Left(objCell.Formula, 1) = "=" Then
objRegExp.Pattern = "\"".*\"""
Set objStringMatches = objRegExp.Execute(objCell.Formula)
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Set objReferenceMatches = objRegExp.Execute(objCell.Formula)
intReferenceCount = 0
For Each objMatch In objReferenceMatches
intReferenceCount = intReferenceCount + 1
Next
Debug.Print objCell.Formula
For intIndex = intReferenceCount - 1 To 0 Step -1
booIsReference = True
For Each objMatch In objStringMatches
If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
booIsReference = False
Exit For
End If
Next
If booIsReference Then
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7})"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)" _
& "[a-z_\\][a-z0-9_\.]{0,254}$"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
booNameFound = False
For Each objName In objCell.Worksheet.Parent.Names
If objReferenceMatches(intIndex).Value = objName.Name Then
booNameFound = True
Exit For
End If
Next
If Not booNameFound Then
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)"
For Each objName In objCell.Worksheet.Names
If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
booNameFound = True
Exit For
End If
Next
End If
booIsReference = booNameFound
End If
End If
End If
If booIsReference Then
Debug.Print " " & objReferenceMatches(intIndex).Value _
& " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
& objReferenceMatches(intIndex).Length & ")"
End If
Next intIndex
Debug.Print
End If
Next
Set objRegExp = Nothing
Set objStringMatches = Nothing
Set objReferenceMatches = Nothing
Set objMatch = Nothing
Set objCell = Nothing
Set objName = Nothing
End Sub
Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.
Thanks!

jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.
stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula
Thanks jtolle!

I'm a few years late here, but I was looking for something similar and so dug into this. The main pattern you use is this:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Basically you have six alternatives for a range reference (lines 3-8), any of which will produce a match by itself, with two alternatives for an optional filename/sheet name prefix (lines 1-2).
For the two prefix alternatives, the only difference is that the first is wrapped in single quotes, with an extra dot star after the initial quote. These single quotes occur mainly when there is a space in a sheet name. The purpose of the dot star, matching unconstrained text after an initial single quote, is unclear and it appears to create problems. I'll discuss those problems below. Besides that the two alternative prefixes are the same, and I'll refer to them collectively as the Optional External Prefix (OEP).
The OEP has its own two optional prefixes (the same in either alternative). The first is for the workbook name, an open-ended dot star in brackets.
(\[.*\])?
The second is for a "3D" cell reference, with two sheet names separated by a colon; it is the initial sheet name including the colon. The pattern here is a negated character class allowing up to 31 characters of anything except forward slash, back slash, question mark, asterisk, brackets, or colon, followed by a colon:
([^\:\\\/\?\*\[\]]{1,31}\:)?
Finally for the OEP is its only required part: a sheet name, same as the optional sheet name but with no colon. The effect is (if these all worked correctly) that the required sheet name will match if it can, and then only if there is a 3d reference or additional prior bracketed text will its optional prefixes also match.
Issues with the Workbook/Sheet name prefix: First, the dot star at the beginning of the first line is over-inclusive. Similarly, the negated character class for the sheet name appears to need additional characters including parens, comma, plus, minus, equals, and bang. Otherwise, extra material is interpreted as part of the sheet name. In my testing, this over-inclusion happened with any of these:
=SUM(Sheet1!A1,Sheet2!A2)
=Sheet1!A1+Sheet2!A2
=Sheet1!A1-Sheet2!A2
Sheet names can include some of these characters, so accounting for that would require some additional measure. For instance, a sheet could be named "(Sheet1)", giving an odd formula like:
=SUM('(Sheet1)'!A1:A2)
You'd like to get the inner parens with the sheet name there, but not the outer paren. Excel puts the single quotes on that one, as it would with a space in the sheet name. You could then exclude parens in the non-single quote version since within the single quote it's ok. But then beware Excel seems to even allow single quotes in sheet names. Taking these naming quirks to the extreme, I just successfully named a sheet "Hi'Sheet1'SUM('Sheet2'!A1,A2)!". That's absurd but it points to what could happen. I learned in doing this that if I include a single quote in a sheet name, formulas escape the single quote with a second single quote. So a SUM(A1:A2) referring to the sheet I just created ends up looking like this:
=SUM('Hi''Sheet1''SUM(''Sheet2''!A1,A2)!'!A1:A2)
That actually does give some insight into the Excel parser itself. I suspect to adequately deal with this you may want separately (outside the regex) to compare the potential sheet names or workbook names to the actual sheet names, as you have done with the named ranges.
This leads to the six forms of cell references allowed in the regex (any one of which, if met, will produce a match):
1.) A one-cell or multi-cell range with rows and columns
"(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?"
The open paren here is closed at the end of the 6 options. Otherwise, this line allows a basic cell reference of the type "$A$1", "A1", "$A1", "A$1", or any combination of these in a multi-cell range ("$A1:A$2", etc.).
2.) A full-column or multi-column range with absolute references only
"|\$[a-z]{1,3}\:\$[a-z]{1,3}"
This one allows a cell reference of the type "$A:$B" with a dollar sign on both. Note a dollar sign on only one side will not match.
3.) A full-column or multi-column range with relative references only
"|[a-z]{1,3}\:[a-z]{1,3}"
This line is like the last, but matches only with no dollar signs. Note a dollar sign on only one side will not match here either.
4.) A full-row or multi-row range with absolute references only
"|\$[0-9]{1,7}\:\$[0-9]{1,7}"
This line allows a cell reference of the type "$1:$2" with a dollar sign on both.
5.) A full-row or multi-row range with relative references only
"|[0-9]{1,7}\:[0-9]{1,7}"
This version is like the last, but matches only with no dollar signs.
6.) Other text that could be a named range
"|[a-z_\\][a-z0-9_\.]{0,254})"
Finally, the sixth option allows text. This text is compared to actual named ranges later in the sub.
The main omission that I see here is ranges that have both absolute and relative references, of the type "A:$A" or "1:$1". While $A:A is captured because it includes "A:A", "A:$A" is not captured. You could address this and simplify the regex by combining 2 and 3 and combining 4 and 5 with optional dollar signs:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$?[a-z]{1,3}\:\$?[a-z]{1,3}" _
& "|\$?[0-9]{1,7}\:\$?[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Combining these further would seem to come up against the everything-is-optional problem.
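The effect of merging the alternatives can be checked in isolation. Below is a small JavaScript sketch that tests only the full-column and full-row parts of the pattern, anchored so that each candidate must match as a whole; the variable names are assumptions for this example.

```javascript
// Original alternatives 2 and 3: both sides absolute, or both sides relative.
const separateColumns = /^(\$[a-z]{1,3}:\$[a-z]{1,3}|[a-z]{1,3}:[a-z]{1,3})$/i;
// Combined form: each side may carry its own optional dollar sign.
const combinedColumns = /^\$?[a-z]{1,3}:\$?[a-z]{1,3}$/i;
// Same idea for full-row ranges (alternatives 4 and 5 combined).
const combinedRows = /^\$?[0-9]{1,7}:\$?[0-9]{1,7}$/;

for (const ref of ["A:B", "$A:$B", "A:$A", "$A:A"]) {
  console.log(ref, separateColumns.test(ref), combinedColumns.test(ref));
}
for (const ref of ["1:2", "$1:$2", "1:$1"]) {
  console.log(ref, combinedRows.test(ref));
}
```

With the anchors in place, the separate alternatives reject both mixed forms, A:$A and $A:A. In the unanchored original, $A:A is still picked up, but only through its A:A substring, so the leading dollar sign is lost from the match.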
One other issue is in the initial regex pattern for matching strings, which you use to expunge potential ranges that fall inside a quoted string:
objRegExp.Pattern = "\"".*\"""
When I test this on a formula with a string at the beginning and end of a formula, the greediness of the dot star captures everything from the initial quote to the final quote (in other words it interprets the entire formula as one big quoted string, even though there is non-string material in the middle). It appears you can fix this by making the dot star lazy (adding a question mark after it). That raises questions about quotes within quotes, but they may not be a problem. For instance, I tested this formula:
="John loves his A1 steak sauce, but said the ""good A1 steak sauce price"" is $" & A2+A3 & " less than the ""bad price"" of $" & A4 & "."
With cell values plugged in, this formula evaluates to:
John loves his A1 steak sauce, but said the "good A1 steak sauce
price" is $5 less than the "bad price" of $8.
With the lazy modifier added to your string pattern, both versions of "A1" above were recognized as occurring within a string and so were expunged, while A2, A3 and A4 were recognized as cell references.
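The difference between the greedy and the lazy string pattern is easy to reproduce. The following JavaScript sketch uses two short formulas that are assumptions chosen for illustration, not the formula tested above.

```javascript
const formula = '="A1"&B2&"C3"';

// Greedy: one match running from the first quote to the last quote.
const greedyMatches = formula.match(/".*"/g);
// Lazy: each quoted string is matched on its own.
const lazyMatches = formula.match(/".*?"/g);

console.log(greedyMatches);
console.log(lazyMatches);

// Removing the lazily matched strings leaves only the non-string material.
console.log(formula.replace(/".*?"/g, ""));

// Doubled quotes inside a string are consumed as several short strings,
// so text such as A1 between them is still treated as string content.
const nested = '="say ""A1"" now"&A2';
console.log(nested.replace(/".*?"/g, ""));
```

The last line illustrates why quotes within quotes may not be a problem in practice: the lazy pattern splits the literal into adjacent pieces, and every piece is still removed.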
I'm sure there are some technical issues with some of my language above, but hopefully the analysis is still useful.

Thanks Ben (I'm new to posting here, even though Stack Overflow has caught my attention for years for its high-quality technical content, so I'm not sure I read this page correctly as to the author :) )
I tried the posted solutions (testing, testing updated, as well as the one using Range.Precedents, which, as correctly pointed out, does not cover references to other sheets or other workbooks) and found a minor flaw: the external sheet name is enclosed in single quotation marks only if it is a number or if it contains a space (and possibly other characters, as Ben (?) listed in the original post). With a simple addition to the regex (an opening "[", see the code below) this can be corrected. In addition, for my own purposes I converted the sub to a function that returns a comma-separated list with duplicates removed (note that this removes only identical reference notation, not cells that are included in multiple ranges):
Public Function CellReflist(Optional r As Range) ' single cell
Dim result As Object: Dim testExpression As String: Dim objRegEx As Object
If r Is Nothing Then Set r = ActiveCell ' Cells(1, 2) ' INPUT THE CELL HERE , e.g. RANGE("A1")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True: objRegEx.Global = True: objRegEx.Pattern = """.*?""" ' pattern for quoted string literals, which are removed below
testExpression = CStr(r.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)" 'grab the address
objRegEx.Pattern = "(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
Set result = objRegEx.Execute(testExpression)
If result.Count > 0 Then CellReflist = result(0).Value
If result.Count > 1 Then
For i = 1 To result.Count - 1 'Each Match In result
dbl = False ' remove duplicate occurrences
For j = 0 To i - 1
If result(i).Value = result(j).Value Then dbl = True
Next j
If Not dbl Then CellReflist = CellReflist & "," & result(i).Value 'Match.Value
Next i 'Match
End If
End If
End Function
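
For readers who want to try the pattern outside Excel, the same idea can be sketched in JavaScript. This is only an illustration: the function name cellRefList is hypothetical, the pattern is copied unchanged from the VBA code above, and a Set stands in for the nested duplicate-checking loop.

```javascript
// Pattern copied from the VBA function above; "gi" mirrors Global = True and IgnoreCase = True.
const REF_PATTERN = /(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))/gi;

// Hypothetical helper name; returns a comma-separated list of distinct references.
function cellRefList(formula) {
  // Drop quoted string literals first so that text such as "A1 sauce" is not matched.
  const withoutStrings = formula.replace(/".*?"/g, "");
  const matches = withoutStrings.match(REF_PATTERN) || [];
  // A Set keeps the first occurrence of each notation and preserves insertion order.
  return [...new Set(matches)].join(",");
}

console.log(cellRefList('=A1+B2+A1+Sheet2!C3'));
console.log(cellRefList('=\'My Sheet\'!A1&"see A9"'));
console.log(cellRefList('=[Book1.xlsx]Sheet1!$A$1'));
```

As in the VBA version, only identical notations are collapsed; A1 and $A$1 would still be listed separately.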

I resolved a similar problem in Google Sheets.
The following adds to or subtracts from the row references in a formula. Because I only needed to update row references, rather than extracting whole addresses from the formula I extracted and updated just the row number with this regex: /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/
String.prototype.replaceAt = function(index, replacement, diff = 0) {
let end = this.substr(index + replacement.length + diff)
if((this.length - 1) === index) end = ""
return this.substr(0, index) + replacement + end;
}
// Ref: https://stackoverflow.com/a/1431113/2319414
/**
* @param row - positive integer to add, negative to subtract rows.
*/
function updateRowReference(formula, row){
let masked = formula
const mask = "#"
// masking double quotes in string literals
let exp = /""/g
let result;
while((result = exp.exec(masked)) !== null){
masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
}
// masking string literals
exp = /\"([^\\\"]|\\.)*\"/g
// Ref: https://stackoverflow.com/a/9260547
while((result = exp.exec(masked)) !== null){
masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
}
// updating row references
const sRow = row.toString()
// The magic is happening here
// Just matching a number which is part of range address
exp = /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/g
while((result = exp.exec(masked)) !== null){
const oldRow = Number(result[0])
// adding/subtracting rows
const newRow = (row + oldRow).toString()
// preserving formula string length integrity if number of digits of new row is different than old row
const diff = result[0].length - newRow.length
masked = masked.replaceAt(result.index, newRow, diff)
formula = formula.replaceAt(result.index, newRow, diff)
exp.lastIndex -= diff
}
let updated = masked;
// revert mask
const array = formula.split("")
while((result = updated.search(mask)) !== -1){
updated = updated.replaceAt(result, array[result])
}
return updated
}
function test(){
const cases = [
"=$A$1",
"=A1",
"=$A1",
"=A$1",
"=$A1:B$1",
"=1:1",
"=Sheet1!1:1",
"=Sheet1!$A1:B$1",
"=Sheet1!A$1",
'=IF(AND($C6 <> ""; NOT(ISBLANK(B$6))); IF(SUM(FILTER($F$6:$F$7;$C$6:$C$7 = $C6)) < $G6; 1; IF($E6 = 0; 1; 0)); 0)',
"=$A$111", "=A111", "=$A111", "=A$111", "=$A111:B$111",
"=111:111",
"=Sheet1!111:111",
"=Sheet1!$A111:B$111",
"=Sheet1!A$111",
'=IF(AND($C111 <> ""; NOT(ISBLANK(B$111))); IF(SUM(FILTER($F$111:$F$112;$C$111:$C$112 = $C111)) < $G111; 1; IF($E111 = 0; 1; 0)); 0)',
// if string literals have addresses they shouldn't be affected
'=IF(AND($C111 <> "A1 $A1 $A1:B$1";$C111 <> "Sheet1!1:1";$C111 <> "Sheet1!$A1:B$1"); 1 , 0)'
]
const expectedAdd = [
'=$A$16',
'=A16',
'=$A16',
'=A$16',
'=$A16:B$16',
'=16:16',
'=Sheet1!16:16',
'=Sheet1!$A16:B$16',
'=Sheet1!A$16',
'=IF(AND($C21 <> ""; NOT(ISBLANK(B$21))); IF(SUM(FILTER($F$21:$F$22;$C$21:$C$22 = $C21)) < $G21; 1; IF($E21 = 0; 1; 0)); 0)',
'=$A$126',
'=A126',
'=$A126',
'=A$126',
'=$A126:B$126',
'=126:126',
'=Sheet1!126:126',
'=Sheet1!$A126:B$126',
'=Sheet1!A$126',
'=IF(AND($C126 <> ""; NOT(ISBLANK(B$126))); IF(SUM(FILTER($F$126:$F$127;$C$126:$C$127 = $C126)) < $G126; 1; IF($E126 = 0; 1; 0)); 0)',
'=IF(AND($C126 <> "A1 $A1 $A1:B$1";$C126 <> "Sheet1!1:1";$C126 <> "Sheet1!$A1:B$1"); 1 , 0)'
]
let results = cases.map(_case => updateRowReference(_case, 15))
console.log('Test Add')
console.log(results.every((result, i) => result === expectedAdd[i]))
console.log('Test Subtract')
results = results.map(_case => updateRowReference(_case, -15))
console.log(results.every((result, i) => result === cases[i]))
}
test()
Note: addresses passed as strings to the 'INDIRECT' function will not be updated.
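
If extending String.prototype is undesirable, the same row shift can be sketched more compactly with String.prototype.replace and a callback. This is an alternative sketch, not the method used above: the helper name shiftRows is hypothetical, the row regex is reused unchanged from the answer, and string literals are skipped by splitting the formula on them instead of masking them.

```javascript
// Regex taken from the answer above: a run of digits that is part of a range address.
const ROW_REF = /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/g;

// Hypothetical helper: shifts row numbers that sit outside double-quoted string literals.
function shiftRows(formula, offset) {
  // Splitting on a capturing group keeps the literals in the result, at the odd indexes.
  // Inside a literal, "" is treated as an escaped quote.
  const parts = formula.split(/("(?:[^"]|"")*")/);
  return parts
    .map((part, i) =>
      i % 2 === 1
        ? part // string literal: leave untouched
        : part.replace(ROW_REF, (digits) => String(Number(digits) + offset))
    )
    .join("");
}

console.log(shiftRows("=$A1:B$1", 15));
console.log(shiftRows('=IF($C1 <> "A1"; 1; 0)', 15));
```

Because the replacement is done per segment, no index bookkeeping is needed when the new row number has a different number of digits than the old one.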
