Removing unwanted characters from in-front and behind cells in VBA (excel) - regex

I’m very new to programming and although there are several similar questions to mine that have been asked, I can't seem to get them working for my needs.
What I want is to be able to copy raw data into column A, hit run on the macro and it should remove any unwanted characters both before and after the data that I want to keep resulting in a cell just containing the data that I want. I also want it to go through all cells that are in the column, bearing in mind some cells may be empty.
The data that I want to keep is in this format:
L1-somedata-0000
The -somedata- text will change but the - ether side will always be there, the L1 will sometimes be L2, and the 0000 (which could be any 4 numbers) will sometimes be any 3 numbers. also there may be some rows in the column that have no useful data, these should be removed. Finally, some cells will not contain any unwanted data, these should stay the same.
Sub Test()
Dim c As Range
For Each c In Range("A2:A" & Range("A" & Rows.Count).End(xlUp).Row)
c = removeData(c.text)
Next
End Sub
Function removeData(ByVal txt As String) As String
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(L1-somedata-\d{4}|\d{3})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
If allMatches.Count <> 0 Then
result = allMatches.Item(0).submatches.Item(0)
End If
ExtractSDI = result
End Function
I have put my code that I've got so far, all it does is go through each cell, if it matches it just removes the text that I want to keep as well as the stuff that I want removed!
I really hope all of that makes sence!
Any help will be much appreciated.
Chris

If the "-" are part of the input data, you could use a RegExp Replace like:
>> Set r1 = New RegExp
>> r1.Pattern = "^[^-]+(-[^-]+-).*"
>> WScript.Echo r1.Replace("L2-A-1234", "$1")
>>
-A-
or:
>> Set r1 = New RegExp
>> r1.Pattern = "^[^-]+-([^-]+).*"
>> WScript.Echo r1.Replace("L2-B-123", "$1")
>>
B
Instead of .Replace, you can use Submatches too:
>> WScript.Echo r1.Execute("Don't care-wanted-")(0).SubMatches(0)
>>
wanted
If you need a function, pass the Regexp into the the function; and remember the return value must be assigned to the function name (removeData <> ExtractSDI).
Another possibility for the second spec ("-" not part of desired output):
>> WScript.Echo Split("Whatever-Wanted-Ignore", "-")(1)
>>
Wanted
UPDATE:
To deal with "-" embedded in the desired output and to show how this approach can be used in/as a formula:
Option Explicit
' needs Ref to RegExp
Dim rX As RegExp
Function cleanSDI(s)
If rX Is Nothing Then
Set rX = New RegExp
rX.Pattern = "^([^-]*-)(.+)(-.*)$"
End If
cleanSDI = rX.Replace(s, "$2")
End Function
Depending on your data, you may have to change the .Pattern to
rX.Pattern = "^([^-]+-)(.+)(-.+)$"
to allow (*) / forbid (+) empty heads or tails. Use the Docs to work thru/understand the patterns.

You don't need VBA for this. If the data is in say Col A then put this formula in Cell B1 and copy it down.
=IF(AND(MID(A1,3,1)="-",MID(RIGHT(A1,5),1,1)="-"),MID(A1,4,LEN(A1)-8),IF(AND(MID(A1,3,1)="-",MID(RIGHT(A1,4),1,1)="-"),MID(A1,4,LEN(A1)-7),""))
Explanation:
4 is the length of L1- + 1 (from where we want to retrieve the string
8 is [3 + 5] which is the length of L1- and -0000
7 is [3 + 4] which is the length of L1- and -000

Related

Check string has a date in it and extract part of the string

I have thousands of lines of text that I need to work through and the lines I am interested with lines that look like the following:
01/04/2019 09:35:41 - Test user (Additional Comments)
I am currently using this code to filter out all the other rows:
If InStr(FullCell(i), " - ") <> 0 And InStr(FullCell(i), ":") <> 0 And InStr(FullCell(i), "(") <> 0 Then
FullCell is the array that I am working through.
which I know is not the best way to do it. Is there a way to check that there is a date at the beginning of the string in the format dd/mm/yyyy and then extract the user name inbetween the '-' and the '(' symbol.
I had a play with regex to see if that could help but i'm limited in skills to be able to pull off both VBA and regex in the same code.
Whats the best way to do this.
Assuming Fullcell(i) contains the string,
If Left(Fullcell(i), 10) Like "##/##/####"
Will return True if you have a date (note that it will not differentiate between dd/mm/yyyy and mm/dd/yyyy.
And
Mid(Fullcell(i), InStr(Fullcell(i), " - ") + 2, InStr(Fullcell(i), " (") - InStr(Fullcell(i), " - ") - 2)
Will return the username
I'm sure there is a more efficient way to do this, but I've used the following solution quite a few times:
This will select the date:
x = 1
Do While Mid(FullCell,1,x) <> " "
x = x + 1
Loop
strDate = Left(FullCell,x)
This will find the character number of the hyphen, the username starts 2 characters after.
x = 1
Do While Mid(FullCell,x,1) <> "-"
x = x + 1
Loop
Then we will find the end of the username
y = x + 2
Do While Mid(FullCell,y,1) <> " "
y = y + 1
Loop
The username should now be characters (x+2 to y-1)
strUsername = Mid(FullCell, x + 2, y - (x + 2) - 1)
Here's how I would do it
Dim your variables
Dim ring as Range
Dim dat as variant
Dim FullCell() as string
Dim User as string
Dim I as long
Set your range
Set rng = ` any way you choose
Dat = rng.value2
Loop dat
For i = 1 to UBound(dat, 1)
Split the data
FullCell = Trim(Split(FullCell, "-"))
Test if it split
If UBound(FullCell) > 0 Then
Test if it matches
If IsDate(FullCell(0)) Then
i = Instr(FullCell(1), "(")-1)
If i then
User = left$(FullCell(1), i)
' Found a user
End If
End If
End If
Next
Abstraction is your friend, it's always helpful to break these into their own private functions whenever you can. You could put your code in a function and call it something like ExtractUsername.
Below I did an example of this, and I decided to go with the RegExp approach (late binding), but you could use string functions like the examples above as well.
This function returns the username if it finds the pattern you mentioned above, otherwise, it returns an empty string.
Private Function ExtractUsername(ByVal SourceString As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
'(FIRST GROUP FINDS THE DATE FORMATTED AS DD/MM/YYY, AS WELL AS THE FORWARD SLASH)
'(SECOND GROUP FINDS THE USERNAME) THIS WILL BE SUBMATCH 1
With RegEx
.Pattern = "(^\d{2}\/\d{2}\/\d{4}.*-)(.+)(\()"
.Global = True
End With
Dim Match As Object
Set Match = RegEx.Execute(SourceString)
'ONLY RETURN IF A MATCH WAS FOUND
If Match.Count > 0 Then
ExtractUsername = Trim(Match(0).SubMatches(1))
End If
Set RegEx = Nothing
End Function
The regex pattern is grouped into three parts, the date (and slash), username, and opening parentheses. What you are interested in is the username, which in the SubMatch would be number 1.
Regexr is a helpful site for practicing regular expressions and can show you a bit more of what the pattern I went with is doing.
Please note that using regular expressions might give you performance issues and you should test it against regular string functions to see what works best for your situation.

Getting numbers out of cells with text and producing a list with all these numbers [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I have a column given to me in a spreadsheet which looks like that:
What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:
... and then get a full list of all references, as follows:
Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?
assuming:
worksheet to process named after "pressure"
column "A" with cells to get references out of
column "B" to write corresponding extracted references in
column "C"to write full list of all references in
you could try this
Option Explicit
Sub main()
Dim cell As Range
Dim references As String
Dim referencesArr As Variant
With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
references = references & GetReferences(cell) & "; "
Next cell
If references <> "" Then
referencesArr = Split(Left(references, Len(references) - 2), ";")
.Range("C1").Resize(UBound(referencesArr)).Value = Application.Transpose(referencesArr)
End If
End With
End Sub
Function GetReferences(rng As Range) As String
Dim arr As Variant, iElem As Long
Dim strng As String
With rng
arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
For iElem = 1 To UBound(arr) - 1 Step 2
strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
Next iElem
End With
If strng <> "" Then
GetReferences = Left(strng, Len(strng) - 2)
rng.Offset(, 1) = GetReferences
End If
End Function
There are many examples of regex number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).
Put the following into a standard module code sheet.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numberParse(str As String, _
Optional ndx As Integer = 0, _
Optional delim As String = "; ") As Variant
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
numberParse = vbNullString
With rgx
.Global = True
.MultiLine = True
.Pattern = "[0-9]{4}"
If .Test(str) Then
Set cmat = .Execute(str)
If CBool(ndx) Then
'pull the index of the array of matches
numberParse = cmat.Item(ndx - 1)
Else
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numberParse = Join(nums, delim)
End If
End If
End With
End Function
With your blurb in A2, put the following into B2,
=numberParse(A2)
With your blurb in A2, put the following into A4 and fill down,
=numberParse(A$2, ROW(1:1))
Your results should resemble the following,
¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.
² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.
For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.
Then, you can run this sub (with tweaking most likely, to narrow down the ranges):
Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
Cells(i + 1, colToUse).Value = myArray(i)
Next i
End Sub
Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

Split a column in a text file

I have a system which generates 3 text (.txt) files on a daily basis, with 1000's of entries within each.
Once the text files are generated we run a vbscript (below) that modifies the files by entering data at specific column positions.
I now need this vbscript to do an additional task which is to separate a column in one of the text files.
So for example the TR201501554s.txt file looks like this:
6876786786 GFS8978976 I
6786786767 DDF78676 I
4343245443 SBSSK67676 I
8393372263 SBSSK56565 I
6545434347 DDF7878333 I
6757650000 SBSSK453 I
With the additional task of seperating the column, data will now look like this, with the column seperated at a specific position.
6876786786 GFS 8978976 I
6786786767 DDF 78676 I
4343245443 SBSSK 67676 I
8393372263 SBSSK 56565 I
6545434347 DDF 7878333 I
6757650000 SBSSK 453 I
I was thinking maybe I could add another "case" to accomplish this with maybe using a "regex" pattern, since the pattern would be only 3 companies to find
(DDF, GFS and SBSSK).
But after looking at many examples, I am not really sure where to start.
Could someone let me know how to accomplish this additional task in our vbscript (below)?
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Dim objFSO, pFolder, cFile, objWFSO, objFileInput, objFileOutput,strLine
Dim strInputPath, strOutputPath , sName, sExtension
Dim strSourceFileComplete, strTargetFileComplete, objSourceFile, objTargetFile
Dim iPos, rChar
Dim fileMatch
'folder paths
strInputPath = "C:\Scripts\Test"
strOutputPath = "C:\Scripts\Test"
'Create the filesystem object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get a reference to the processing folder
Set pFolder = objFSO.GetFolder(strInputPath)
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
ProcessAFile cFile
Next
Sub ProcessAFile(objFile)
fileMatch = false
Select Case Left(objFile.Name,2)
Case "MV"
iPos = 257
rChar = "YES"
fileMatch = true
Case "CA"
iPos = 45
rChar = "OCCUPIED"
fileMatch = true
Case "TR"
iPos = 162
rChar = "EUR"
fileMatch = true
End Select
If fileMatch = true Then
Set objWFSO = CreateObject("Scripting.FileSystemObject")
Set objFileInput = objWFSO.OpenTextFile(objFile.Path, ForReading)
strSourceFileComplete = objFile.Path
sExtension = objWFSO.GetExtensionName(objFile.Name)
sName = Replace(objFile.Name, "." & sExtension, "")
strTargetFileComplete = strOutputPath & "\" & sName & "_mod." & sExtension
Set objFileOutput = objFSO.OpenTextFile(strTargetFileComplete, ForWriting, True)
Do While Not objFileInput.AtEndOfStream
strLine = objFileInput.ReadLine
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
objFileInput.Close
objFileOutput.Close
Set objFileInput = Nothing
Set objFileOutput = Nothing
Set objSourceFile = objWFSO.GetFile(strSourceFileComplete)
objSourceFile.Delete
Set objSourceFile = Nothing
Set objTargetFile = objWFSO.GetFile(strTargetFileComplete)
objTargetFile.Move strSourceFileComplete
Set objTargetFile = Nothing
Set objWFSO = Nothing
End If
End Sub
You could add a regular expression replacement to your input processing loop. Since you want to re-format the columns I'd do it with a replacement function. Define both the regular expression and the function in the global scope:
...
Set pFolder = objFSO.GetFolder(strInputPath)
Set re = New RegExp
re.Pattern = " ([A-Z]+)(\d+)( +)"
Function ReFormatCol(m, g1, g2, g3, p, s)
ReFormatCol = Left(" " & Left(g1 & " ", 7) & g2 & g3, Len(m)+2)
End Function
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
...
and modify the input processing loop like this:
...
Do While Not objFileInput.AtEndOfStream
strLine = re.Replace(objFileInput.ReadLine, GetRef("ReFormatCol"))
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
...
Note that you may need to change your iPos values, since splitting and re-formatting the columns increases the length of the lines by 2 characters.
The callback function ReFormatCol has the following (required) parameters:
m: the match of the regular expression (used to determine the length of the match)
g1, g2, g3: the three groups from the expression
p: the starting position of the match in the source string (but not used here)
s: the source string (but not used here)
The function constructs the replacement for the match from the 3 groups like this:
Left(g1 & " ", 7) appends 4 spaces to the first group (e.g. GFS) and trims it to 7 characters. This is based on the assumption that the first group will always be 3-5 characters long.→ GFS
" " & ... & g2 & g3 prepends the result of the above operation with 2 spaces and appends the other 2 groups (8978976 & ).→ GFS 8978976
Left(..., Len(m)+2) then trims the result string to the length of the original match plus 2 characters (to account for the additional 2 spaces inserted to separate the new second column from the former second, now third, column).→ GFS 8978976
At first replace by regex pattern (\d+)\s+([A-Z]+)(\d+)\s+(\w+) replace with $1 $2 $3 $4
and split that by +. then ok.
Live demo

Use Regex to update VBA code

I have a VBA source code containing many hard coded references to cells. The code is part of the Worksheet_Change sub, so I guess hard coding the range references was necessary and you will see many assignment statements like the following:
Set cell = Range("B7")
If Not Application.Intersect(cell, Range(Target.Address)) Is Nothing Then
I would like insert 2 additional rows on top of the worksheet, so basically all the row references will shift by 2 rows. So for example the above assignment statement will be changed to Set cell = Range("B9").
Given the large number of hard coded row references in the code, I thought of using Regex to increment all the row references by 2. So I have developed the following code.
Sub UpdateVBACode()
'*********************Read Text File Containing VBA code and assign content to string variable*************************
Dim str As String
Dim strFile As String: strFile = "F:\Preprocessed_code.txt"
Open strFile For Input As #1
str = Input$(LOF(1), 1)
Close #1
'*********************Split string variables to lines******************************************************************
Dim vStr As Variant: vStr = Split(str, vbCrLf)
'*********************Regex work***************************************************************************************
Dim rex As New RegExp
rex.Global = True
Dim i As Long
Dim mtch As Object
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
For Each mtch In rex.Execute(vStr(i))
vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2))
Next
End If
Next i
'********************Reconstruct String*********************************************************************************
str = ""
For i = 0 To UBound(vStr, 1)
str = str & vbCrLf & vStr(i)
Next i
'********************Write string to text file******************************************************************************
Dim myFile As String
myFile = "F:\Processed_code.txt"
Open myFile For Output As #2
Print #2, str
Close #2
'
End Sub
Function IncrementString(rowNum As String) As String '
Dim num As Integer
num = CInt(rowNum) + 2
IncrementString = CStr(num)
End Function
The above VBA code works, except it fails if there are two row references in the same line, so for instance if we have If Range("B15").Value <> Range("B12").Value Then, after the line gets processed I get If Range("B14").Value <> Range("B14").Value Theninstead of If Range("B17").Value <> Range("B14").Value Then. The problem is in the vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)) statement, because it is getting called more than once if a line has more than Regex match.
Any ideas? Thanks in advance
I think what you are trying to do is a bad idea, for two reasons:
Hard-coded cell references are almost always poor practice. A better solution may be to replace hard-coded cell references with named ranges. You can refer to them in the code by name, and the associated references will update automatically if you insert/delete rows or columns. You have some painful upfront work to do but the result will be a much more maintainable spreadsheet.
You are effectively trying to write a VBA parser using regexes. This is pretty much guaranteed not to work in all cases. Your current regex will match lots of things that aren't cell references (e.g. "123", "_12", and "A00") and will also miss lots of hard-coded cell references (e.g. "A1" and Cell(3,7)). That may not matter for your particular code but the only way to be sure it's worked is to check each reference by hand. Which is IMHO not much less effort than refactoring (e.g. replace with named ranges). In my experience you don't fix a regex, you just make the problems more subtle.
That said, since you asked...
<cthulu>
There are only two choices when using RegExp.Replace() - either replace the first match or replace all matches (corresponding to setting RegExp.Global to False or True respectively). You don't have any finer control than that, so your logic has to change. Instead of using Replace() you could write your own code for the replacements, using the FirstIndex property of the Match object, and VBA's string functions to isolate the relevant parts of the string:
Dim rex As Object
Set rex = CreateObject("VBScript.RegExp")
rex.Global = True
Dim i As Long
Dim mtch As Object
Dim newLineText As String
Dim currMatchIndex As Long, prevPosition As Long
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
currMatchIndex = 0: prevPosition = 1
newLineText = ""
For Each mtch In rex.Execute(vStr(i))
'Note that VBA string functions are indexed from 1 but Match.FirstIndex starts from 0
currMatchIndex = mtch.FirstIndex
newLineText = newLineText & Mid(vStr(i), prevPosition, currMatchIndex - prevPosition + 1) & _
mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)
prevPosition = currMatchIndex + Len(mtch.Value) + 1
Next
vStr(i) = newLineText & Right(vStr(i), Len(vStr(i)) - prevPosition + 1)
End If
Next i
Note that I still haven't fixed the problems with the regex pattern in the first place. I recommend that you just go and use named ranges instead...
Oops, nearly forgot - </cth

Is this the RegEx for matching any cell reference in an Excel formula?

I have been trying to create a regular expressions pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.
I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:
Formulas are preceded with an equal sign "="
Strings within formulas are enclosed in double quotes and need to be removed before looking for real references, otherwise =A1&"A1" would break regex
Worksheet names can be up to 31 characters long, excluding \ / ? * [ ] :
Worksheet names in external references must be succeeded with bang =Sheet1!A1
Workbook names in external references must be enclosed in square brackets =[Book1.xlsx]Sheet1!A1
Workbook paths, which Excel adds if a reference is to a range in a closed workbook, are always enclosed in single quotes and to the left of the brackets for the workbook name 'C:\[Book1.xlsx]Sheet1'!A1
Some characters (non-breaking space, for example) cause Excel to enclose the workbook and worksheet name in an external reference in single quotes, but I don't know specifically which characters ='[Book 1.xlsx]Sheet 1'!A1
Even if R1C1-notation is enabled, Range.Formula still returns references in A1-notation. Range.FormulaR1C1 returns references in R1C1 notation.
3D reference style allows a range of sheet names on one workbook =SUM([Book5]Sheet1:Sheet3!A1)
Named ranges can be specified in formulas:
The first character of a name must be a letter, an underscore character (_), or a backslash (\). Remaining characters in the name can be letters, numbers, periods, and underscore characters.
You cannot use the uppercase and lowercase characters "C", "c", "R", or "r" as a defined name, because they are all used as a shorthand for selecting a row or column for the currently selected cell when you enter them in a Name or Go To text box.
Names cannot be the same as a cell reference, such as Z$100 or R1C1.
Spaces are not allowed as part of a name.
A name can be up to 255 characters in length.
Names can contain uppercase and lowercase letters. Excel does not distinguish between uppercase and lowercase characters in names.
Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:
Sub ReturnFormulaReferences()
Dim objRegExp As New VBScript_RegExp_55.RegExp
Dim objCell As Range
Dim objStringMatches As Object
Dim objReferenceMatches As Object
Dim objMatch As Object
Dim intReferenceCount As Integer
Dim intIndex As Integer
Dim booIsReference As Boolean
Dim objName As Name
Dim booNameFound As Boolean
With objRegExp
.MultiLine = True
.Global = True
.IgnoreCase = True
End With
For Each objCell In Selection.Cells
If Left(objCell.Formula, 1) = "=" Then
objRegExp.Pattern = "\"".*\"""
Set objStringMatches = objRegExp.Execute(objCell.Formula)
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Set objReferenceMatches = objRegExp.Execute(objCell.Formula)
intReferenceCount = 0
For Each objMatch In objReferenceMatches
intReferenceCount = intReferenceCount + 1
Next
Debug.Print objCell.Formula
For intIndex = intReferenceCount - 1 To 0 Step -1
booIsReference = True
For Each objMatch In objStringMatches
If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
booIsReference = False
Exit For
End If
Next
If booIsReference Then
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7})"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)" _
& "[a-z_\\][a-z0-9_\.]{0,254}$"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
booNameFound = False
For Each objName In objCell.Worksheet.Parent.Names
If objReferenceMatches(intIndex).Value = objName.Name Then
booNameFound = True
Exit For
End If
Next
If Not booNameFound Then
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)"
For Each objName In objCell.Worksheet.Names
If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
booNameFound = True
Exit For
End If
Next
End If
booIsReference = booNameFound
End If
End If
End If
If booIsReference Then
Debug.Print " " & objReferenceMatches(intIndex).Value _
& " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
& objReferenceMatches(intIndex).Length & ")"
End If
Next intIndex
Debug.Print
End If
Next
Set objRegExp = Nothing
Set objStringMatches = Nothing
Set objReferenceMatches = Nothing
Set objMatch = Nothing
Set objCell = Nothing
Set objName = Nothing
End Sub
Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.
Thanks!
jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.
stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula
Thanks jtolle!
I'm a few years late here, but I was looking for something similar and so dug into this. The main pattern you use is this:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Basically you have six alternatives for a range reference (lines 3-8), any of which will produce a match by itself, with two alternatives for an optional filename/sheet name prefix (lines 1-2).
For the two prefix alternatives, the only difference is that the first is wrapped in single quotes, with an extra dot star after the initial quote. These single quotes occur mainly when there is a space in a sheet name. The purpose of the dot star, matching unconstrained text after an initial single quote, is unclear and it appears to create problems. I'll discuss those problems below. Besides that the two alternative prefixes are the same, and I'll refer to them collectively as the Optional External Prefix (OEP).
The OEP has its own two optional prefixes (the same in either alternative). The first is for the workbook name, an open-ended dot star in brackets.
(\[.*\])?
The second is for a "3D" cell reference, with two sheet names separated by a colon; it is the initial sheet name including the colon. The pattern here is a negated character class allowing up to 31 characters of anything except forward slash, back slash, question mark, asterisk, brackets, or colon, followed by a colon:
([^\:\\\/\?\*\[\]]{1,31}\:)?
Finally for the OEP is its only required part: a sheet name, same as the optional sheet name but with no colon. The effect is (if these all worked correctly) that the required sheet name will match if it can, and then only if there is a 3d reference or additional prior bracketed text will its optional prefixes also match.
Issues with the Workbook/Sheet name prefix: First, the dot star at the beginning of the first line is over-inclusive. Similarly, the negated character class for the sheet name appears to need additional characters including parens, comma, plus, minus, equals, and bang. Otherwise, extra material is interpreted as part of the sheet name. On my testing, this overinclusion happened with any of these:
=SUM(Sheet1!A1,Sheet2!A2)
=Sheet1!A1+Sheet2!A2
=Sheet1!A1-Sheet2!A2
Sheet names can include some of these characters, so accounting for that would require some additional measure. For instance, a sheet could be named "(Sheet1)", giving an odd formula like:
=SUM('(Sheet1)'!A1:A2)
You'd like to get the inner parens with the sheet name there, but not the outer paren. Excel puts the single quotes on that one, as it would with a space in the sheet name. You could then exclude parens in the non-single quote version since within the single quote it's ok. But then beware Excel seems to even allow single quotes in sheet names. Taking these naming quirks to the extreme, I just successfully named a sheet "Hi'Sheet1'SUM('Sheet2'!A1,A2)!". That's absurd but it points to what could happen. I learned in doing this that if I include a single quote in a sheet name, formulas escape the single quote with a second single quote. So a SUM(A1:A2) referring to the sheet I just created ends up looking like this:
=SUM('Hi''Sheet1''SUM(''Sheet2''!A1,A2)!'!A1:A2)
That actually does give some insight into the Excel parser itself. I suspect to adequately deal with this you may want separately (outside the regex) to compare the potential sheet names or workbook names to the actual sheet names, as you have done with the named ranges.
This leads to the six forms of cell references allowed in the regex (any one of which, if met, will produce a match):
1.) A one-cell or multi-cell range with rows and columns
"(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?"
The open paren here is closed at the end of the 6 options. Otherwise, this line allows a basic cell reference of the type "$A$1", "A1", "$A1", "A$1", or any combination of these in a multi-cell range ("$A1:A$2", etc.).
2.) A full-column or multi-column range with absolute references only
"|\$[a-z]{1,3}\:\$[a-z]{1,3}"
This one allows a cell reference of the type "$A:$B" with a dollar sign on both. Note a dollar sign on only one side will not match.
3.) A full-column or multi-column range with relative references only
"|[a-z]{1,3}\:[a-z]{1,3}"
This line is like the last, but matches only with no dollar signs. Note a dollar sign on only one side will not match here either.
4.) A full-row or multi-row range with absolute references only
"|\$[0-9]{1,7}\:\$[0-9]{1,7}"
This line allows a cell reference of the type "$1:$2" with a dollar sign on both.
5.) A full-row or multi-row range with relative references only
"|[0-9]{1,7}\:[0-9]{1,7}"
This version is like the last, but matches only with no dollar signs.
6.) Other text that could be a named range
"|[a-z_\\][a-z0-9_\.]{0,254})"
Finally, the sixth option allows text. This text is compared to actual named ranges later in sub.
The main omission that I see here is ranges that have both absolute and relative references, of the type "A:$A" or "1:$1". While $A:A is captured because it includes "A:A", "A:$A" is not captured. You could address this and simplify the regex by combining 2 and 3 and combining 4 and 5 with optional dollar signs:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$?[a-z]{1,3}\:\$?[a-z]{1,3}" _
& "|\$?[0-9]{1,7}\:\$?[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Combining these further would seem to come up against the everything-is-optional problem.
One other issue is in the initial regex pattern for matching strings, which you use to expunge potential ranges that fall inside a quoted string:
objRegExp.Pattern = "\"".*\"""
When I test this on a formula with a string at the beginning and end of a formula, the greediness of the dot star captures everything from the initial quote to the final quote (in other words it interprets the entire formula as one big quoted string, even though there is non-string material in the middle). It appears you can fix this by making the dot star lazy (adding a question mark after it). That raises questions about quotes within quotes, but they may not be a problem. For instance, I tested this formula:
="John loves his A1 steak sauce, but said the ""good A1 steak sauce price"" is $" & A2+A3 & " less than the ""bad price"" of $" & A4 & "."
With cell values plugged in, this formula evaluates to:
John loves his A1 steak sauce, but said the "good A1 steak sauce
price" is $5 less than the "bad price" of $8.
With the lazy modifier added to your string pattern, both versions of "A1" above were recognized as occurring within a string and so were expunged, while A2, A3 and A4 were recognized as cell references.
I'm sure there are some technical issues with some of my language above, but hopefully the analysis is still useful.
Thanks Ben (I'm new to post here, even though Stackoverflow has caught my attention for years for high quality technical stuff, so I'm not sure if I read this page correctly for the author J)
I tried the posted solutions (testing, testing updated, as well as the one using range.precendents (which as correctly pointed, does not cover references to other sheets or other workbooks) and found a minor flaw: the external sheet name is enclosed in 'single quotation marks' only if it is a number; if it contains space (and possibly other characters as Ben (?) listed in the orginal post. with a simple addition to the regEx (opening [) this can be corrected (added "[", see code below). In addition, for my own purpose I converted the sub to a function that will return a comma-separated list with duplicates removed (note, this removes just identical reference notation, not cells that are included in multiple ranges):
Public Function CellReflist(Optional r As Range) ' single cell
Dim result As Object: Dim testExpression As String: Dim objRegEx As Object
If r Is Nothing Then Set r = ActiveCell ' Cells(1, 2) ' INPUT THE CELL HERE , e.g. RANGE("A1")
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True: objRegEx.Global = True: objRegEx.Pattern = """.*?""" ' remove expressions
testExpression = CStr(r.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)" 'grab the address
objRegEx.Pattern = "(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
Set result = objRegEx.Execute(testExpression)
If result.Count > 0 Then CellReflist = result(0).Value
If result.Count > 1 Then
For i = 1 To result.Count - 1 'Each Match In result
dbl = False ' poistetaan tuplaesiintymiset
For j = 0 To i - 1
If result(i).Value = result(j).Value Then dbl = True
Next j
If Not dbl Then CellReflist = CellReflist & "," & result(i).Value 'Match.Value
Next i 'Match
End If
End If
End Function
I resolved a similar problem in Google Sheets.
The following adds/subtract row references from a formula. Because I just needed to update row references, rather than extracting the formula I just extracted and updated the row reference with this /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/
String.prototype.replaceAt = function(index, replacement, diff = 0) {
let end = this.substr(index + replacement.length + diff)
if((this.length - 1) === index) end = ""
return this.substr(0, index) + replacement + end;
}
// Ref: https://stackoverflow.com/a/1431113/2319414
/**
* #param row - positive integer to add, negative to subtract rows.
*/
function updateRowReference(formula, row){
let masked = formula
const mask = "#"
// masking double quotes in string literals
let exp = /""/g
let result;
while((result = exp.exec(masked)) !== null){
masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
}
// masking string literals
exp = /\"([^\\\"]|\\.)*\"/g
// Ref: https://stackoverflow.com/a/9260547
while((result = exp.exec(masked)) !== null){
masked = masked.replaceAt(result.index, new Array(result[0].length).fill(mask).join(""))
}
// updating row references
const sRow = row.toString()
// The magic is happening here
// Just matching a number which is part of range address
exp = /((?<=[A-Za-z\$:\!])\d+(?![A-Za-z\(!]))|(\d+(?=[:]))/g
while((result = exp.exec(masked)) !== null){
const oldRow = Number(result[0])
// adding/subtracting rows
const newRow = (row + oldRow).toString()
// preserving formula string length integrity if number of digits of new row is different than old row
const diff = result[0].length - newRow.length
masked = masked.replaceAt(result.index, newRow, diff)
formula = formula.replaceAt(result.index, newRow, diff)
exp.lastIndex -= diff
}
let updated = masked;
// revert mask
const array = formula.split("")
while((result = updated.search(mask)) !== -1){
updated = updated.replaceAt(result, array[result])
}
return updated
}
function test(){
const cases = [
"=$A$1",
"=A1",
"=$A1",
"=A$1",
"=$A1:B$1",
"=1:1",
"=Sheet1!1:1",
"=Sheet1!$A1:B$1",
"=Sheet1!A$1",
'=IF(AND($C6 <> ""; NOT(ISBLANK(B$6))); IF(SUM(FILTER($F$6:$F$7;$C$6:$C$7 = $C6)) < $G6; 1; IF($E6 = 0; 1; 0)); 0)',
"=$A$111", "=A111", "=$A111", "=A$111", "=$A111:B$111",
"=111:111",
"=Sheet1!111:111",
"=Sheet1!$A111:B$111",
"=Sheet1!A$111",
'=IF(AND($C111 <> ""; NOT(ISBLANK(B$111))); IF(SUM(FILTER($F$111:$F$112;$C$111:$C$112 = $C111)) < $G111; 1; IF($E111 = 0; 1; 0)); 0)',
// if string literals have addresses they shouldn't be affected
'=IF(AND($C111 <> "A1 $A1 $A1:B$1";$C111 <> "Sheet1!1:1";$C111 <> "Sheet1!$A1:B$1"); 1 , 0)'
]
const expectedAdd = [
'=$A$16',
'=A16',
'=$A16',
'=A$16',
'=$A16:B$16',
'=16:16',
'=Sheet1!16:16',
'=Sheet1!$A16:B$16',
'=Sheet1!A$16',
'=IF(AND($C21 <> ""; NOT(ISBLANK(B$21))); IF(SUM(FILTER($F$21:$F$22;$C$21:$C$22 = $C21)) < $G21; 1; IF($E21 = 0; 1; 0)); 0)',
'=$A$126',
'=A126',
'=$A126',
'=A$126',
'=$A126:B$126',
'=126:126',
'=Sheet1!126:126',
'=Sheet1!$A126:B$126',
'=Sheet1!A$126',
'=IF(AND($C126 <> ""; NOT(ISBLANK(B$126))); IF(SUM(FILTER($F$126:$F$127;$C$126:$C$127 = $C126)) < $G126; 1; IF($E126 = 0; 1; 0)); 0)',
'=IF(AND($C126 <> "A1 $A1 $A1:B$1";$C126 <> "Sheet1!1:1";$C126 <> "Sheet1!$A1:B$1"); 1 , 0)'
]
let results = cases.map(_case => updateRowReference(_case, 15))
console.log('Test Add')
console.log(results.every((result, i) => result === expectedAdd[i]))
console.log('Test Subtract')
results = results.map(_case => updateRowReference(_case, -15))
console.log(results.every((result, i) => result === cases[i]))
}
test()
'INDIRECT' function with addresses as strings will not be updated