Removing unwanted characters VBA (excel) - regex

I want to be able to copy raw data into column A, hit run on the macro and it should remove any unwanted characters both before and after the data that I want to keep resulting in a cell just containing the data that I want. I also want it to go through all cells that are in the column, bearing in mind some cells may be empty.
The data that I want to keep is in this format: somedata0000 or somedata000
Sometimes the cell will contain 'rubbish' both before and after the data that I want to keep i.e. rubbishsomedata0000 or somedata0000rubbish or rubbishsomedata0000rubbish.
And also, sometimes a single cell will contain:
rubbishsomedata0000rubbish
rubbishsomedata0000rubbish
rubbishsomedata0000rubbish
This will need to be changed to:
NEW CELL: somedata0000
NEW CELL: somedata0000
NEW CELL: somedata0000
The 'somedata' text will not change but the 0000 (which could be any 4 numbers) will sometimes be any 3 numbers.
Also there may be some rows in the column that have no useful data; these should be removed/deleted from the sheet.
Finally, some cells will contain the perfect somedata0000, these should stay the same.
Sub Test()
Dim c As Range
For Each c In Range("A2:A" & Range("A" & Rows.Count).End(xlUp).Row)
c = removeData(c.text)
Next
End Sub
Function removeData(ByVal txt As String) As String
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(somedata-\d{4}|\d{3})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
If allMatches.Count <> 0 Then
result = allMatches.Item(0).submatches.Item(0)
End If
ExtractSDI = result
End Function
I have put my code that I've got so far, all it does is go through each cell, if it matches it just removes the text that I want to keep as well as the stuff that I want removed! Why?

There are several issues in your code
As Gary said, you Function isn't returning a result
Your Regex.Pattern doesn't make sense
Your Sub doesn't attempt to handle multiple matches
Your Function doesn't even attempt to return multiple matches
Sub Test()
Dim rng As Range
Dim result As Variant
Dim i As Long
With ActiveSheet
Set rng = Range(.Cells(2, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
For i = rng.Rows.Count To 1 Step -1
result = removeData(rng.Cells(i, 1))
If IsArray(result) Then
If UBound(result) = 1 Then
rng.Cells(i, 1) = result(1)
Else
rng.Cells(i, 1).Offset(1, 0).Resize(UBound(result) - 1, 1).Insert xlShiftDown
rng.Cells(i, 1).Resize(UBound(result), 1) = Application.Transpose(result)
End If
Else
rng.Cells(i, 1).ClearContents
End If
Next
End Sub
Function removeData(ByVal txt As String) As Variant
Dim result As Variant
Dim allMatches As Object
Dim RE As Object
Dim i As Long
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(somedata\d{3,4})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(txt)
If allMatches.Count > 0 Then
ReDim result(1 To allMatches.Count)
For i = 0 To allMatches.Count - 1
result(i + 1) = allMatches.Item(i).Value
Next
End If
removeData = result
End Function

Related

Excel regex query returning empty data

I'm using the following VBA code from a related question in my Excel spreadsheet, and when I use it in a cell, it always fails (returns nothing). Even if I call it on a string literal in the function call (i.e. =RegexExtract("ABC1_DEF","ABC[0-9]")), it still fails. I've enabled the "Microsoft Visual Basic Regular Expressions 5.0" feature in the MSVBA application, so I'm not sure why these results are always empty. How can I resolve this?
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Edit
I tried yet another function from a separate question, and it just returns #VALUE!:
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String) As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
RegexExtract = allMatches.Item(0).submatches.Item(0)
End Function
Note you are trying to access .Submatches that stores capturing group values, but you have not defined any capturing groups in the pattern.
If you use (ABC[0-9]) you will get your match with the current function. Else, access the allMatches.Item(i) for full match values and discard the code to get the captured groups.

Excel regex match and return multiple references in formula

I've created a function that will return the Nth reference which includes a sheetname (if it's there), however it's not working for all instances. The regex string I'm using is
'[\w ]+[']!([$]{0,1})([A-Z]{1,2})([$]{0,1})(\d{1,5})
I'm finding though it won't find the first reference in either of the below examples:
='Biscuits Raw Data'!G783/'Biscuits Raw Data'!E783
=IF('Biscuits Raw Data'!G705="","",'Biscuits Raw Data'!G723/'Biscuits Raw Data'!G7005*100)
Below is my Function code:
Function GrabNthreference(Rng As range, NthRef As Integer) As String
Dim patrn As String
Dim RegX
Dim Matchs
Dim RegEx
Dim FinalMatch
Dim Subm
Dim i As Integer
Dim StrRef As String
patrn = "'[\w ]+[']!([$]{0,1})([A-Z]{1,2})([$]{0,1})(\d{1,5})"
StrRef = Rng.Formula
Set RegEx = CreateObject("vbscript.regexp") ' Create regular expression.
RegEx.Global = True
RegEx.Pattern = patrn ' Set pattern.
RegEx.IgnoreCase = True ' Make case insensitive.
Set RegX = RegEx.Execute(StrRef)
If RegX.Count < NthRef Then
GrabNthreference = StrRef
Exit Function
End If
i= -1
For Each Matchs In RegX ' Iterate Matches collection.
Set Subm = RegX(i).submatches
i = i + 1
If i = NthRef -1 Then
GrabNthreference = RegX(i)
Exit Function
End If
'Debug.Print RegX(i)
Next
End Function
Here's my final code
Function GrabNthreference(R As range, NthRef As Integer) As String 'based on http://stackoverflow.com/questions/13835466/find-all-used-references-in-excel-formula
Dim result As Object
Dim testExpression As String
Dim objRegEx As Object
Dim i As Integer
i = 0
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.Global = True
objRegEx.Pattern = """.*?""" ' remove expressions
testExpression = CStr(R.Formula)
testExpression = objRegEx.Replace(testExpression, "")
'objRegEx.Pattern = "(([A-Z])+(\d)+)" 'grab the address think this is an old attempt so remming out
objRegEx.Pattern = "(['].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
If objRegEx.Test(testExpression) Then
Set result = objRegEx.Execute(testExpression)
If result.Count > 0 Then
For Each Match In result
Debug.Print Match.Value
If i = NthRef - 1 Then
GrabNthreference = result(i)
Exit Function
End If
i = i + 1
Next Match
Else
GrabNthreference = "No precedencies found"
End If
End If
End Function
This code did lead me onto thinking about using the simple activecell.precedences method but I think the problem is that it won't report offsheet and won't indicate if the formula is relative or absolute.
Any comments welcome but I think I've answered my own question :)

Excel UDF for capturing numbers within characters

I have a variable text field sitting in cell A1 which contains the following:
Text;#Number;#Text;#Number
This format can keep repeating, but the pattern is always Text;#Number.
The numbers can vary from 1 digit to n digits (limit 7)
Example:
Original Value
MyName;#123;#YourName;#3456;#HisName;#78
Required value:
123, 3456, 78
The field is too variable for excel formulas from my understanding.
I tried using regexp but I am a beginner when it comes to coding. if you can break down the code with some explanation text, it would be much appreciated.
I have tried some of the suggestions below and they work perfectly. One more question.
Now that I can split the numbers from the text, is there any way to utilize the code below and add another layer, where we split the numbers into x cells.
For example: once we run the function, if we get 1234, 567 in the same cell, the function would put 1234 in cell B2, and 567 in cell C2. This would keep updating all cells in the same row until the string has exhausted all of the numbers that are retrieved from the function.
Thanks
This is the John Coleman's suggested method:
Public Function GetTheNumbers(st As String) As String
ary = Split(st, ";#")
GetTheNumbers = ""
For Each a In ary
If IsNumeric(a) Then
If GetTheNumbers = "" Then
GetTheNumbers = a
Else
GetTheNumbers = GetTheNumbers & ", " & a
End If
End If
Next a
End Function
If the pattern is fixed, and the location of the numbers never changes, you can assume the numbers will be located in the even places in the string. This means that in the array result of a split on the source string, you can use the odd indexes of the resulting array. For example in this string "Text;#Number;#Text;#Number" array indexes 1, 3 would be the numbers ("Text(0);#Number(1);#Text(2);#Number(3)"). I think this method is easier and safer to use if the pattern is indeed fixed, as it avoids the need to verify data types.
Public Function GetNums(src As String) As String
Dim arr
Dim i As Integer
Dim result As String
arr = Split(src, ";#") ' Split the string to an array.
result = ""
For i = 1 To UBound(arr) Step 2 ' Loop through the array, starting with the second item, and skipping one item (using Step 2).
result = result & arr(i) & ", "
Next
If Len(result) > 2 Then
GetNums = Left(result, Len(result) - 2) ' Remove the extra ", " at the end of the the result string.
Else
GetNums = ""
End If
End Function
The numbers can vary from 1 digit to n digits (limit 7)
None of the other responses seems to take the provided parameters into consideration so I kludged together a true regex solution.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numsOnly(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
numsOnly = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,7}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numsOnly = Join(nums, delim)
End If
End With
End Function
      
Regexp option that uses Replace
Sub Test()
Debug.Print StrOut("MyName;#123;#YourName;#3456;#HisName;#78")
End Sub
function
Option Explicit
Function StrOut(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(^|.+?)(\d{1,7})"
.Global = True
If .Test(strIn) Then
StrOut = .Replace(strIn, "$2, ")
StrOut = Left$(StrOut, Len(StrOut) - 2)
Else
StrOut = "Nothing"
End If
End With
End Function

VBA to edit excel column of data

I have a requirement to edit a column of data where each cell has to be edited to remove all non-numeric characters. The only data that I need are actual numbers and a decimal point if one was there originally. I found a piece of code that removes everything with the exception of a "%" character. If someone could look at the code below and let me know how to modify it I would be appreciative. Examples of the type of data I am editing are as follows Complete cell contents enclosed in quotes). "3" "2.5%" "17 nks" "3.00 %" "4 VNS"
Here's the code I have used;
Sub RemoveAlphas()
'' Remove alpha characters from a string.
Dim intI As Integer
Dim rngR As Range, rngRR As Range
Dim strNotNum As String, strTemp As String
Set rngRR = Selection.SpecialCells(xlCellTypeConstants, _
xlTextValues)
For Each rngR In rngRR
strTemp = ""
For intI = 1 To Len(rngR.Value)
If Mid(rngR.Value, intI, 1) Like "[0-9,.]" Then
strNotNum = Mid(rngR.Value, intI, 1)
Else: strNotNum = ""
End If
strTemp = strTemp & strNotNum
Next intI
rngR.Value = strTemp
Next rngR
End Sub
Thanks.
This can be done using Regex as shown below- Have tested with your exact sample data and works for me:
Sub RemoveAlphas()
'' Remove alpha characters from a string.
Dim intI As Integer
Dim rngR As Range, rngRR As Range
Dim strNotNum As String, strTemp As String
Dim RegEx As Object
Set rngRR = Selection.SpecialCells(xlCellTypeConstants, _
xlTextValues)
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^\d.]+"
For Each rngR In rngRR
rngR.Value = RegEx.Replace(rngR.Value, "")
Next rngR
End Sub
If you are getting the results you need with the exception of a percent sign, you can insert into this part of your code a Replace function:
Next intI
strTemp = Replace(strTemp, "%", "") 'Remove the % sign and replace with nothing.
rngR.Value = strTemp
Next rngR
Try the RexExp below which builds on my code from Remove non-numeric characters from a range of cells, and uses variant arrays for speed.
The RegExp pattern is [^\d\.]+
Sub KillNonNumbers()
Dim rng1 As Range
Dim rngArea As Range
Dim lngRow As Long
Dim lngCol As Long
Dim lngCalc As Long
Dim objReg As Object
Dim X()
On Error Resume Next
Set rng1 = Selection.SpecialCells(xlCellTypeConstants, xlTextValues)
If rng1 Is Nothing Then Exit Sub
On Error GoTo 0
'See Patrick Matthews excellent article on using Regular Expressions with VBA
Set objReg = CreateObject("vbscript.regexp")
objReg.Pattern = "[^\d\.]+"
objReg.Global = True
'Speed up the code by turning off screenupdating and setting calculation to manual
'Disable any code events that may occur when writing to cells
With Application
lngCalc = .Calculation
.ScreenUpdating = False
.Calculation = xlCalculationManual
.EnableEvents = False
End With
'Test each area in the user selected range
'Non contiguous range areas are common when using SpecialCells to define specific cell types to work on
For Each rngArea In rng1.Areas
'The most common outcome is used for the True outcome to optimise code speed
If rngArea.Cells.Count > 1 Then
'If there is more than once cell then set the variant array to the dimensions of the range area
'Using Value2 provides a useful speed improvement over Value. On my testing it was 2% on blank cells, up to 10% on non-blanks
X = rngArea.Value2
For lngRow = 1 To rngArea.Rows.Count
For lngCol = 1 To rngArea.Columns.Count
'replace the leading zeroes
X(lngRow, lngCol) = objReg.Replace(X(lngRow, lngCol), vbNullString)
Next lngCol
Next lngRow
'Dump the updated array sans leading zeroes back over the initial range
rngArea.Value2 = X
Else
'caters for a single cell range area. No variant array required
rngArea.Value = objReg.Replace(rngArea.Value, vbNullString)
End If
Next rngArea
'cleanup the Application settings
With Application
.ScreenUpdating = True
.Calculation = lngCalc
.EnableEvents = True
End With
Set objReg = Nothing
End Sub

Macro to skip Column A and strips out lower case letters

In column A, I have a list of sentences
In columns B-Z, I have strings contain numbers followed by letters both uppercase and lower case.
such as
45ABc
The following macro strips all lowercase letters in the entire work sheet - do not want it to strip any letters in column A. Please help.
Sub RegExReplace()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
Try this one:
Sub RegExReplace()
Dim objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
If objCell.Column<>1 Then objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
or if you know that values that should be replaced only in columns B:Z, you can use next code as well:
Sub RegExReplace()
Dim rng As Range, objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
With ActiveSheet
Set rng = Application.Intersect(.UsedRange, .Range("B:Z"))
End With
If Not rng Is Nothing Then
For Each objCell In rng
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End If
End Sub
I've added code that:
Fixes your pattern to remove what you want to remove directly - ie a-z - rather than what you want to preserve (currently A-Z-_ but could be much larger).
To use quicker arrays rather than range loops.
Sub objRegexReplace()
Dim rng1 As Range
Dim objRegex As Object
Dim X
Dim lngRow As Long
Dim lngCol As Long
Set rng1 = Application.Intersect(ActiveSheet.UsedRange, ActiveSheet.Range("B:Z"))
X = rng1.Value2
If rng1.Cells.Count > 1 Then
Set objRegex = CreateObject("VBScript.Regexp")
With objRegex
.Global = True
.Pattern = "[a-z]+"
.ignorecase = False
For lngRow = 1 To UBound(X, 1)
For lngCol = 1 To UBound(X, 2)
X(lngRow, lngCol) = .Replace(X(lngRow, lngCol), vbNullString)
Next
Next
rng1.Value2 = X
End With
Else
MsgBox "No range to work on", vbCritical
End If
End Sub