Separating strings from numbers with Excel VBA - regex

I need to
a) separate strings from numbers for a selection of cells
and
b) place the separated strings and numbers into different columns.
For example, the Excel sheet is as follows:
A1 B1
100CASH etc.etc.
The result should be:
A1 B1 C1
100 CASH etc.etc.
Regular expressions would be useful here, as there may be different cell formats, such as 100-CASH, 100/CASH, 100%CASH. Once the procedure is set up, it won't be hard to adapt the regular expression for different variations.
I came across a UDF for extracting numbers from a cell. This can easily be modified to extract strings or other types of data from cells simply by changing the regular expression.
But what I need is not just a UDF but a sub procedure to split cells using regular expressions and place the separated data into separate columns.
I've also found a similar question in SU, however it isn't VBA.

See if this will work for you:
UPDATED 11/30:
Sub test()
Dim RegEx As Object
Dim strTest As String
Dim ThisCell As Range
Dim Matches As Object
Dim strNumber As String
Dim strText As String
Dim i As Long ' Long, because row numbers can exceed the Integer limit of 32,767
Dim CurrCol As Integer
Set RegEx = CreateObject("VBScript.RegExp")
' may need to be tweaked
RegEx.Pattern = "-?\d+"
' Get the current column
CurrCol = ActiveCell.Column
Dim lngLastRow As Long
lngLastRow = Cells(1, CurrCol).End(xlDown).Row
' add a new column & shift column 2 to the right
Columns(CurrCol + 1).Insert Shift:=xlToRight
For i = 1 To lngLastRow ' change to number of rows to search
Set ThisCell = ActiveSheet.Cells(i, CurrCol)
strTest = ThisCell.Value
If RegEx.test(strTest) Then
Set Matches = RegEx.Execute(strTest)
strNumber = CStr(Matches(0))
strText = Mid(strTest, Len(strNumber) + 1)
' replace original cell with number only portion
ThisCell.Value = strNumber
' replace cell to the right with string portion
ThisCell.Offset(0, 1).Value = strText
End If
Next
Set RegEx = Nothing
End Sub
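The question also mentions separators such as 100-CASH, 100/CASH and 100%CASH, which the Mid call above would leave attached to the text part. One way to handle them is to capture the number and the text as two regex submatches and skip the separator in between. This is only a sketch: the function name SplitNumberText and the separator set (hyphen, slash, percent sign, whitespace) are assumptions, not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: returns a two-element array (number part, text part).
' Assumes the number comes first and may be followed by -, /, % or spaces.
Function SplitNumberText(ByVal txt As String) As Variant
    Dim re As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    ' group 1 = optional sign plus digits, group 2 = everything after the separator(s)
    re.Pattern = "^\s*(-?\d+)[-/%\s]*(.*)$"
    If re.Test(txt) Then
        Set m = re.Execute(txt)(0)
        SplitNumberText = Array(m.SubMatches(0), m.SubMatches(1))
    Else
        ' no leading number: leave the whole value in the text part
        SplitNumberText = Array("", txt)
    End If
End Function
```

Inside the loop above, the two parts could then be written with something like parts = SplitNumberText(strTest), followed by ThisCell.Value = parts(0) and ThisCell.Offset(0, 1).Value = parts(1), where parts is a Variant.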

How about:
Sub UpdateCells()
Dim rng As Range
Dim c As Range
Dim l As Long
Dim s As String, a As String, b As String
''Working with sheet1 and column C
With Sheet1
l = .Range("C" & .Rows.Count).End(xlUp).Row
Set rng = .Range("C1:C" & l)
End With
''Working with selected range from above
For Each c In rng.Cells
If c <> vbNullString Then
s = FirstNonNumeric(c.Value)
''Split the string into numeric and non-numeric, based
''on the position of first non-numeric, obtained above.
a = Mid(c.Value, 1, InStr(c.Value, s) - 1)
b = Mid(c.Value, InStr(c.Value, s))
''Put the two values on the sheet in positions one and two
''columns further along than the test column. The offset
''can be any suitable value.
c.Offset(0, 1) = a
c.Offset(0, 2) = b
End If
Next
End Sub
Function FirstNonNumeric(txt As String) As String
With CreateObject("VBScript.RegExp")
.Pattern = "[^0-9]"
FirstNonNumeric = .Execute(txt)(0)
End With
End Function
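Note that FirstNonNumeric raises a run-time error when a cell contains only digits, because .Execute(txt)(0) then has no match to return. A possible variant, sketched below under the assumption that returning a position is acceptable, reports where the first non-digit sits (or 0 if there is none). The name FirstNonNumericPos is hypothetical.

```vba
Option Explicit

' Hypothetical variant: 1-based position of the first non-digit, or 0 if none.
Function FirstNonNumericPos(ByVal txt As String) As Long
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = "[^0-9]"
        Set matches = .Execute(txt)
    End With
    If matches.Count > 0 Then
        ' FirstIndex is zero-based, so add 1 for use with Left/Mid
        FirstNonNumericPos = matches(0).FirstIndex + 1
    End If
End Function
```

With pos = FirstNonNumericPos(c.Value), the caller can skip the cell when pos is 0 or 1 and otherwise use Left$(c.Value, pos - 1) and Mid$(c.Value, pos) for the two parts, which also avoids the second InStr lookup.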

Related

Excel VBA - Looking up a string with wildcards

I'm trying to look up a string which contains wildcards. I need to find where in a specific row the string occurs. The strings all take the form "IP##W## XX", where XX are the two letters by which I look up the value and each # is a wildcard that can be any digit. Hence this is what my lookup string looks like:
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find method to find the column where this first occurs, but I keep getting errors. Here's what I have so far, but it doesn't work. If anyone has an easy way of doing this, I'd appreciate it; I'm quite new to VBA.
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by @SJR you can do this two ways. Using LIKE the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W##[a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).
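To get back to the original goal of finding the column, one option is to loop over the row and apply Like to each cell, since (as far as I know) Range.Find only treats * and ? as wildcards and does not understand # as "any digit". The sketch below is an assumption-laden illustration; FirstLikeColumn is a hypothetical name, and it returns 0 when nothing matches.

```vba
Option Explicit

' Hypothetical helper: column number of the first cell whose text matches
' the Like pattern, or 0 if no cell in the range matches.
Function FirstLikeColumn(ByVal searchRange As Range, ByVal likePattern As String) As Long
    Dim cell As Range
    For Each cell In searchRange.Cells
        ' skip error values such as #N/A, which cannot be converted to text
        If Not IsError(cell.Value) Then
            If CStr(cell.Value) Like likePattern Then
                FirstLikeColumn = cell.Column
                Exit Function
            End If
        End If
    Next cell
End Function
```

With the variables from the question, this could be called as GatewayColumn = FirstLikeColumn(FirstLookUpRange, "IP##W## " & LookUpString), followed by a check for 0 before the result is used.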

Getting numbers out of cells with text and producing a list with all these numbers [closed]

I have a column given to me in a spreadsheet which looks like this:
What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:
... and then get a full list of all references, as follows:
Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?
assuming:
worksheet to process named after "pressure"
column "A" with cells to get references out of
column "B" to write corresponding extracted references in
column "C" to write full list of all references in
you could try this
Option Explicit
Sub main()
Dim cell As Range
Dim references As String
Dim referencesArr As Variant
With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
references = references & GetReferences(cell) & "; "
Next cell
If references <> "" Then
referencesArr = Split(Left(references, Len(references) - 2), ";")
.Range("C1").Resize(UBound(referencesArr) + 1).Value = Application.Transpose(referencesArr) ' Split returns a zero-based array, so the item count is UBound + 1
End If
End With
End Sub
Function GetReferences(rng As Range) As String
Dim arr As Variant, iElem As Long
Dim strng As String
With rng
arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
For iElem = 1 To UBound(arr) - 1 Step 2
strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
Next iElem
End With
If strng <> "" Then
GetReferences = Left(strng, Len(strng) - 2)
rng.Offset(, 1) = GetReferences
End If
End Function
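The same bracket extraction can also be sketched with a regular expression instead of the Replace/Split trick. This is only an illustration: the function name BracketedRefs and the "; " delimiter are assumptions, and it presumes the references never contain a closing bracket themselves.

```vba
Option Explicit

' Hypothetical helper: joins the contents of every [...] group found in txt.
Function BracketedRefs(ByVal txt As String, Optional ByVal delim As String = "; ") As String
    Dim re As Object, m As Object, out As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                 ' find every bracketed group, not just the first
    re.Pattern = "\[([^\]]+)\]"      ' group 1 = text between [ and ]
    For Each m In re.Execute(txt)
        If Len(out) > 0 Then out = out & delim
        out = out & m.SubMatches(0)
    Next m
    BracketedRefs = out
End Function
```

Used as a worksheet formula, =BracketedRefs(A1) would return the references of one cell without their brackets, or an empty string if the cell has none.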
There are many examples of regex number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).
Put the following into a standard module code sheet.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numberParse(str As String, _
Optional ndx As Integer = 0, _
Optional delim As String = "; ") As Variant
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
numberParse = vbNullString
With rgx
.Global = True
.MultiLine = True
.Pattern = "[0-9]{4}"
If .Test(str) Then
Set cmat = .Execute(str)
If CBool(ndx) Then
'pull the index of the array of matches
numberParse = cmat.Item(ndx - 1)
Else
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numberParse = Join(nums, delim)
End If
End If
End With
End Function
With your blurb in A2, put the following into B2,
=numberParse(A2)
With your blurb in A2, put the following into A4 and fill down,
=numberParse(A$2, ROW(1:1))
Your results should resemble the following,
¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.
² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.
For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.
Then, you can run this sub (with tweaking most likely, to narrow down the ranges):
Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
Cells(i + 1, colToUse).Value = myArray(i)
Next i
End Sub
Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

Excel UDF for capturing numbers within characters

I have a variable text field sitting in cell A1 which contains the following:
Text;#Number;#Text;#Number
This format can keep repeating, but the pattern is always Text;#Number.
The numbers can vary from 1 digit to n digits (limit 7)
Example:
Original Value
MyName;#123;#YourName;#3456;#HisName;#78
Required value:
123, 3456, 78
The field is too variable for excel formulas from my understanding.
I tried using regexp but I am a beginner when it comes to coding. If you can break down the code with some explanatory text, it would be much appreciated.
I have tried some of the suggestions below and they work perfectly. One more question.
Now that I can split the numbers from the text, is there any way to utilize the code below and add another layer, where we split the numbers into x cells.
For example: once we run the function, if we get 1234, 567 in the same cell, the function would put 1234 in cell B2, and 567 in cell C2. This would keep updating all cells in the same row until the string has exhausted all of the numbers that are retrieved from the function.
Thanks
This is John Coleman's suggested method:
Public Function GetTheNumbers(st As String) As String
Dim ary As Variant, a As Variant ' declared so the function also compiles under Option Explicit
ary = Split(st, ";#")
GetTheNumbers = ""
For Each a In ary
If IsNumeric(a) Then
If GetTheNumbers = "" Then
GetTheNumbers = a
Else
GetTheNumbers = GetTheNumbers & ", " & a
End If
End If
Next a
End Function
If the pattern is fixed, and the location of the numbers never changes, you can assume the numbers will be located in the even places in the string. This means that in the array result of a split on the source string, you can use the odd indexes of the resulting array. For example in this string "Text;#Number;#Text;#Number" array indexes 1, 3 would be the numbers ("Text(0);#Number(1);#Text(2);#Number(3)"). I think this method is easier and safer to use if the pattern is indeed fixed, as it avoids the need to verify data types.
Public Function GetNums(src As String) As String
Dim arr
Dim i As Integer
Dim result As String
arr = Split(src, ";#") ' Split the string to an array.
result = ""
For i = 1 To UBound(arr) Step 2 ' Loop through the array, starting with the second item, and skipping one item (using Step 2).
result = result & arr(i) & ", "
Next
If Len(result) > 2 Then
GetNums = Left(result, Len(result) - 2) ' Remove the extra ", " at the end of the result string.
Else
GetNums = ""
End If
End Function
The numbers can vary from 1 digit to n digits (limit 7)
None of the other responses seems to take the provided parameters into consideration so I kludged together a true regex solution.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numsOnly(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
numsOnly = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,7}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numsOnly = Join(nums, delim)
End If
End With
End Function
      
Regexp option that uses Replace
Sub Test()
Debug.Print StrOut("MyName;#123;#YourName;#3456;#HisName;#78")
End Sub
function
Option Explicit
Function StrOut(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(^|.+?)(\d{1,7})"
.Global = True
If .Test(strIn) Then
StrOut = .Replace(strIn, "$2, ")
StrOut = Left$(StrOut, Len(StrOut) - 2)
Else
StrOut = "Nothing"
End If
End With
End Function
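For the follow-up about placing each number in its own cell (1234 in B2, 567 in C2, and so on), a Sub can write the regex matches into consecutive cells to the right of the source. The sketch below assumes the source text sits in A2 of the active sheet; the Sub name and that cell address are illustrative only.

```vba
Option Explicit

' Hypothetical sketch: writes each number found in A2 into B2, C2, D2, ...
Sub SpreadNumbersAcrossRow()
    Dim re As Object, matches As Object
    Dim src As Range
    Dim i As Long

    Set src = ActiveSheet.Range("A2")   ' assumed location of the source text
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                    ' collect every run of digits
    re.Pattern = "\d{1,7}"              ' 1 to 7 digits, per the question

    Set matches = re.Execute(CStr(src.Value))
    For i = 0 To matches.Count - 1
        ' match i goes i + 1 columns to the right of the source cell
        src.Offset(0, i + 1).Value = CLng(matches(i).Value)
    Next i
End Sub
```

A Sub is used rather than a UDF because a function called from a worksheet cell is not allowed to write to other cells.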

Extract four numbers without brackets from a bracketed entry, if entry exists

What I have:
A list of about 1000 titles of reports in column B.
Some of these titles have a four digit number surrounded by brackets (eg: (3672)) somewhere in a string of text and numbers.
I want to extract these four numbers - without brackets - in column C in the same row.
If there is no four digit number with brackets in column B, then to return "" in column C.
What I have so far:
I can successfully identify the cells in column B which have four digits surrounded by brackets. The problem is it returns the whole title including the four numbers.
Taken from: VBA RegEx extracting data from within a string
NB: I am Using Excel Professional Plus 2010, have checked the box next to "Microsoft VBScript Regular Expressions 5.5".
Sub ExtractTicker()
Dim regEx
Dim i As Long
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = True
regEx.pattern = "(\()([0-9]{4})(\))"
For i = 2 To ActiveSheet.UsedRange.Rows.Count
If (regEx.Test(Cells(i, 2).Value)) Then
Cells(i, 3).Value = regEx.Replace(Cells(i, 2).Value, "$2")
End If
Next i
End Sub
Try
regEx.pattern = "(.*\()([0-9]{4})(\).*)"
The .* at the start and end of the pattern ensures you capture the entire string, which is then fully replaced by the 2nd submatch ([0-9]{4}).
To fully optimise the code
use variant arrays rather than ranges
setting Global and IgnoreCase is redundant when the pattern contains no letters and matches the full string
you are using late binding, so you don't need the Reference
code
Sub ExtractTicker()
Dim regEx As Object
Dim pattern As String
Dim X
Dim lngCNt As Long
X = Range([b1], Cells(Rows.Count, "B").End(xlUp)).Value2
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.pattern = "(.*\()([0-9]{4})(\).*)"
For lngCNt = 1 To UBound(X)
If .Test(X(lngCNt, 1)) Then
X(lngCNt, 1) = .Replace(X(lngCNt, 1), "$2")
Else
X(lngCNt, 1) = vbNullString
End If
Next
End With
[c1].Resize(UBound(X, 1), 1).Value2 = X
End Sub
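An alternative to widening the pattern and using Replace is to read the captured group directly through SubMatches. A minimal sketch as a UDF follows; the name BracketedFourDigits is hypothetical, and it returns an empty string when no bracketed four-digit number is present.

```vba
Option Explicit

' Hypothetical UDF: returns the first four-digit number enclosed in
' parentheses, without the parentheses, or "" if there is none.
Function BracketedFourDigits(ByVal title As String) As String
    Static re As Object
    If re Is Nothing Then
        ' created once and reused across calls
        Set re = CreateObject("VBScript.RegExp")
        re.Pattern = "\((\d{4})\)"      ' group 1 = the four digits
    End If
    If re.Test(title) Then
        BracketedFourDigits = re.Execute(title)(0).SubMatches(0)
    End If
End Function
```

Entered in C2 as =BracketedFourDigits(B2) and filled down, this should give the digits where they exist and a blank otherwise.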

Removing unwanted characters VBA (excel)

I want to be able to copy raw data into column A, hit run on the macro and it should remove any unwanted characters both before and after the data that I want to keep resulting in a cell just containing the data that I want. I also want it to go through all cells that are in the column, bearing in mind some cells may be empty.
The data that I want to keep is in this format: somedata0000 or somedata000
Sometimes the cell will contain 'rubbish' both before and after the data that I want to keep i.e. rubbishsomedata0000 or somedata0000rubbish or rubbishsomedata0000rubbish.
And also, sometimes a single cell will contain:
rubbishsomedata0000rubbish
rubbishsomedata0000rubbish
rubbishsomedata0000rubbish
This will need to be changed to:
NEW CELL: somedata0000
NEW CELL: somedata0000
NEW CELL: somedata0000
The 'somedata' text will not change but the 0000 (which could be any 4 numbers) will sometimes be any 3 numbers.
Also there may be some rows in the column that have no useful data; these should be removed/deleted from the sheet.
Finally, some cells will contain the perfect somedata0000, these should stay the same.
Sub Test()
Dim c As Range
For Each c In Range("A2:A" & Range("A" & Rows.Count).End(xlUp).Row)
c = removeData(c.text)
Next
End Sub
Function removeData(ByVal txt As String) As String
Dim result As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(somedata-\d{4}|\d{3})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(text)
If allMatches.Count <> 0 Then
result = allMatches.Item(0).submatches.Item(0)
End If
ExtractSDI = result
End Function
I have included the code I've got so far. All it does is go through each cell and, where there is a match, remove the text that I want to keep as well as the stuff that I want removed! Why?
There are several issues in your code
As Gary said, your Function isn't returning a result
Your Regex.Pattern doesn't make sense
Your Sub doesn't attempt to handle multiple matches
Your Function doesn't even attempt to return multiple matches
Sub Test()
Dim rng As Range
Dim result As Variant
Dim i As Long
With ActiveSheet
Set rng = Range(.Cells(2, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
For i = rng.Rows.Count To 1 Step -1
result = removeData(rng.Cells(i, 1))
If IsArray(result) Then
If UBound(result) = 1 Then
rng.Cells(i, 1) = result(1)
Else
rng.Cells(i, 1).Offset(1, 0).Resize(UBound(result) - 1, 1).Insert xlShiftDown
rng.Cells(i, 1).Resize(UBound(result), 1) = Application.Transpose(result)
End If
Else
rng.Cells(i, 1).ClearContents
End If
Next
End Sub
Function removeData(ByVal txt As String) As Variant
Dim result As Variant
Dim allMatches As Object
Dim RE As Object
Dim i As Long
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = "(somedata\d{3,4})"
RE.Global = True
RE.IgnoreCase = True
Set allMatches = RE.Execute(txt)
If allMatches.Count > 0 Then
ReDim result(1 To allMatches.Count)
For i = 0 To allMatches.Count - 1
result(i + 1) = allMatches.Item(i).Value
Next
End If
removeData = result
End Function
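To see why the original pattern "doesn't make sense", it may help to run both patterns against the same sample. In (somedata-\d{4}|\d{3}) the alternation applies to the whole left-hand side, so it reads as "somedata-" plus four digits, or any three digits on their own. The Sub name below is hypothetical and the sample string is taken from the question.

```vba
Option Explicit

' Hypothetical demo: prints the first match of each pattern to the Immediate window.
Sub ComparePatterns()
    Const SAMPLE As String = "rubbishsomedata0000rubbish"
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Original pattern: the first alternative needs a hyphen that the data
    ' does not contain, so only the bare "three digits" alternative can match.
    re.Pattern = "(somedata-\d{4}|\d{3})"
    If re.Test(SAMPLE) Then Debug.Print re.Execute(SAMPLE)(0).Value

    ' Corrected pattern: "somedata" followed by three or four digits.
    re.Pattern = "(somedata\d{3,4})"
    If re.Test(SAMPLE) Then Debug.Print re.Execute(SAMPLE)(0).Value
End Sub
```

The first line printed should be 000 and the second somedata0000, which shows that the original pattern never captures the "somedata" prefix.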