parse a string with VBA on EXCEL - regex

I'm trying to analyse a text file imported in an excel tab.
The first column is a text code used to represent the timeline of operations and I would like to convert this text code into a number of seconds.
Text codes can be formatted like this:
5,9h180s -> 21420s
10min -> 600s
3,4h5min30s -> 12570s
...
My idea is to extract the values with a regex and use them later in a function, but I don't know VBA very well.
Is there a function in Excel VBA to perform regular expressions on a string and extract data from it?
Do you have an example of such a function?

After a few attempts at the various combinations (and some of my own imagination) I decided that truncating the time value string at the first letter of the unit would allow me to use that h / m / s as the last character in the regex .Pattern property. This pre-regex prepping gave me the best results.
In a standard module code sheet as,
Function howManySeconds(strTM As String) As Long
Dim s As Long, tmp As String
Dim rgx As Object, cmat As Object
Dim x As Long, vPTTRNs As Variant
Set rgx = CreateObject("VBScript.RegExp")
vPTTRNs = Array("[0-9,\.,\s]{1,9}×$", _
"h", 3600, "m", 60, "s", 1)
With rgx
.Global = True
.IgnoreCase = True
For x = LBound(vPTTRNs) + 1 To UBound(vPTTRNs) Step 2
If CBool(InStr(1, LCase(strTM), vPTTRNs(x), vbTextCompare)) Then
tmp = Replace(Replace(Replace(LCase(strTM), _
"seconds", "s"), "secs", "s"), _
Chr(44), Chr(46))
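'search tmp (not strTM) for the unit, since the "seconds"/"secs" replacements may have shortened the string
tmp = Replace(Replace(Left(tmp, InStrRev(tmp, vPTTRNs(x), -1, vbTextCompare)), _
Chr(44), Chr(46)), Chr(32), vbNullString)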
.Pattern = Replace(vPTTRNs(LBound(vPTTRNs)), Chr(215), vPTTRNs(x))
Set cmat = .Execute(tmp)
If CBool(cmat.Count) Then
s = s + CLng(CDbl(Replace(cmat.Item(0), vPTTRNs(x), vbNullString)) * vPTTRNs(x + 1))
End If
End If
Next x
End With
howManySeconds = s
Set rgx = Nothing
End Function
Use like any native worksheet function. In C2 as,
=howManySeconds(A2)
        
You should note that (by the right-alignment) those values are true numbers which can be totalled or otherwise mathematically manipulated. A custom number format mask of 0\s_) has been applied to grant them a displayed s as a unit.
See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for an excellent local reference on using Regular Expressions in VBA.
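For comparison, here is a minimal sketch of a single-pattern alternative that walks every number/unit pair with one global match and reads the capture groups through SubMatches. The function name SecondsFromCode is hypothetical, and the sketch assumes the only units are h, min and s, as in the samples from the question.

```vba
Function SecondsFromCode(ByVal code As String) As Long
    Dim rgx As Object, m As Object
    Dim v As Double, total As Double

    Set rgx = CreateObject("VBScript.RegExp")
    With rgx
        .Global = True          'find every number/unit pair, not just the first
        .IgnoreCase = True
        'group 1: number with optional , or . decimals; group 2: the unit
        .Pattern = "(\d+(?:[.,]\d+)?)\s*(h|min|s)"
    End With

    For Each m In rgx.Execute(code)
        'Val always expects a period as the decimal separator
        v = Val(Replace(m.SubMatches(0), ",", "."))
        Select Case LCase$(m.SubMatches(1))
            Case "h": total = total + v * 3600
            Case "min": total = total + v * 60
            Case "s": total = total + v
        End Select
    Next m

    SecondsFromCode = CLng(total)
End Function
```

Used as =SecondsFromCode(A2), this should give 21420 for 5,9h180s and 12570 for 3,4h5min30s, matching the conversions listed in the question. Val is used instead of CDbl because it ignores the regional decimal separator setting.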

Related

Can I use regex to not only identify a pattern but extract the value found?

I am using Excel VBA.
I need to extract the dimensions (width x height) of a creative from a string and the dimensions will always be in the format:
000x000 or 000X000 or 000x00 or 000X00 where 0 can be any number between 1-9 and x can be upper or lower case.
I read this guide:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
And I think what I want is something similar to:
[0-9]{2, 3}[xX][0-9]{2, 3}
So if my string is:
creativeStr = ab234-cdc-234-300x250-777aabb
I want to extract "300x250" and assign it to a variable like this:
dimensions = 300x250
Is my Regex above correct? Also, how would I pull the resulting match into a variable?
Here is part of my code:
creativeStr = "Sample-abc-300x250-cba-123"
regex_pattern = "[0-9]{2,3}[xX][0-9]{2,4}"
If regex_pattern <> "" Then
With regEx
.Global = True
.Pattern = regex_pattern
End With
If regEx.Test(creativeStr) Then
dimensions = regEx.Replace(creativeStr, "$1")
Else
dimensions = "Couldn't extract dimensions from creative name."
End If
End If
But it still returns the condition in my else clause...
Thanks!
Your examples do not match your regex. Your examples show that the first set of digits will always be three, and the last set either two or three.
Also, in your description you write can be any number between 1-9 but your example includes 0's.
If you are going to work with regex, that type of imprecision will lead to undesired results.
Assuming that 0's should be included, and that the desired pattern is 3x2 or 3x3, then perhaps this example will provide some clarity:
Option Explicit
Function dimension(S As String) As String
Dim RE As Object, MC As Object
Const sPat As String = "[0-9]{3}[Xx][0-9]{2,3}"
' or, with .ignorecase = true, could use: "\d{3}x\d{2,3}"
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = sPat
If .Test(S) = True Then
Set MC = .Execute(S)
dimension = MC(0)
Else
dimension = "Couldn't extract dimensions from creative name."
End If
End With
End Function
Sub getDimension()
Dim creativeStr As String
Dim Dimensions As String
creativeStr = "Sample-abc-300x250-cba-123"
Dimensions = dimension(creativeStr)
Debug.Print Dimensions
End Sub
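If you also need the width and height as separate numbers, capture groups can help. The following is only a sketch under the same 3x2 / 3x3 assumption; the Sub name SplitDimensions and the variable names are made up for illustration.

```vba
Sub SplitDimensions()
    Dim re As Object, mc As Object
    Dim w As Long, h As Long

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True                 'lets a lowercase x match X as well
    re.Pattern = "(\d{3})x(\d{2,3})"     'group 1 = width, group 2 = height

    Set mc = re.Execute("Sample-abc-300x250-cba-123")
    If mc.Count > 0 Then
        'SubMatches is zero-based: 0 is the first pair of parentheses
        w = CLng(mc(0).SubMatches(0))
        h = CLng(mc(0).SubMatches(1))
        Debug.Print w, h
    Else
        Debug.Print "Couldn't extract dimensions from creative name."
    End If
End Sub
```

mc(0) holds the whole match (300x250), while SubMatches holds only the parenthesised parts, so no string splitting is needed afterwards.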

Getting numbers out of cells with text and producing a list with all these numbers [closed]

I have a column given to me in a spreadsheet which looks like that:
What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:
... and then get a full list of all references, as follows:
Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?
assuming:
worksheet to process named after "pressure"
column "A" with cells to get references out of
column "B" to write corresponding extracted references in
column "C" to write full list of all references in
you could try this
Option Explicit
Sub main()
Dim cell As Range
Dim references As String
Dim referencesArr As Variant
With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
references = references & GetReferences(cell) & "; "
Next cell
If references <> "" Then
referencesArr = Split(Left(references, Len(references) - 2), ";")
'the array from Split is zero-based, so it needs UBound + 1 rows
.Range("C1").Resize(UBound(referencesArr) + 1).Value = Application.Transpose(referencesArr)
End If
End With
End Sub
Function GetReferences(rng As Range) As String
Dim arr As Variant, iElem As Long
Dim strng As String
With rng
arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
For iElem = 1 To UBound(arr) - 1 Step 2
strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
Next iElem
End With
If strng <> "" Then
GetReferences = Left(strng, Len(strng) - 2)
rng.Offset(, 1) = GetReferences
End If
End Function
There are many examples of regex number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).
Put the following into a standard module code sheet.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numberParse(str As String, _
Optional ndx As Integer = 0, _
Optional delim As String = "; ") As Variant
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
numberParse = vbNullString
With rgx
.Global = True
.MultiLine = True
.Pattern = "[0-9]{4}"
If .Test(str) Then
Set cmat = .Execute(str)
If CBool(ndx) Then
'pull the index of the array of matches
numberParse = cmat.Item(ndx - 1)
Else
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numberParse = Join(nums, delim)
End If
End If
End With
End Function
With your blurb in A2, put the following into B2,
=numberParse(A2)
With your blurb in A2, put the following into A4 and fill down,
=numberParse(A$2, ROW(1:1))
Your results should resemble the following,
¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.
² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.
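If the references are whatever sits between square brackets rather than four-digit numbers, the same approach works with a different pattern. This is a rough sketch only: the function name BracketRefs is hypothetical, and it assumes brackets are never nested.

```vba
Function BracketRefs(ByVal txt As String, _
                     Optional ByVal delim As String = "; ") As String
    Dim rgx As Object, m As Object
    Dim out As String

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Global = True
    'an opening bracket, then anything that is not a closing bracket (captured), then the closing bracket
    rgx.Pattern = "\[([^\]]*)\]"

    For Each m In rgx.Execute(txt)
        If Len(out) > 0 Then out = out & delim
        out = out & Trim$(m.SubMatches(0))   'text inside the brackets only
    Next m

    BracketRefs = out
End Function
```

With the text in A2, =BracketRefs(A2) should return the bracketed pieces joined by the delimiter, or an empty string when the cell has no brackets.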
For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.
Then, you can run this sub (with tweaking most likely, to narrow down the ranges):
Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
Cells(i + 1, colToUse).Value = myArray(i)
Next i
End Sub
Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

Using VBA to parse and split a string with wildcards?

I've got a sheet that contains item numbers of alphanumeric characters, and a bunch of other information in the row. Sometimes, similar items are combined into one row, and the difference on the item number will be shown with (X/Y) to choose which character to use at that point in the item number (not just X or Y, can be any alphanumeric character). In other words, these entries will look like this:
AB(X/Y)CD123
What I need is a way to separate that into the two item numbers ABXCD123 and ABYCD123. After that I'll have to create a row below the current one and copy the current row into it, with the changed item number, but that part is easy. I've tried using InStr to get the (X/Y) flagged, but I don't know how to pull out the X and Y characters to make new strings with them. I also don't know if a wildcard will work with InStr, and I'm not too familiar with RegEx.
Any ideas?
Here is a little introduction to regex¹ in a UDF².
Function partNums(str As String, _
Optional num As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
partNums = vbNullString
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = False
.Pattern = "\([A-Z]{1}/[A-Z]{1}\)"
If .Test(str) Then
tmp = .Execute(str)(0)
Select Case num
Case 2
tmp = Mid(tmp, 4, 1)
Case Else
tmp = Mid(tmp, 2, 1)
End Select
partNums = .Replace(str, tmp)
End If
End With
End Function
In B2:B3 as,
=partNums(A2)
=partNums(A3,2)
            
Here is a largely duplicated UDF that handles from 1 to 3 characters.
Function partNums(str As String, _
Optional num As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
partNums = vbNullString
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = False
.Pattern = "\([A-Z]{1,3}/[A-Z]{1,3}\)"
If .Test(str) Then
tmp = .Execute(str)(0)
tmp = Split(Replace(Replace(tmp, Chr(40), vbNullString), Chr(41), vbNullString), Chr(47))(num - 1)
partNums = .Replace(str, tmp)
End If
End With
End Function
            
¹ regex questions can usually be answered by the solutions in How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.
² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).
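As a variation, both item numbers can be produced in one call by letting Replace substitute the capture groups. This is a sketch rather than a drop-in replacement: ExpandItem is a made-up name, and the character class is widened to digits on the assumption that "any alphanumeric character" includes them.

```vba
Function ExpandItem(ByVal item As String) As Variant
    Dim rgx As Object

    Set rgx = CreateObject("VBScript.RegExp")
    rgx.IgnoreCase = True
    'group 1 = text before the slash, group 2 = text after it
    rgx.Pattern = "\(([A-Z0-9]+)/([A-Z0-9]+)\)"

    If rgx.Test(item) Then
        'Global is False by default, so only the first (X/Y) block is replaced
        ExpandItem = Array(rgx.Replace(item, "$1"), rgx.Replace(item, "$2"))
    Else
        'no (X/Y) block: hand back the item unchanged
        ExpandItem = Array(item)
    End If
End Function
```

From VBA, ExpandItem("AB(X/Y)CD123") should return a two-element array holding ABXCD123 and ABYCD123, which can then be written to the current row and the newly inserted one.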

Extract four numbers without brackets from a bracketed entry, if entry exists

What I have:
A list of about 1000 titles of reports in column B.
Some of these titles have a four digit number surrounded by brackets (eg: (3672)) somewhere in a string of text and numbers.
I want to extract these four numbers - without brackets - in column C in the same row.
If there is no four digit number with brackets in column B, then to return "" in column C.
What I have so far:
I can successfully identify the cells in column B which have four digits surrounded by brackets. The problem is it returns the whole title including the four numbers.
Taken from: VBA RegEx extracting data from within a string
NB: I am Using Excel Professional Plus 2010, have checked the box next to "Microsoft VBScript Regular Expressions 5.5".
Sub ExtractTicker()
Dim regEx
Dim i As Long
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = True
regEx.pattern = "(\()([0-9]{4})(\))"
For i = 2 To ActiveSheet.UsedRange.Rows.Count
If (regEx.Test(Cells(i, 2).Value)) Then
Cells(i, 3).Value = regEx.Replace(Cells(i, 2).Value, "$2")
End If
Next i
End Sub
Try
regEx.pattern = "(.*\()([0-9]{4})(\).*)"
The .* at the start and end of the pattern make it match the entire string, so the whole string is replaced by the 2nd submatch, ([0-9]{4}).
To further optimise the code:
use variant arrays rather than ranges
setting Global and IgnoreCase is redundant, since the pattern matches the full string once and contains no letters
you are using late binding, so you don't need the reference
Code:
Sub ExtractTicker()
Dim regEx As Object
Dim pattern As String
Dim X
Dim lngCNt As Long
X = Range([b1], Cells(Rows.Count, "B").End(xlUp)).Value2
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.pattern = "(.*\()([0-9]{4})(\).*)"
For lngCNt = 1 To UBound(X)
If .Test(X(lngCNt, 1)) Then
X(lngCNt, 1) = .Replace(X(lngCNt, 1), "$2")
Else
X(lngCNt, 1) = vbNullString
End If
Next
End With
[c1].Resize(UBound(X, 1), 1).Value2 = X
End Sub
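Another option is to skip Replace and read the captured digits directly, which avoids having to match the whole title. A minimal sketch as a worksheet function follows; the name FourDigitCode is hypothetical.

```vba
Function FourDigitCode(ByVal title As String) As String
    Static rgx As Object   'created once and reused across calls
    Dim mc As Object

    If rgx Is Nothing Then
        Set rgx = CreateObject("VBScript.RegExp")
        rgx.Pattern = "\((\d{4})\)"   'exactly four digits between literal brackets
    End If

    Set mc = rgx.Execute(title)
    'SubMatches(0) is the digits without the brackets; with no match the
    'function keeps its default value, an empty string
    If mc.Count > 0 Then FourDigitCode = mc(0).SubMatches(0)
End Function
```

In C2, =FourDigitCode(B2) filled down should give the four digits where a bracketed number exists and "" otherwise.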

Excel VBA: search a string to find the first non-text character

Cells contain a mixture of characters within a string, such as:
Abcdef_8765
QWERTY3_JJHH
Xyz9mnop
I need to find the first non A-Za-z character so that I can strip out the subsequent remainder of the string.
So the results would be:
Abcdef
QWERTY
Xyz
I know how to do this if I know exactly what character I'm looking for, but I'm not intuitively grasping how to find ANY character other than A-Za-z.
Btw, this is intended to be used within a vba solution.
====================
EDIT:
I've had success with the following...
a = "abc123"
b = Len(a)
For x = 1 To b
c = (Mid(a, x, 1) Like "[a-zA-Z]")
If c = False Then
d = Left(a, x - 1)
Exit Sub
End If
Next x
Have I stumbled upon a suitable solution, or is this destined to break?
I ask only because I look at Doug Glancy's solution and it seems much more substantial.
(btw, I have not yet tested Doug's solution)
Here is a simple way which doesn't use RegEx. I am deliberately not using RegEx as the other two answers are based on RegEx. RegEx is definitely faster but this is almost equally fast. The difference in speed is almost negligible.
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
For i = 1 To Len(Rng.Value)
Select Case Asc(Mid(Rng.Value, i, 1))
Case 65 To 90, 97 To 122
Case Else: pos = i: Exit For
End Select
Next i
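'if every character is a letter, pos is still 0: keep the whole string
If pos = 0 Then pos = Len(Rng.Value) + 1
GetWord = Left(Rng.Value, pos - 1)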
End Function
Usage:
=GetWord(A1)
EDIT:
Followup from comments. Fine tuned the code (Courtesy #brettdj) .
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
Dim sString As String
sString = UCase$(Rng.Value)
For i = 1 To Len(sString)
Select Case Asc(Mid$(sString, i, 1))
Case 65 To 90
Case Else: pos = i: Exit For
End Select
Next i
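'if every character is a letter, pos is still 0: keep the whole string
If pos = 0 Then pos = Len(sString) + 1
GetWord = Left(Rng.Value, pos - 1)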
End Function
More Followup.
Here is something which I had never tried before. I ran an actual test of my code vs RegExp and was surprised to see that my code was faster, which I had not anticipated.
I tested it on 10k cells, each containing a string of length 2256.
The string that I put in Cell A1:A10000 is
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5Routaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5Rout
Next I ran this test
The regexp below removes everything from the first non A-Z character onwards.
Function StrChange(strIn As String) As String
Dim objRegEx As Object
Set objRegEx = CreateObject("vbscript.regexp")
With objRegEx
.ignorecase = True
.Pattern = "^([a-z]+)([^a-z].*)"
.Global = True
StrChange = .Replace(strIn, "$1")
End With
End Function
You can use a simple regular expression to specify a numeral followed by anything and use this function to replace anything that matches that pattern:
Function Regex_Replace(strOriginal As String, strPattern As String, strReplacement, varIgnoreCase As Boolean) As String
Dim objRegExp As Object
Set objRegExp = CreateObject("vbscript.regexp")
With objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
End With
Regex_Replace = objRegExp.Replace(strOriginal, strReplacement)
Set objRegExp = Nothing
End Function
You'd call it like this:
Sub DeleteAfterNums()
Dim cell As Excel.Range
'Change "Selection" to your range
For Each cell In Selection
'"\d.+" is a numeral and whatever follows it
cell.Value = Regex_Replace(cell.Value, "\d.+", "", True)
Next cell
End Sub
Here is a lightweight and fast method that avoids regex/reference additions, thus helping with overhead and transportability should that be an advantage.
Public Function GetText(xValue As String) As Variant
For GetText = 1 To Len(xValue)
If UCase(Mid(xValue, GetText, 1)) Like "[!A-Z]" Then GetText = Left(xValue, GetText - 1): Exit Function
Next
GetText = xValue
End Function
This is then called by using GetText("Submission String") from VBA, or prefixed with "=" from within a cell formula.